RE: NT & UTF8

From: Addison Phillips (AddisonP@simultrans.com)
Date: Sat Oct 02 1999 - 17:49:17 EDT

Next message: Edward Cherlin: "Re: A basic question on encoding Latin characters)"
Previous message: Michael Everson: "Re: The politics of Unicode"
Maybe in reply to: Joshi Sandeep: "NT & UTF8"
Next in thread: Frank da Cruz: "RE: NT & UTF8"

Frank wrote:

>Although the "DOS" window (it's not really DOS, it's a 32-bit console
>window, as you can tell by the fact that you can use long filenames, etc)
>can use Unicode fonts, it is restricted to "OEM code page" encoding, so
>you are limited to the repertoire of CP850 or whatever. It might be
>possible to change code pages, but it probably isn't easy. However, a
>console application that runs IN the console window does not have this
>limitation, and can take full advantage of whatever (fixed-pitch) font
>can be chosen in the console window's Font list.

This is mostly true. You can *easily* change the display character set (code
page) in a WinNT console to any installed code page by way of the "chcp"
command. That's right: ANY code page, so long as its .NLS file is installed
(use the \LANGPACK directory on your WinNT install CD to install the ones
you need).

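Note that there is no chcp value for Unicode (UCS-2) itself: 10000, for
example, is Macintosh Roman. To display real Unicode text in your "DOS"
shell, a program has to write wide characters through the console API
(WriteConsoleW()) rather than rely on the code page.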

chcp 1252 changes the console to use the Windows "ANSI" code page -- the
same one used by Western European Windows programs and a close variant of
ISO-8859-1.
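
To make "close variant" concrete, here is a minimal sketch in Python (used
only for illustration; it relies on Python's built-in cp1252 and latin-1
codecs, not on anything in NT) that lists the byte values the two encodings
decode differently. The helper name differing_bytes is made up for this
example.

```python
def differing_bytes(codec_a, codec_b):
    """Return the byte values (0-255) that the two codecs decode differently."""
    result = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" yields U+FFFD for undefined bytes instead of raising
        a = raw.decode(codec_a, errors="replace")
        b = raw.decode(codec_b, errors="replace")
        if a != b:
            result.append(value)
    return result


diff = differing_bytes("cp1252", "latin-1")
print([hex(v) for v in diff])
```

With these codecs the differences are confined to 0x80 through 0x9F, where
ISO-8859-1 has control codes and code page 1252 has printable characters
such as the euro sign and curly quotes.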

850 is, as Frank notes, the default for non-US Western European systems. 437
is the default for US-English systems.
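
Why the active code page matters can be sketched in a few lines of Python
(again just an illustration using Python's bundled codecs, which I am
assuming match the Windows tables for these byte values): the same byte is a
different character under 437, 850 and 1252, and the same character may need
a different byte.

```python
import unicodedata

CODE_PAGES = ("cp437", "cp850", "cp1252")

raw = bytes([0xE9])  # one byte, three interpretations
for codec in CODE_PAGES:
    # Print the Unicode character name so the output stays plain ASCII
    print(codec, unicodedata.name(raw.decode(codec)))

# Going the other way: the byte that represents e-acute in each code page
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
encoded = {codec: e_acute.encode(codec) for codec in CODE_PAGES}
print(encoded)
```

Text written as 1252 bytes and shown in a console set to 437 or 850 will
therefore come out as the wrong letters unless it is converted first or the
console is switched with chcp.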

You can install Asian character sets, such as CP932 (Shift-JIS) and they
will work properly, although the lack of an input method is a bit
constraining.

Your program, as Frank also points out, can be a full Win32 application,
including Unicode support for display on the device, plus any or all of the
NLS APIs, MFC, etc. If you're running internally with Unicode and writing
byte-oriented output, though, you'll want to get the console's current code
page from the system (GetConsoleOutputCP(), or GetOEMCP() for the system
default) and convert the text with WideCharToMultiByte() before displaying
it.
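
The conversion step looks roughly like this. This is a Python sketch of the
idea, not the Win32 call itself: to_console_bytes is a hypothetical helper,
the code page is passed in by hand rather than queried from the system, and
errors="replace" is only an approximation of the default-character
substitution that WideCharToMultiByte() performs.

```python
def to_console_bytes(text, codepage="cp850"):
    """Encode Unicode text for a byte-oriented console.

    Characters missing from the target code page become '?'.
    """
    return text.encode(codepage, errors="replace")


# "Gr<u-umlaut><sharp s>e, <Omega>" written with escapes to keep the source ASCII
sample = "Gr\u00fc\u00dfe, \u03a9"

for cp in ("cp850", "cp437", "cp1252"):
    print(cp, to_console_bytes(sample, cp))
```

Omega survives in 437, which carries a few Greek letters, but is replaced in
850 and 1252, so the result depends on which code page the console is
actually using.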

UTF-8 is not directly supported as a console code page, though, so you'll
have to "decode" it before display time arrives. Since most of the Microsoft
tools and APIs provide excellent UCS-2/UTF-16 support, I would tend to
recommend that you just use the OS's native capabilities internally and skip
UTF-8 there. You can always encode text as UTF-8 before exchanging it with
other systems (such as UNIX-like systems that have UTF-8 locales). Of
course, this advice is implementation dependent, and only you know whether
it makes sense in your particular environment.
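
The "decode at the boundary, work natively inside" pattern can be sketched
like this in Python. The function names are invented for the example, and
the UTF-16 step is there only to show the 16-bit code units a Win32 program
would hold; a Python str is not literally stored that way.

```python
def utf8_to_internal(data):
    """Decode UTF-8 bytes received from another system into text."""
    return data.decode("utf-8")


def internal_to_utf8(text):
    """Encode internal text as UTF-8 just before sending it elsewhere."""
    return text.encode("utf-8")


# UTF-8 bytes for "na<i-diaeresis>ve <euro sign>"
incoming = b"na\xc3\xafve \xe2\x82\xac"

text = utf8_to_internal(incoming)
units = text.encode("utf-16-le")  # what a UCS-2/UTF-16 buffer would contain

print("characters:", len(text))
print("16-bit code units:", len(units) // 2)
print("UTF-8 bytes:", len(incoming))

outgoing = internal_to_utf8(text)
```

The round trip is lossless, so nothing is given up by keeping UTF-8 at the
edges of the program and the native encoding in the middle.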

Hope this helps.

Addison
__________________________________________

        Addison Phillips
        Director, Globalization Engineering
        SimulTrans, L.L.C.
        2606 Bayshore Parkway
        Mountain View, California 94043 USA

        +1 650-526-4652 (direct telephone)
        +1 650-969-9959 (facsimile)
        AddisonP@simultrans.com (Internet email)
        http://www.simultrans.com (website)

"22 languages. One release date."
__________________________________________

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT