From: "Shlomi Tal" <email@example.com>
> Another FAQ-like essay of mine.
> Request for corrections.
Ok, if you insist. :-)
> Microsoft Windows can handle text in at least one of three modes:
> 1. 8-bit stream with 256-character repertoire
> 2. 16-bit stream with 65536-character repertoire
> 3. 8-bit stream with 65536-character repertoire
#1 fails to take into account the CJK "ANSI" code pages, which support far
more than 256 characters. Also, if you move beyond Notepad to text editors
that can save in other encodings, there is even GB18030.
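To make the DBCS point concrete, here is a minimal sketch in Python. It assumes Python's built-in "cp932" and "gb18030" codecs are close enough to the Windows code pages for illustration; the sample characters are my own choice, not from the essay.

```python
# Hypothetical sample: Latin "A" followed by HIRAGANA LETTER A (U+3042).
sample = "A\u3042"

# Code page 932 is the Japanese "ANSI" code page. ASCII stays one byte,
# while the kana needs a lead byte plus a trail byte.
cp932_lengths = [len(ch.encode("cp932")) for ch in sample]
print(cp932_lengths)

# GB18030 is a variable-width encoding that covers the whole Unicode range;
# a supplementary-plane character such as U+10348 takes four bytes.
gothic = "\U00010348"
gb_bytes = gothic.encode("gb18030")
print(len(gb_bytes), gb_bytes.decode("gb18030") == gothic)
```

An 8-bit stream, in other words, does not imply a 256-character repertoire.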
> 2. ANSI Mode
> The oldest mode for text files in Microsoft Windows, and the only
> option for the Windows 9x family, is ANSI mode, in which the system
> recognizes 256 characters. Half of these (the ASCII range, 00 to 7F)
> are constant, and the other half (80 to FF) change according to the
> particular language version of the system. ANSI modes enable the use
> of only two scripts: Basic Latin plus one more codeset. Other codesets
> cannot be used in ANSI mode without changing the codepage (which, as
> regards Windows 9x, means installing a different version of the
> operating system).
See above -- DBCS code pages cannot be denied...
> Windows XP abandons ANSI mode and uses Unicode mode instead (see
> next), but for compatibility with Windows 9x and other codepage-based
> environments, it emulates ANSI mode for one codepage at a time.
XP abandons? The abandonment started in NT 3.1, and continued with NT 3.5,
NT 3.51, NT 4.0, Windows 2000, Windows XP, and Windows .NET Server.
Now I know you had a prelim note, but you are missing more than half of the
You might want to consider using "NT" or "WinNT" for the shorthand rather
than XP/WinXP -- this is much more common usage. If you just say "XP" maybe
you mean Office XP? NT and 9x are clearly referring to Windows platforms,
> opens a command prompt in which text is piped in and out as UTF-16
> little-endian. Text in Unicode mode can contain any character, and can
> be converted to any 8-bit codepage (except for a few such as Hindi and
> Georgian which are Unicode only).
This part needs a little work. It is not really true that text can be
converted to *any* code page: the conversion will produce output, but
characters missing from the target code page (which, for most code pages,
means most characters outside of ASCII) come out as "?". Unicode-only
languages have no code pages to convert to -- though note that there are
ISCII code pages, which do allow Indic text to be converted to an 8-bit
code page.
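A rough sketch of the "?" substitution, using Python's codecs as a stand-in for the Windows conversion. This is an assumption for illustration only: the real Windows routines may also apply best-fit mappings that Python's codecs do not, and the sample string is hypothetical.

```python
# Hypothetical sample: "café" followed by the Hebrew word "shalom".
text = "caf\u00e9 \u05e9\u05dc\u05d5\u05dd"

# errors="replace" substitutes "?" for anything the target code page lacks.
as_1252 = text.encode("cp1252", errors="replace")  # Western European
as_1255 = text.encode("cp1255", errors="replace")  # Hebrew

print(as_1252)  # the Hebrew letters are lost
print(as_1255)  # the accented Latin letter is lost

# UTF-16 little-endian loses nothing and round-trips cleanly.
as_utf16 = text.encode("utf-16-le")
print(as_utf16.decode("utf-16-le") == text)
```

Each code page keeps only the characters it has, so neither 8-bit conversion can be undone.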
Trigeminal Software, Inc. -- http://www.trigeminal.com/
This archive was generated by hypermail 2.1.2 : Fri May 31 2002 - 09:00:26 EDT