RE: Developing multilingual web sites

From: Chris Pratley (chrispr@MICROSOFT.com)
Date: Thu Mar 23 2000 - 00:14:19 EST


Hi Suzanne. I think I have to disagree. I don't see anything "native" about
the various non-Unicode legacy/emulated/ANSI/whatever-you-want-to-call-them
code pages that Win2000/NT happen to support for compatibility with older
software.

The whole concept of encodings, code pages, etc. is extremely hard for
non-specialists to understand. The only way to clear the confusion (or at
least not make it worse) is to use the proper terms and explain lucidly so
that eventually it can all come together. Otherwise, people thinking that
code pages are native to NT are going to get very confused when they dig a
little deeper and see the developer documentation and talk to developers,
who refer to Unicode as the native encoding.

Even with clear use of terms, I despair of ever getting a significant
fraction of users to understand, and my goal is to make it unnecessary to
understand. But while we are in this time of hybrid Unicode/non-Unicode
environments and tools, users will occasionally bump up against this
problem, and when that happens, using the right terms (at least not
misleading ones) is important. (Aside to the purists: in my other mail, I
purposely said that the emulated encoding is "usually called the ANSI code
page". That is a fact. It is usually called that, like it or not. I often
joke about that use of ANSI in presentations, but we're stuck with it - it
is one of the wrinkles of Windows documentation and vernacular.)

To clarify:
* For Win2000/NT, the "native/internal/actual" encoding is UCS-2/UTF-16. It
is always Unicode regardless of the system language.
* For Win3.1/9x/Me, the "native/internal/actual" encoding is some code page,
never Unicode. The code page is different depending on the language flavour
of the system.
* For compatibility with Win9x/Me and even Win16 applications,
Windows2000/NT will emulate a particular language flavour of legacy
Windows9x/3.x. Calls to the non-Unicode APIs in Windows2000/NT are
translated to Unicode, acted on, and the results translated back to the
emulated code page before being returned to the application.
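
To illustrate the third point, here is a minimal sketch in Python of the
"translate in, act, translate back" pattern. It is not actual Windows code:
the function names, the uppercase operation, and the choice of Big5 as the
emulated code page are all assumptions made up for the example.

```python
# Hypothetical sketch only; names and behaviour are invented for illustration
# and do not correspond to real Windows APIs.

EMULATED_CODE_PAGE = "big5"  # assumed: e.g. a Hong Kong system


def to_upper_unicode(text: str) -> str:
    """Stand-in for a Unicode API: works directly on Unicode text."""
    return text.upper()


def to_upper_ansi(data: bytes, code_page: str = EMULATED_CODE_PAGE) -> bytes:
    """Stand-in for a non-Unicode API: a wrapper around the Unicode one."""
    text = data.decode(code_page)      # 1. emulated code page -> Unicode
    result = to_upper_unicode(text)    # 2. act on the Unicode text
    return result.encode(code_page)    # 3. Unicode -> emulated code page


if __name__ == "__main__":
    legacy_bytes = "abc 中文".encode(EMULATED_CODE_PAGE)
    print(to_upper_ansi(legacy_bytes).decode(EMULATED_CODE_PAGE))
```

The caller of to_upper_ansi only ever sees bytes in the emulated code page,
so it has no way to tell that the work was done in Unicode. The two extra
conversion steps are also where the additional cost of the non-Unicode path
would come from in a design like this.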

Non-Unicode applications are never aware that Win2000/NT is actually using
Unicode, and they never call the Unicode APIs in 2000/NT. I don't see any
way that we can get away with calling this emulated code page the "native"
encoding of Windows2000. Non-Unicode APIs are guaranteed to be slower on
Win2000/NT than the "native" (Unicode) APIs.

This underlying use of Unicode in Win2000/NT and not in Win9x is a
fundamental difference in the two systems, so I think it is highly
misleading to say that the native encoding of Win2000 is something other
than Unicode. I understand that if you think of native in the sense of
"local", it has more meaning, but we should definitely avoid native in this
sense. You can call it emulated, or legacy, but most people just call it the
ANSI code page... :)

Regards,
Chris

-----Original Message-----
From: Suzanne Topping [mailto:stopping@rochester.rr.com]
Sent: Wednesday, March 22, 2000 5:34 AM
To: Unicode List
Subject: Re: Developing multilingual web sites

----- Original Message -----
From: Chris Pratley <chrispr@MICROSOFT.com>

> First, I want to clarify that the "native encoding" of Windows2000 is
> Unicode. What you are referring to as "native encoding" is actually the
> emulated encoding, usually called the ANSI code page of the system. In
> Hong Kong, this should be "Big5" encoding.

I wonder if this might give Aaron the wrong idea...

In my exposure to encoding topics, discussions, articles, etc. "native"
encoding is the term that is commonly used as a descriptor for encodings
other than Unicode.

While Unicode forms the basis of Windows 2000, calling it "native encoding"
may lead to confusion.


