From: Doug Ewell (dewell@adelphia.net)
Date: Fri May 16 2003 - 03:11:22 EDT
My day to pick on Philippe Verdy <verdy_p at wanadoo dot fr>:
>> In a nutshell: Unicode is not UTF-16.
>
> Or in other words, Unicode defines *code points* only, not code units
> (this is left to specific encodings used to serialize it, including
> UTF-*, and "compressed" BOCU and CESU encodings, which can all be
> computed algorithmically from Unicode code points).
Unicode defines the encoding forms, and thus the code units used by
those encoding forms. If Philippe simply means that the code units used
to represent a given code point vary depending on the chosen encoding
form, he is of course right.
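
To make that concrete, here is a minimal Python sketch (my own illustration, not anything taken from the standard's text) that lists the code units for one code point under each of the three encoding forms. The helper name code_units is hypothetical, and U+10400 is just a convenient supplementary-plane example.

```python
import struct

def code_units(ch, form):
    """Return the code units representing one character in an encoding form.

    Hypothetical helper for illustration. Python's codecs produce bytes, so
    the bytes are regrouped into 8-, 16-, or 32-bit units here.
    """
    if form == "UTF-8":
        return list(ch.encode("utf-8"))            # 8-bit code units
    if form == "UTF-16":
        data = ch.encode("utf-16-be")
        return list(struct.unpack(">%dH" % (len(data) // 2), data))
    if form == "UTF-32":
        data = ch.encode("utf-32-be")
        return list(struct.unpack(">%dI" % (len(data) // 4), data))
    raise ValueError("unknown encoding form: %r" % form)

# Same code point, different code units depending on the encoding form.
for form in ("UTF-8", "UTF-16", "UTF-32"):
    print(form, [hex(u) for u in code_units("\U00010400", form)])
```

The code point stays U+10400 throughout; only the number and width of the code units change.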
Note that there is a bit of confusion here between encoding forms, which
are about code units, and encoding schemes, which are about bytes. (I
had a lot of trouble separating these two, at first.) Also, replace
"CESU" with "SCSU" in this passage.
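
The form/scheme distinction can be sketched the same way. This is only an illustration under my own assumptions: serialize_utf16 is a hypothetical helper, and the two code units are the UTF-16 surrogate pair for U+10400. The encoding form yields the 16-bit code units; the encoding scheme decides how those units become bytes.

```python
def serialize_utf16(units, byteorder):
    """Serialize 16-bit code units to bytes in the given byte order.

    Hypothetical helper: byteorder is "big" (as in the UTF-16BE scheme)
    or "little" (as in the UTF-16LE scheme).
    """
    return b"".join(u.to_bytes(2, byteorder) for u in units)

units = [0xD801, 0xDC00]   # encoding form: two UTF-16 code units for U+10400

# Encoding schemes: identical code units, different byte sequences.
print(serialize_utf16(units, "big").hex())     # UTF-16BE byte order
print(serialize_utf16(units, "little").hex())  # UTF-16LE byte order
```

UTF-8 has 8-bit code units, so byte order never arises there; the question only matters for UTF-16 and UTF-32.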
> Note that some UTF-* encodings are now described by Unicode.org as
> standards, but is technically an annex to the standard, and not
> necessary to its definition.
As Michka pointed out, Unicode Standard Annexes *are* part of the
Unicode Standard. But this is moot, since all three UTFs are defined
directly in the standard itself, not in UAXs (although UTF-32 used to
be).
This has nothing to do with whether Unicode conformance requires
implementation of any particular UTF. (It does not.)
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/