From: Peter_Constable@sil.org
Date: Fri May 16 2003 - 11:19:05 EDT
Philippe Verdy wrote on 05/15/2003 12:56:44 PM:
> Or in other words, Unicode defines *code points* only, not code
> units (this is left to specific encodings used to serialize it,
> including UTF-*, and "compressed" BOCU and CESU encodings, which can
> all be computed algorithmically from Unicode code points).
>
> Note that some UTF-* encodings are now described by Unicode.org as
> standards, but these are technically annexes to the standard, and not
> necessary to its definition.
The Unicode Standard defines multiple things, including the following:
- a coded character set (the domain of code points)
- three encoding forms (each of which describes a relationship between code
points and sequences of code units of a particular bit size)
Both of these are integral and necessary parts of the standard. It would be
possible to create a standard that included the former but not the latter,
but the result would not be the Unicode Standard.
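
To make the distinction concrete, here is a minimal sketch in Python (not
part of the standard's text) that takes a single code point and shows the
code unit sequences the three encoding forms assign to it. The helper name
code_units and the choice of U+10400 as the sample are my own assumptions
for illustration; the big-endian codecs are used only so that the bytes can
be regrouped into 8-, 16- and 32-bit units in a predictable order.

```python
import struct

def code_units(text, codec, unit_size):
    """Encode text, then regroup the bytes into code units of unit_size bytes.

    Hypothetical helper for illustration; codec must be a big-endian
    Python codec so that the '>' struct format matches the byte order.
    """
    data = text.encode(codec)
    fmt = {1: "B", 2: "H", 4: "I"}[unit_size]
    count = len(data) // unit_size
    return list(struct.unpack(">%d%s" % (count, fmt), data))

cp = 0x10400          # one code point in the coded character set
ch = chr(cp)

utf8 = code_units(ch, "utf-8", 1)        # 8-bit code units
utf16 = code_units(ch, "utf-16-be", 2)   # 16-bit code units
utf32 = code_units(ch, "utf-32-be", 4)   # 32-bit code units

print("U+%04X" % cp)
print("UTF-8 :", " ".join("%02X" % u for u in utf8))    # F0 90 90 80
print("UTF-16:", " ".join("%04X" % u for u in utf16))   # D801 DC00
print("UTF-32:", " ".join("%08X" % u for u in utf32))   # 00010400
```

The same code point maps to four, two and one code units respectively, which
is the relationship each encoding form describes.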
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485