Re: 'code unit' and 'code point' meaning check

From: Peter_Constable@sil.org
Date: Fri May 16 2003 - 11:19:05 EDT

    Philippe Verdy wrote on 05/15/2003 12:56:44 PM:

    > Or in other words, Unicode defines *code points* only, not code
    > units (this is left to specific encodings used to serialize it,
    > including UTF-*, and "compressed" BOCU and CESU encodings, which can
    > all be computed algorithmically from Unicode code points).
    >
    > Note that some UTF-* encodings are now described by Unicode.org as
    > standards, but they are technically an annex to the standard, and not
    > necessary to its definition.

    The Unicode Standard defines multiple things, including the following:

    - a coded character set (the domain of code points)

    - three encoding forms, UTF-8, UTF-16 and UTF-32 (each of which describes a
    relationship between code points and sequences of code units of a
    particular bit size: 8, 16 and 32 bits respectively)

    Both of these are integral and necessary parts of the standard. It would be
    possible to create a standard that included the former but not the latter,
    but the result would not be the Unicode Standard.
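    To make the code point / code unit distinction concrete, here is a minimal
    Python sketch of the three encoding forms as mappings from one code point to
    a sequence of code units. It is an illustration, not text from the original
    message: the function names (utf8_units, utf16_units, utf32_units,
    matches_python_codecs) are hypothetical helpers. The bit arithmetic follows
    the standard UTF-8/UTF-16/UTF-32 definitions, and the helper cross-checks
    the results against Python's built-in codecs.

```python
# Illustrative sketch: one Unicode scalar value -> code units in each of the
# three encoding forms. Helper names are hypothetical, chosen for this example.

def _check_scalar(cp: int) -> None:
    # Encoding forms apply to scalar values: U+0000..U+10FFFF minus surrogates.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#X}")


def utf8_units(cp: int) -> list[int]:
    """UTF-8: one to four 8-bit code units."""
    _check_scalar(cp)
    if cp < 0x80:
        return [cp]
    if cp < 0x800:
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:
        return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]


def utf16_units(cp: int) -> list[int]:
    """UTF-16: one 16-bit code unit, or a surrogate pair above U+FFFF."""
    _check_scalar(cp)
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000  # 20 bits split into high and low surrogates
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def utf32_units(cp: int) -> list[int]:
    """UTF-32: a single 32-bit code unit equal to the code point."""
    _check_scalar(cp)
    return [cp]


def matches_python_codecs(cp: int) -> bool:
    """Compare the hand-written mappings with Python's built-in codecs."""
    ch = chr(cp)
    b16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    b32 = ch.encode("utf-32-be")
    u16 = [int.from_bytes(b16[i:i + 2], "big") for i in range(0, len(b16), 2)]
    u32 = [int.from_bytes(b32[i:i + 4], "big") for i in range(0, len(b32), 4)]
    return (bytes(utf8_units(cp)) == ch.encode("utf-8")
            and u16 == utf16_units(cp)
            and u32 == utf32_units(cp))


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1D11E):
        print(f"U+{cp:04X}",
              " ".join(f"{u:02X}" for u in utf8_units(cp)), "|",
              " ".join(f"{u:04X}" for u in utf16_units(cp)), "|",
              " ".join(f"{u:08X}" for u in utf32_units(cp)))
```

    For example, U+1D11E MUSICAL SYMBOL G CLEF is a single code point, yet it
    is four 8-bit code units in UTF-8 (F0 9D 84 9E), two 16-bit code units in
    UTF-16 (the surrogate pair D834 DD1E) and one 32-bit code unit in UTF-32.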

    - Peter

    ---------------------------------------------------------------------------
    Peter Constable

    Non-Roman Script Initiative, SIL International
    7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    Tel: +1 972 708 7485



    This archive was generated by hypermail 2.1.5 : Fri May 16 2003 - 12:13:18 EDT