Re: *Complete* Big5 to Unicode mappings

From: John Cowan (cowan@mercury.ccil.org)
Date: Mon Apr 21 2003 - 10:02:31 EDT

    Elliotte Rusty Harold scripsit:

    > Is there anywhere I can find or piece together a *complete* list of
    > Unicode characters that are available in Big5 (and other similar
    > sets)? I've looked at unihan.txt, and it has part of what I need but
    > not all of it. It specifies which Unicode Han characters are
    > available in which other character sets. However, most of these
    > character sets include various ASCII characters, Greek letters,
    > symbols, digits, and so forth. These do not appear to be listed in
    > unihan.txt.

    See http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT .
    Don't be afraid of the "OBSOLETE" in the URL, which just means that
    these are not officially maintained by Unicode any more. In general,
    http://www.unicode.org/Public/MAPPINGS is a good source of mappings.
    http://crl.nmsu.edu/~mleisher/csets.html is another high-quality set,
    not overlapping with the Unicode one. Both sites use the same simple
    tabular format.
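
For illustration, here is a minimal sketch of reading that tabular format. It assumes each data line holds two whitespace-separated hex fields (the source code, then the Unicode code point) and that "#" starts a comment; check the header of the actual file before relying on this. The name parse_mapping and the sample rows are made up for the example and should be verified against the real table.

```python
def parse_mapping(lines):
    """Parse rows of the assumed form: 0xXXXX <TAB> 0xXXXX <TAB> # comment."""
    mapping = {}
    for raw in lines:
        # Everything after '#' is treated as a comment (assumption).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        fields = line.split()
        if len(fields) < 2:
            continue  # a code with no Unicode mapping listed
        # int(..., 16) accepts the leading "0x" prefix.
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


# Illustrative rows only; not copied from the real file.
SAMPLE = """\
# sample rows in the assumed format
0xA140\t0x3000\t# IDEOGRAPHIC SPACE
0xA141\t0xFF0C\t# FULLWIDTH COMMA
0xA440\t0x4E00\t# <CJK>
"""

table = parse_mapping(SAMPLE.splitlines())
unicode_side = sorted(set(table.values()))
```

The set of values in the resulting dictionary is the list of Unicode characters the table covers, including the non-Han ones that unihan.txt leaves out.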

    You will probably get complaints that your Big5 mapping is not complete.
    This is because there is no standard way to map Big5 extensions to
    Unicode -- indeed, the various vendors do not even agree on what the
    extensions are.

    I look forward to the excellent small and fast code you will design
    for representing the list of valid characters in Big5 and other large
    character sets....
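
One possible shape for such code, offered only as a sketch and not as anyone's actual design: collapse the sorted code points into inclusive runs and binary-search the runs. The names to_ranges and contains are hypothetical, and the sample code points are arbitrary.

```python
from bisect import bisect_right


def to_ranges(code_points):
    """Collapse code points into sorted, inclusive (start, end) runs."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


def contains(ranges, cp):
    """Return True if cp falls inside one of the runs (binary search)."""
    # Find the last run whose start is <= cp.
    i = bisect_right(ranges, (cp, float("inf"))) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]


# Arbitrary sample: three adjacent letters, one symbol, two adjacent Han.
valid = to_ranges([0x41, 0x42, 0x43, 0x3000, 0x4E00, 0x4E01])
```

How small this gets depends entirely on how contiguous the character set is on the Unicode side; a scattered repertoire will produce many one-element runs, and a bitmap may then be the better trade.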

    -- 
    Deshil Holles eamus.  Deshil Holles eamus.  Deshil Holles eamus.
    Send us, bright one, light one, Horhorn, quickening, and wombfruit. (3x)
    Hoopsa, boyaboy, hoopsa!  Hoopsa, boyaboy, hoopsa!  Hoopsa, boyaboy, hoopsa!
      -- Joyce, _Ulysses_, "Oxen of the Sun"       jcowan@reutershealth.com
    

