Re: PUA

From: Thomas Chan (tc31@cornell.edu)
Date: Tue Apr 15 2003 - 00:09:00 EDT

    On Tue, 22 Apr 2003, Christopher John Fynn wrote:

    > "Thomas Chan" <tc31@cornell.edu> wrote:
    > > Could you use PUA positions outside the BMP? The CJK gaiji->PUA mappings
    > > were all defined when only the BMP's PUA existed, so clashes should only
    > > occur there.
    >
    > Unfortunately currently there aren't many font tools that allow you to
    > easily map glyphs to PUA points outside the BMP - and it also may not
    > be easy for users of fonts to enter non BMP PUA characters.

    Yes, that is a jolting reminder of reality, just as there are characters
    in the privileged BMP that are rarer than some in Plane 2...

     
    > Anyone know if the PUA codepoints "taken up" by CJK characters like
    > this are the same ones in OS X and in MS Windows?

    I don't have an answer to your question, but the theoretically ugly
    solution of avoiding PUA codepoints that legacy encodings already use
    for UDC mappings leaves one very cramped. For example, CP950 (basically
    Big5), which has the largest UDC area as far as I know, uses up the
    following ranges, according to an old (Win 98 era) document on Chinese
    EUDC[1]:
      FA40-FEFE -> U+E000-U+E310
      8E40-A0FE -> U+E311-U+EEB7
      8140-8DFE -> U+EEB8-U+F6B0
      C6A1-C8FE -> U+F6B1-U+F848

    The BMP's PUA ends at U+F8FF--that does not leave very much for anything
    other than alphabets and some syllabaries.
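
    To put a number on "not very much", here is a rough Python sketch that
    counts what the four ranges above consume and what is left over. The
    range table is copied from the list above; the linear mapping in
    udc_to_pua() is my own assumption (trail bytes 0x40-0x7E and 0xA1-0xFE,
    i.e. 157 per lead byte, assigned in order), not something taken from the
    document in [1], although the range sizes do agree with it. The names
    are made up for illustration.

```python
# (first Big5 code, last Big5 code, first PUA code point, last PUA code point),
# copied from the CP950 list above.
CP950_UDC_RANGES = [
    (0xFA40, 0xFEFE, 0xE000, 0xE310),
    (0x8E40, 0xA0FE, 0xE311, 0xEEB7),
    (0x8140, 0x8DFE, 0xEEB8, 0xF6B0),
    (0xC6A1, 0xC8FE, 0xF6B1, 0xF848),
]

BMP_PUA_FIRST, BMP_PUA_LAST = 0xE000, 0xF8FF

# Assumption: valid Big5 trail bytes are 0x40-0x7E and 0xA1-0xFE.
TRAIL_BYTES = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))


def big5_index(code):
    """Position of a two-byte code when valid codes are counted in order."""
    lead, trail = code >> 8, code & 0xFF
    return lead * len(TRAIL_BYTES) + TRAIL_BYTES.index(trail)


def udc_to_pua(code):
    """Map a CP950 UDC code to a PUA code point, or None if it is not a UDC.

    Assumes each range is mapped linearly, skipping invalid trail bytes.
    """
    for b_first, b_last, p_first, _p_last in CP950_UDC_RANGES:
        if b_first <= code <= b_last and (code & 0xFF) in TRAIL_BYTES:
            return p_first + big5_index(code) - big5_index(b_first)
    return None


def pua_usage():
    """Return (used, free) counts of BMP PUA code points."""
    used = sum(p_last - p_first + 1 for _, _, p_first, p_last in CP950_UDC_RANGES)
    total = BMP_PUA_LAST - BMP_PUA_FIRST + 1
    return used, total - used


if __name__ == "__main__":
    used, free = pua_usage()
    print(f"used by CP950 UDCs: {used}, left in the BMP PUA: {free}")
    print(f"FA40 -> U+{udc_to_pua(0xFA40):04X}")
```

    By this count the CP950 UDC ranges take 6217 of the 6400 BMP PUA code
    points, leaving only U+F849-U+F8FF (183 code points).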

    UDC->PUA mappings from different locales conflict with each other, anyway,
    cf., the CP936 and CP950 mappings alone (both Chinese) in that same
    document, and users within a single CJK locale do have occasion to use
    multiple sets of UDCs (parallel to using sets of PUAs that overlap).

    [1] http://msdn.microsoft.com/library/default.asp?url=/library/en-us/w98ddk/hh/w98ddk/intl_8q2h.asp
    (There is similar information on related pages on Japanese and Korean
    EUDC.)

    I'd also like to add that the UDC areas in legacy CJK encodings do not
    have to be (and have not been) restricted to CJK or East Asian use
    (although the results are not always pretty, e.g., full-width monospaced
    Latin text); I recall there used to be an implementation of Ethiopic (now
    obsolete for a number of years) that was placed in the UDC regions of
    Shift-JIS.

    Thomas Chan
    tc31@cornell.edu


