From: Thomas Chan (tc31@cornell.edu)
Date: Tue Apr 15 2003 - 00:09:00 EDT
On Tue, 22 Apr 2003, Christopher John Fynn wrote:
> "Thomas Chan" <tc31@cornell.edu> wrote:
> > Could you use PUA positions outside the BMP? The CJK gaiji->PUA mappings
> > were all defined when only the BMP's PUA existed, so clashes should only
> > occur there.
>
> Unfortunately, there currently aren't many font tools that allow you to
> easily map glyphs to PUA points outside the BMP - and it also may not
> be easy for users of fonts to enter non-BMP PUA characters.
Yes, that is a jolting reminder of reality, just as there are characters
in the privileged BMP that are rarer than some in Plane 2...
> Anyone know if the PUA codepoints "taken up" by CJK characters like
> this are the same ones in OS X and in MS Windows?
I don't have an answer to your question, but the theoretically ugly
solution of avoiding PUA codepoints that legacy encodings use for their
UDCs leaves one very cramped. For example, CP950 (basically Big5), which
has the largest UDC area as far as I know, uses up the following ranges,
according to an old (Win 98 era) document on Chinese EUDC[1]:
FA40-FEFE -> U+E000-U+E310
8E40-A0FE -> U+E311-U+EEB7
8140-8DFE -> U+EEB8-U+F6B0
C6A1-C8FE -> U+F6B1-U+F848
The BMP's PUA ends at U+F8FF--that does not leave very much for anything
other than alphabets and some syllabaries.
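To put a number on "not very much", here is a rough Python sketch that
counts the codes in each range above and what is left of the BMP's PUA.
It assumes the usual Big5 trail bytes (0x40-0x7E and 0xA1-0xFE) and that
each range maps to the PUA sequentially in Big5 order; the table gives
only the endpoints, so both are my assumptions, and the names
(UDC_RANGES, range_size, udc_to_pua) are made up for illustration.

```python
# Assumption: valid Big5/CP950 trail bytes are 0x40-0x7E and 0xA1-0xFE,
# i.e. 157 codes per lead byte.
TRAIL = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))

# (first Big5 code, last Big5 code, first PUA code point), from the table.
UDC_RANGES = [
    (0xFA40, 0xFEFE, 0xE000),
    (0x8E40, 0xA0FE, 0xE311),
    (0x8140, 0x8DFE, 0xEEB8),
    (0xC6A1, 0xC8FE, 0xF6B1),
]

def _index(code):
    """Position of a double-byte code in Big5 order."""
    lead, trail = code >> 8, code & 0xFF
    return lead * len(TRAIL) + TRAIL.index(trail)

def range_size(first, last):
    """Number of valid double-byte codes from first to last, inclusive."""
    return _index(last) - _index(first) + 1

def udc_to_pua(code):
    """Map a Big5 UDC code to a PUA code point, assuming a sequential mapping."""
    for first, last, pua in UDC_RANGES:
        if first <= code <= last and (code & 0xFF) in TRAIL:
            return pua + range_size(first, code) - 1
    return None  # not in any UDC range

used = sum(range_size(first, last) for first, last, _ in UDC_RANGES)
free = (0xF8FF - 0xE000 + 1) - used  # BMP PUA is U+E000-U+F8FF

for first, last, pua in UDC_RANGES:
    print(f"{first:04X}-{last:04X}: {range_size(first, last)} codes from U+{pua:04X}")
print(f"used {used} of 6400, leaving {free}")
```

Under those assumptions the computed sizes agree with the endpoints in
the table, and only U+F849-U+F8FF (183 codepoints) are left untouched.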
UDC->PUA mappings from different locales conflict with each other anyway
(cf. the CP936 and CP950 mappings alone, both Chinese, in that same
document), and users within a single CJK locale do have occasion to use
multiple sets of UDCs (parallel to using sets of PUAs that overlap).
[1] http://msdn.microsoft.com/library/default.asp?url=/library/en-us/w98ddk/hh/w98ddk/intl_8q2h.asp
(There is similar information on related pages on Japanese and Korean
EUDC.)
I'd also like to add that the UDC areas in legacy CJK encodings do not
have to be (and have not been) restricted to CJK or East Asian use
(although the results are not always pretty, e.g., full-width monospaced
Latin text); I recall there used to be an implementation of Ethiopic (now
obsolete for a number of years) that was placed in the UDC regions of
Shift-JIS.
Thomas Chan
tc31@cornell.edu