RE: [long] Use of Unicode in AbiWord

From: Chris Pratley (chrispr@microsoft.com)
Date: Tue Mar 23 1999 - 20:46:36 EST


I believe the whole point of Unicode is that any characters that are
actually used (or were used) should be properly encoded in Unicode, and not
in the Private Use Area. Private Use is for implementation-specific
characters, not language-specific, no matter how rare. Besides, one person's
"rare" character is someone else's common one. If your name happened to use
one of those characters, you would no doubt consider it to be quite common.
I have personally had to deal with addressing correspondence to a person
whose name contained an ideograph not in the encoding system of the machine
I was using (pre-Unicode), and it was incredibly frustrating. In the end I
had to substitute a similar-looking character, which can be quite insulting
to the person whose name it is. (Fortunately I was not the first to have the
problem, and there was an "accepted" substitute that was tolerated but not
desirable.)

Unicode 3.0 is not the end of the line for Unicode. Those extra 50 000 Han
characters will (and should) eventually get catalogued and added to Unicode,
probably all in plane 2. This should be done even if they are archaic, so
that ancient texts can be computerized accurately. Once they are encoded,
vendors can start to support them. The first step is to get them encoded,
which is what is being worked on now.
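
For anyone who wants to see the distinction concretely, here is a rough
sketch (not taken from any product) of telling private-use code points
apart from properly encoded ones. The ranges are the Private Use Area in
the BMP and the private-use planes 15 and 16; the function names
plane_of and is_private_use are made up for illustration.

```python
# Private-use ranges: the BMP Private Use Area plus planes 15 and 16.
# The last two code points of each plane are noncharacters, so the
# supplementary ranges stop at xFFFD.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # plane 15
    (0x100000, 0x10FFFD),  # plane 16
]


def plane_of(code_point: int) -> int:
    """Return the plane number (0 = BMP, 2 = the plane suggested for extra Han)."""
    return code_point >> 16


def is_private_use(code_point: int) -> bool:
    """True if the code point has only a private, implementation-specific meaning."""
    return any(low <= code_point <= high for low, high in PRIVATE_USE_RANGES)


if __name__ == "__main__":
    for cp in (0x4E00, 0xE000, 0x20000, 0xF0000):
        print(f"U+{cp:04X} plane={plane_of(cp)} private_use={is_private_use(cp)}")
```

A character that lands in one of those ranges means whatever the sender's
system says it means; a character encoded in plane 2 would mean the same
thing to everyone.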

Chris Pratley,
Microsoft Word

-----Original Message-----
From: schererm@us.ibm.com [mailto:schererm@us.ibm.com]
Sent: March 23, 1999 3:30 PM
To: Unicode List
Subject: Re: [long] Use of Unicode in AbiWord

well, please excuse that i participated in "uncorroborated assertions made
by Unicode insiders and their buddies". indeed i don't know any more about
most languages including the east asian ones than what i have been reading
here and elsewhere.

it appears to me that an impressive number (>1000) of ideographs in those
languages are used only infrequently, in names and similar places, and were
apparently invented for those purposes? in this case, they are probably
best covered in unicode in private use areas, possibly in planes 15/16 if
the bmp private use area is too small. if there is "evidence" for a
"private extension" or even a "private encoding system" of a government
agency, then these could be candidates for starting a registration list
(outside unicode.org) for those characters.

out of curiosity, is there a number for han ideographs in "general use"?
unicode 3.0 has somewhere around 27000, and i don't know about numbers for
what the irg is preparing.

sorry for spreading gossip :-) ,

markus

Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
schererm@us.ibm.com
                        Unicode is here! --> http://www.unicode.org/