Re: [long] Use of Unicode in AbiWord

From: John Jenkins (jenkins@apple.com)
Date: Wed Mar 24 1999 - 07:47:22 EST


Markus says:

> out of curiosity, is there a number for han ideographs in "general use"?
> unicode 3.0 has somewhere around 27000, and i don't know about numbers for
> what the irg is preparing.
>

It takes knowledge of about 2,000-3,000 ideographs to read a newspaper.

*Very* knowledgeable people will be able to use and recognize about
8,000-10,000. Note, however, that it isn't the same 8,000-10,000 for
every person.

The KangXi dictionary has about 50,000, half of which are currently encoded.

Estimates for the total number of ideographs used either now or over the
course of Chinese literary history range from 80,000 to 100,000.

Analogously, an advanced English-speaker will have a vocabulary of about
20,000 to 30,000 words. An unabridged English dictionary will have about
600,000.

The IRG's current goal is to make sure that everything in the KangXi and
Hanyu Da Zidian (a major modern Chinese dictionary) is encoded. Granted, a
lot of those ideographs are known *only* because they are found in the
KangXi, but it's analogous to wanting to make sure that Unicode can cover
the full contents of the OED. The IRG is anticipating some 40,000
additional ideographs to be put in Plane 2.

And, BTW, Hong Kong is producing lists of previously unencoded
Cantonese-specific ideographs, and Japan's new JIS X 0213 standard also
includes several hundred ideographs not yet in Unicode.

And there will be no new ideographs added to the BMP. WG2 has been firm on
that point. They'll all go on Plane 2.
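
For anyone wondering what "on Plane 2" means for an implementation that
stores text as 16-bit units: a code point outside the BMP takes two UTF-16
code units (a surrogate pair) instead of one. Here is a rough sketch in
Python. The helper names plane() and utf16_units() are made up for
illustration, and U+20000 is used only because it is the first code point
of Plane 2, not because any particular ideograph is assigned there.

```python
def plane(cp):
    """Return the plane number of a code point (0 = BMP, 2 = Plane 2)."""
    return cp >> 16


def utf16_units(cp):
    """Return the UTF-16 code units for one code point (hypothetical helper)."""
    if cp < 0x10000:
        # BMP code points fit in a single 16-bit unit.
        return [cp]
    # Supplementary code points are split into a high and a low surrogate.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


if __name__ == "__main__":
    for cp in (0x4E00, 0x20000):
        units = " ".join("%04X" % u for u in utf16_units(cp))
        print("U+%04X  plane %d  UTF-16: %s" % (cp, plane(cp), units))
```

So code that assumes one 16-bit unit per character will keep working for
the ideographs already in the BMP, but would need to handle surrogate pairs
to cope with anything placed on Plane 2.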

=====
John H. Jenkins
jenkins@apple.com
tseng@blueneptune.com
http://www.blueneptune.com/~tseng


