RE: Nushu

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Jun 26 2001 - 06:58:30 EDT


Michael Everson wrote:
> 600 characters it is then.

If Nüshu is actually logographic, I think that this may be too early a
conclusion.

If the figure is based on a single person's sample, it may reflect only
the vocabulary of that lady or, even worse, only the vocabulary of the topics
that she was dealing with in the sample texts.

If it is not possible to do more investigation on Nüshu, I would suggest
reserving a larger area (>= 2000 entries), because that is the minimum number
that I would expect for a logographic script.

I have a book that has a nice chapter about the frequency of Chinese
characters (*). From evaluations made on a modern corpus, the authors obtained
this statistical progression:

- the most common character covers 4% of all text,
- the 5 most common characters cover 8% of all text,
- the first 10 cover 11%,
- the first 20 cover 17%,
- the first 28 cover 20%,
- the first 163 cover 50%,
- the first 2400 cover 99%,
- the first 4000 cover 99.9%,
- the first 6359 cover practically 100% of all modern text.

But we know that Unicode, like any other CJK character set, has many more
than 7000 characters.

If Nüshu has the same statistical distribution, the ~600 characters used by
Ms. Yang could be the tip of an iceberg.
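
As a rough illustration of how a coverage progression like the one above can
be computed from a sample, here is a minimal Python sketch. The function names
and the toy sample are my own assumptions for illustration; they are not taken
from the book, and nothing here is real Nüshu data.

```python
from collections import Counter
from itertools import accumulate


def coverage_by_rank(text):
    """Return cumulative coverage fractions: entry k-1 is the share of the
    text covered by the k most frequent characters."""
    # Count every non-whitespace character in the sample.
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return []
    # Sort frequencies from most to least common and accumulate them.
    ranked = sorted(counts.values(), reverse=True)
    return [running / total for running in accumulate(ranked)]


def chars_needed(curve, target):
    """Smallest number of distinct characters whose coverage reaches
    `target`, or None if the sample never reaches it."""
    for rank, covered in enumerate(curve, start=1):
        if covered >= target:
            return rank
    return None


# Hypothetical toy sample, only to exercise the functions.
sample = "aaaabbbccd"
curve = coverage_by_rank(sample)
print(curve)
print(chars_needed(curve, 0.9))
```

Note that a curve computed from one sample always reaches 100% at its last
rank, so it says nothing about characters that the sample never uses. That is
exactly the limitation of a count based on a single person's texts.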

(* The book is: Yin Binyong & John S. Rohsenow, "Modern Chinese Characters",
Sinolingua, Beijing, 1994, ISBN 7-80052-167-2, 0-8351-2474-6; chapter 2:
"The Number of Chinese Characters".)

_ Marco


