Re: mandarin

From: John H. Jenkins (jenkins@apple.com)
Date: Wed Sep 12 2001 - 11:28:56 EDT


At 4:11 PM +0530 9/12/01, Shankaranarayana, Chandramouli (Cognizant) wrote:
>Can anybody tell me if the encoding for mandarin (chinese) is taken care of
>within 16 bits (0-65535) or does the encoding for mandarin require surrogate
>pairs ?
>

Chinese doesn't work this way; for that matter, neither does English.
For every dialect of Chinese, the repertoire of characters that might
be used is essentially open-ended, so it is impossible to give an
authoritative list of the characters used in writing any one dialect.

The Han ideographic characters encoded in the BMP (some 27,000 in
number) are more than enough to handle the vast majority of everyday
use in modern Mandarin.
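As a rough check on the "some 27,000" figure, the sketch below adds up the two BMP blocks of unified Han ideographs. It assumes the block boundaries as assigned in Unicode 3.0 (later versions extended both blocks), and it leaves out the CJK Compatibility Ideographs. Python and the names used are illustrative choices, not anything from the original post.

```python
# Block boundaries as assigned in Unicode 3.0 (assumed; later versions extended both).
BMP_HAN_BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FA5),
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DB5),
}


def block_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


for name, (first, last) in BMP_HAN_BLOCKS.items():
    print(f"{name}: U+{first:04X}..U+{last:04X} = {block_size(first, last)}")

total = sum(block_size(first, last) for first, last in BMP_HAN_BLOCKS.values())
print("total:", total)  # 20902 + 6582 = 27484
```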

However, many of the relatively rare ideographs found in Plane 2
could conceivably appear in a modern text. These include some of the
rarer characters from dictionaries such as the Hanyu Da Zidian, and
some of the more obscure characters derived from CNS 11643.
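To make the surrogate-pair question concrete: any Plane 2 ideograph lies above U+FFFF, so in UTF-16 it takes two 16-bit code units. The sketch below is a minimal hand-rolled version of the standard UTF-16 surrogate computation, cross-checked against Python's own codec. The function names are hypothetical, and U+20000 is just a convenient Plane 2 code point.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code unit(s) for a Unicode scalar value."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0xFFFF:
        return [cp]                  # BMP: fits in a single 16-bit unit
    v = cp - 0x10000                 # 20-bit offset into planes 1-16
    high = 0xD800 + (v >> 10)        # lead surrogate carries the top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # trail surrogate carries the bottom 10 bits
    return [high, low]


def utf16_units_via_codec(cp: int) -> list[int]:
    """Cross-check using Python's built-in UTF-16 (big-endian, no BOM) codec."""
    data = chr(cp).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for cp in (0x4E2D, 0x20000):         # U+4E2D is in the BMP; U+20000 is in Plane 2
    assert utf16_units(cp) == utf16_units_via_codec(cp)
    print(f"U+{cp:04X} plane {cp >> 16}:", [hex(u) for u in utf16_units(cp)])
```

This prints `['0x4e2d']` for the BMP character and `['0xd840', '0xdc00']` for the Plane 2 one.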

It is certainly true that non-Mandarin Chinese dialects cannot be
adequately covered, even for most everyday use, with BMP characters
alone. But even for Mandarin, one must be prepared for cases where
non-BMP characters are necessary.
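Being "prepared" for non-BMP characters mostly means not assuming one character equals one 16-bit unit. A minimal Python sketch, with hypothetical helper names, that finds the characters needing surrogate pairs and shows how a string's length differs when counted in code points versus UTF-16 code units:

```python
def non_bmp_chars(text: str) -> list[str]:
    """Characters outside the BMP, i.e. those that need a surrogate pair in UTF-16."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def utf16_length(text: str) -> int:
    """Length in UTF-16 code units (what a 16-bit-unit string API would report)."""
    # "utf-16-le" is used rather than "utf-16" so no byte-order mark is counted.
    return len(text.encode("utf-16-le")) // 2


sample = "中文\U00020000"              # two BMP ideographs plus U+20000 from Plane 2
print(len(sample), utf16_length(sample), [f"U+{ord(c):04X}" for c in non_bmp_chars(sample)])
```

This prints `3 4 ['U+20000']`: three characters, but four UTF-16 code units.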

-- 
=====
John H. Jenkins
jenkins@apple.com
jenkins@mac.com
http://homepage.mac.com/jenkins/


