From: vunzndi@vfemail.net
Date: Thu Oct 25 2007 - 20:34:50 CDT
Dear Peter,
The exact set of plane 2 characters of course depends on the context
one is talking about; however, applications need to be able to support
planes 1-16. The most obvious set is the Cantonese characters found
in plane 2. However, various books and even newspapers also often
require characters in plane 2.
I am aware that the original aim of Unicode was to have all 'useful'
characters in the BMP. However, as far as CJKV characters are concerned,
this has not been done; rather, characters have been added on a
first-come, first-served basis. If the allocation of CJKV code points
continues to be done in this way, then modern CJKV coverage will require
not only BMP and plane 1 support but also, in the future, plane 3 support.
Plane 2 includes various Cantonese characters. The characters as yet
unencoded include a large number of place names; any already submitted
to the IRG should end up in plane 2, but any submitted in the future
could well be in plane 3. Not to mention characters used by 'small'
communities such as the Zhuang, with a population of over 10 million.
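For reference, the plane a character falls in follows directly from its
code point: each plane holds 0x10000 code points, so plane 0 is the BMP,
plane 2 starts at U+20000 and plane 3 at U+30000. A minimal Python sketch
of that arithmetic is below; the function names are made up for this
illustration, and U+282E2 is used only as a sample plane 2 code point.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0-16) that a code point belongs to."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane spans 0x10000 code points, so the plane is the value
    # of the bits above the low 16.
    return code_point >> 16


def planes_used(text: str) -> set:
    """Return the set of planes that the characters of a string fall in."""
    # A Python 3 str iterates by code point, not by UTF-16 code unit.
    return {plane_of(ord(ch)) for ch in text}


if __name__ == "__main__":
    sample = "\u4e00\U000282E2"  # one BMP character, one plane 2 character
    print(sorted(planes_used(sample)))
```

Running something like planes_used() over real text is one way to see
how often a given body of material actually goes beyond the BMP.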
There are two slightly different questions here:
    (1) What characters a font should include:
If a font can hold only a limited number of CJK glyphs, then one
chooses the most useful characters (TTF files are limited to 65536
glyphs). Even in a simple case one has to decide in what order to make
the CJK glyphs. One example that makes useful characters first is
uming.ttf, which includes quite a number of plane 2 characters, but
not full Extension A support.
In practice, modern dictionaries designed for high school/college level
students tend to include about 20 to 25 thousand characters. However,
different regions use some different characters, so one could argue
that over 30 thousand characters are required as a minimum.
     (2) What features an application should support:
IMHO applications need to support surrogates in this day and age. For
example, for one project I used Perl Tk; however, I discovered too late
that Perl Tk does not support surrogates. In this case that is the
difference between being an application that is widely used and a dead
end. I would therefore urge all developers to include surrogate support
in the core features of their applications.
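As a rough illustration of what surrogate support involves: in UTF-16,
any code point above U+FFFF is stored as a pair of 16-bit code units,
a high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF).
The Python sketch below shows the standard arithmetic; the function
names are invented for this example, and U+282E2 is just a sample
plane 2 code point.

```python
def to_surrogates(code_point: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x282E2)
    print(f"{high:04X} {low:04X}")
    # Cross-check against Python's own UTF-16 encoder.
    print("\U000282E2".encode("utf-16-be").hex().upper())
```

An application that treats each 16-bit unit as a whole character will
split such a pair when it moves the cursor, deletes, or measures a
string, which is the kind of failure meant above.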
What other modern languages apart from CJKV require support beyond the BMP?
Yours sincerely
John Knightley
Quoting Peter Constable <petercon@microsoft.com>:
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]  
>  On Behalf Of vunzndi@vfemail.net
>
>> You certainly need support for plane 2 characters; some really obscure
>> Chinese characters are in the BMP, but some very useful ones are in
>> plane 2.
>
> I wonder if you could elaborate. We hear that CJK users typically   
> use well under 10K characters, and for years there have been   
> implementations using character sets that didn't include any of the   
> Plane 2 characters and that, evidently, were adequate for lots of   
> usage. So, it's not obvious that Plane 2 characters would be needed   
> in all application scenarios. (Of course, Tim hasn't really said   
> much about his application scenario.) I do note that the II Core set  
>  includes 22 Plane 2 characters; are these the characters you had in  
>  mind? In what scenarios is it important to support them?
>
> Peter