From: Andrew West (andrewcwest@gmail.com)
Date: Thu Oct 25 2007 - 12:13:49 CDT
On 25/10/2007, Peter Constable <petercon@microsoft.com> wrote:
>
> I wonder if you could elaborate. We hear that CJK users typically use well under 10K characters, and for years there have been implementations using character sets that didn't include any of the Plane 2 characters and that, evidently, were adequate for lots of usage. So, it's not obvious that Plane 2 characters would be needed in all application scenarios. (Of course, Tim hasn't really said much about his application scenario.) I do note that the II Core set includes 22 Plane 2 characters; are these the characters you had in mind? In what scenarios is it important to support them?
62, according to <http://www.cse.cuhk.edu.hk/~irg/irg/IICore/IICore.htm>.
Offhand I don't really recognise any of them, but maybe they are
mostly used in the barbarian southern dialect.
On the other hand, there are a few characters not in the IICORE set
but which are commonly used in colloquial Mandarin that are in the
SIP, such as U+24B62 𤭢 cei4 "to break".
And CJK-C has 26 characters sourced to Xiandai Hanyu Cidian, the
standard PRC short dictionary of modern Chinese, including several
immediately recognisable characters such as the simplified form of
U+5D19 (the lun2 in kun1lun2 崑崙).
In my opinion it is no good trying to seek an easy way to support
Unicode without the hassle of combining characters, variation
selectors, contextual glyph variants, surrogate pairs, etc. If for
some reason you cannot use existing implementations (and it is
difficult to imagine a scenario where you can't do so) then you have
to implement a proper generic solution that will work with everything.
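To make the surrogate-pair point concrete: every Plane 2 (SIP) character, such as U+24B62 above, lies beyond U+FFFF, so in UTF-16 it takes two code units rather than one. The Python below is a minimal sketch that applies the standard UTF-16 surrogate arithmetic and checks the result against Python's built-in codec. The function names `plane`, `to_surrogate_pair` and `utf16_units` are illustrative only and do not come from any existing library.

```python
def plane(cp: int) -> int:
    """Return the Unicode plane number (0-16) of a code point."""
    return cp >> 16


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary-plane code point")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low


def utf16_units(text: str) -> int:
    """Count UTF-16 code units: two for any code point above U+FFFF, else one."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


if __name__ == "__main__":
    cei4 = "\U00024B62"  # U+24B62 "to break"
    print(f"plane {plane(ord(cei4))}")                    # plane 2
    high, low = to_surrogate_pair(ord(cei4))
    print(f"U+{ord(cei4):X} -> {high:04X} {low:04X}")     # U+24B62 -> D852 DF62
    # Cross-check the hand computation against Python's UTF-16 encoder.
    assert cei4.encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    print(len(cei4), utf16_units(cei4))                   # 1 2
```

Python's `len` counts code points, but in environments whose strings are UTF-16 (Java, C#, Windows APIs), the length of this one character is 2. That is why code that indexes or truncates by code unit can split a pair unless it handles surrogates generically.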
Andrew
This archive was generated by hypermail 2.1.5 : Thu Oct 25 2007 - 12:17:04 CDT