I've been creating a new set of CNS 11643-1992 <-> Unicode mapping tables
based on Unihan-3.1.1's kIRG_TSource tag, and have come across a few
glitches.
Firstly, there is a typo in Unihan-3.1.1.txt. Compatibility ideograph U+2F958
has its TSource listed as 6-4627, which clashes with U+28E84. The U+2F958
entry is clearly the incorrect one: its TSource should be 6-4267.
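A clash like this can be caught mechanically by grouping code points by their T-source value. The following is a minimal sketch, assuming tab-separated Unihan records of the form `U+2F958<TAB>kIRG_TSource<TAB>6-4627` (an optional leading "T" on the value is tolerated). The function name `find_tsource_clashes` is hypothetical, and the sample lines simply restate the two entries described above.

```python
from collections import defaultdict


def find_tsource_clashes(lines):
    """Return {cns_code: [code points]} for CNS codes claimed by more than one ideograph.

    Assumes tab-separated records such as "U+2F958\tkIRG_TSource\t6-4627".
    """
    owners = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[1] != "kIRG_TSource":
            continue  # only the T-source field matters here
        codepoint, _, value = fields
        owners[value.lstrip("T")].append(codepoint)  # normalise "T6-4627" -> "6-4627"
    # Keep only CNS codes that more than one code point claims.
    return {cns: cps for cns, cps in owners.items() if len(cps) > 1}


sample = [
    "U+28E84\tkIRG_TSource\t6-4627",
    "U+2F958\tkIRG_TSource\t6-4627",  # the suspected typo
]
print(find_tsource_clashes(sample))
# {'6-4627': ['U+28E84', 'U+2F958']}
```

Run over the whole of Unihan-3.1.1.txt, any non-empty result points at entries that would break the Unicode -> CNS direction of a round trip.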
Apart from that, it would appear that the CNS<->Unicode mapping specified
by Unihan is still not quite complete. It seems to me that a complete
round-trip mapping is the intent of Unicode 3.1, although I haven't seen it
explicitly stated.
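Whether a pair of CNS -> Unicode and Unicode -> CNS tables actually round-trips can be checked directly. This is a sketch only: `round_trip_failures` is a hypothetical helper, and the tables are assumed to be plain dictionaries keyed by strings such as "6-4267" and "U+2F958". The example pairs a hand-corrected forward table with a reverse table that still carries the 6-4627 typo.

```python
def round_trip_failures(cns_to_uni, uni_to_cns):
    """List entries that do not survive a round trip through both tables.

    Hypothetical inputs: cns_to_uni maps CNS codes like "6-4267" to code
    points like "U+2F958"; uni_to_cns is the intended reverse table.
    """
    failures = []
    for cns, uni in cns_to_uni.items():
        if uni_to_cns.get(uni) != cns:
            failures.append(("cns", cns))  # CNS -> Unicode -> CNS changed or was lost
    for uni, cns in uni_to_cns.items():
        if cns_to_uni.get(cns) != uni:
            failures.append(("uni", uni))  # Unicode -> CNS -> Unicode changed or was lost
    return failures


forward = {"6-4267": "U+2F958", "6-4627": "U+28E84"}
reverse = {"U+2F958": "6-4627", "U+28E84": "6-4627"}  # still has the typo
print(round_trip_failures(forward, reverse))
# [('cns', '6-4267'), ('uni', 'U+2F958')]
```

Checking both directions separately matters: a table can be complete in one direction while silently collapsing two entries in the other.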
There are 19 characters in planes 3-7 that don't appear in the kIRG_TSource
mapping (positions are given as decimal row/cell, followed by the hex code):
Gap in plane 3 at 65/72 (6168)
Gap in plane 4 at 02/59 (225B)
Gap in plane 4 at 03/65 (2361)
Gap in plane 4 at 07/74 (276A)
Gap in plane 4 at 08/07 (2827)
Gap in plane 4 at 08/93 (287D)
Gap in plane 4 at 10/78 (2A6E)
Gap in plane 4 at 16/34 (3042)
Gap in plane 4 at 24/60 (385C)
Gap in plane 4 at 35/46 (434E)
Gap in plane 4 at 36/56 (4458)
Gap in plane 4 at 67/25 (6339)
Gap in plane 4 at 69/63 (655F)
Gap in plane 5 at 03/43 (234B)
Gap in plane 5 at 85/76 (756C)
Gap in plane 6 at 10/01 (2A21)
Gap in plane 6 at 60/15 (5C2F)
Gap in plane 7 at 12/26 (2C3A)
Gap in plane 7 at 33/57 (4159)
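The row/cell and hex forms in the list above are related in the usual ISO 2022 way for a 94x94 set: each byte is the position plus 0x20, so 65/72 becomes 0x61 0x68, i.e. 6168. A minimal sketch of that conversion, plus a hypothetical `report_gaps` helper that produces lines in the same format, assuming `assigned` (all occupied positions in a plane, taken from some reference CNS table) and `mapped` (positions reached via kIRG_TSource) are sets of uppercase 4-digit hex strings:

```python
def row_cell_to_hex(row, cell):
    """Convert a CNS 11643 row/cell position (1-94 each) to its 4-digit hex code."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # Each byte is the position offset by 0x20, giving 0x21..0x7E.
    return f"{row + 0x20:02X}{cell + 0x20:02X}"


def hex_to_row_cell(code):
    """Inverse of row_cell_to_hex: "6168" -> (65, 72)."""
    value = int(code, 16)
    return (value >> 8) - 0x20, (value & 0xFF) - 0x20


def report_gaps(plane, assigned, mapped):
    """Yield a line for every assigned code in `plane` that has no mapping.

    `assigned` and `mapped` are hypothetical sets of uppercase hex codes;
    fixed-width uppercase strings sort in numeric order.
    """
    for code in sorted(assigned - mapped):
        row, cell = hex_to_row_cell(code)
        yield f"Gap in plane {plane} at {row:02d}/{cell:02d} ({code})"


print(list(report_gaps(3, {"2121", "6168"}, {"2121"})))
# ['Gap in plane 3 at 65/72 (6168)']
```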
The other 48,008 ideographs are round-tripped, with the assistance of the
CJK Compatibility Ideographs Supplement. Can anyone enlighten me as to the
status of the missing 19?
I'm also not clear on the status of plane 15. Is it really part of CNS
11643-1992?
-- 
Kevin Bracey, Principal Software Engineer
Pace Micro Technology plc               Tel: +44 (0) 1223 518566
645 Newmarket Road                      Fax: +44 (0) 1223 518526
Cambridge, CB5 8PB, United Kingdom      WWW: http://www.pace.co.uk/
This archive was generated by hypermail 2.1.2 : Thu Nov 15 2001 - 10:57:32 EST