L2/02-044
Title: Mapping of Compatibility Ideographs
Authors: Ken Whistler & Martin Dürst
Date: 2002/01/25

Martin said:

> Dear Unicode Experts,
>
> I very much think the following should be considered very seriously
> again, and most probably changed:
>
> In Unicode 3.2, there are 59 new compatibility ideographs at
> U+FA30-FA6A. As far as I understand, they (or most of them) are
> from the set of variants of the Japanese Ministry of Justice.
>
> All of them have a *canonical* mapping, which means that according
> to the Unicode Standard, nobody can expect them to be preserved.
> I propose that this be changed into a compatibility mapping.
> This is in particular relevant for the Web, where the tendency
> is to use NFC as much as possible.
>
> This would be much more in line with how similar differences
> are handled in other scripts.

Aiy, yai yai!

First of all, this is clearly not a *bug* in the BETA files, per se, which have the mappings as intended (and also as listed in the Source references for CJK Compatibility Ideographs in 10646). It is, however, a debatable position to take, and one that the UTC will need to consider and decide.

My own take on this is that making such a change would introduce yet another, inconsistent class of CJK Compatibility characters into the standard. The CJK Compatibility characters at F900..FA2D all have canonical mappings (and that cannot be changed at this point) -- except for the 12 which are actually unified ideographs. For the majority of those, no one really cares -- the KS C 5601-1987 compatibility duplicates and the Big 5 duplicates are just duplicates, and don't carry true variant distinctions.

However, among the remainder of the IBM 32, there are specific variants that fall within the kinds of variations ordinarily unified in the big list of unified ideographs, but which were pulled out here separately for roundtripping to IBM code pages.
(And there are 500+ CNS compatibility characters already standardized for Unicode 3.1, all of which have canonical mappings, and many of which may have distinct variant implications for CNS as well.)

Importantly, among the IBM 32 are some of the *same* systematic kinds of variations important to the 59 new compatibility ideographs at U+FA30-FA6A. Cf. the Unicode 3.0 characters FA18..FA1A with the new Unicode 3.2 characters FA4D..FA54. These show *exactly* the same variation in the radical form, and are maintained in the Japanese Ministry of Justice list for exactly the same traditionalist reasons.

If we introduce an inconsistency between the way FA18..FA1A behave under normalization and the way FA4D..FA54 behave, how are we going to explain why some are preserved and others are not? Incidentally, the 3 among the IBM 32 include *the* most problematical of the bunch, FA19 "kami", which probably has more traditionalist associations in Japan than just about any other character!

In short, I don't see how introducing an inconsistency in the way CJK Compatibility character mapping is done, just for this new set of 59 characters from JIS X 0213, will generically solve the problem of how to maintain *particular* glyph form distinctions on web pages using normalized Unicode data.
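
For anyone who wants to see the behavior under discussion, here is a minimal sketch in Python using the standard `unicodedata` module. The helper names `mapping_kind` and `survives` are made up for illustration and are not part of any Unicode specification; the results also depend on the version of the Unicode Character Database bundled with the Python build, which is assumed here to be 3.2 or later. U+2460 is included only as a familiar character with a compatibility mapping, for contrast.

```python
import unicodedata


def mapping_kind(ch: str) -> str:
    """Classify a character's decomposition mapping (hypothetical helper).

    In the Unicode Character Database, compatibility mappings carry a
    tag such as <compat> or <circle>; canonical mappings carry no tag.
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        return "none"
    return "compatibility" if decomp.startswith("<") else "canonical"


def survives(ch: str, form: str) -> bool:
    """Return True if the character is unchanged by the given normalization form."""
    return unicodedata.normalize(form, ch) == ch


if __name__ == "__main__":
    samples = [
        0xFA19,  # "kami", one of the IBM 32
        0xFA4D,  # one of the 59 compatibility ideographs new in Unicode 3.2
        0xFA0E,  # one of the 12 in F900..FA2D that are actually unified ideographs
        0x2460,  # CIRCLED DIGIT ONE, a compatibility mapping, for contrast
    ]
    for cp in samples:
        ch = chr(cp)
        print(
            f"U+{cp:04X}",
            f"mapping={mapping_kind(ch)}",
            f"NFC-stable={survives(ch, 'NFC')}",
            f"NFKC-stable={survives(ch, 'NFKC')}",
        )
```

A character with a canonical mapping is replaced under every normalization form, NFC included, whereas one with a compatibility mapping is replaced only under NFKC and NFKD. That difference is what the proposed change would rely on, and it is also why FA18..FA1A and FA4D..FA54 would end up behaving differently under NFC if only the new set were changed.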