not all compatibility characters are created equal (was Re: Transcriptions of "Unicode")

From: Peter_Constable@sil.org
Date: Thu Dec 07 2000 - 11:54:39 EST


On 12/06/2000 09:23:58 PM Kenneth Whistler wrote:

>James Kass said:
[snip]
>> Consider the "teeth" ideograph(s). (Radical number 211, in
>> some radical lists.) Because this is a radical, CJK encoders
>> can select the specific desired character:
>> U+2FD2 for Traditional Chinese
>> U+2EED for Japanese
>> U+2EEE for Simplified Chinese
>
>Uh oh! This is one of the dangers of these dang radicals.
>First of all, the radicals are *not* intended to be used as
>regular ideographic characters. That is why they all have the "So"
>property, rather than "Lo". So if you go around recommending their
>usage *instead* of the unified character for regular text, you
>can end up with some strange behavior.
>
>Note that the entire Kangxi radical set, U+2F00..U+2FD5, are
>duplicate symbols for the radicals that *are* encoded as unified
>characters in the main set. Effectively, they are all compatibility
>characters.

They are not just effectively compatibility characters. They are, in fact,
compatibility characters. And in this case, it sounds like most people
should *not* be using them. In this regard they are like the Arabic
presentation forms: we would be better off if people generally did not use
them.
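
To make the "So" versus "Lo" point concrete, here is a minimal sketch using
Python's standard unicodedata module. The choice of Python and the helper
name describe() are assumptions made for illustration; they are not part of
the original discussion. The sketch reports the general category and the
decomposition mapping of the Kangxi radical U+2FD2 and of the unified
ideograph U+9F52, then checks whether compatibility normalization (NFKC)
folds the radical into the ideograph.

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, decomposition mapping) for one character.

    Hypothetical helper, introduced only for this illustration.
    """
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.decomposition(ch))

radical = "\u2FD2"    # Kangxi radical for "tooth" (radical 211)
ideograph = "\u9F52"  # unified ideograph that the radical duplicates

print(describe(radical))    # category and compatibility mapping of the radical
print(describe(ideograph))  # category of the unified ideograph; empty mapping

# NFKC applies compatibility mappings, so the radical should not survive it.
print(unicodedata.normalize("NFKC", radical) == ideograph)
```

If text encoded with the radical is later passed through NFKC or NFKD, it
silently turns into the unified ideograph, which is one practical reason to
encode the unified character in the first place.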

But what about all the other compatibility characters? When should or
shouldn't I consider using them? Should we avoid using U+00A0 NO-BREAK
SPACE because it's a compatibility character? U+0E33 THAI CHARACTER SARA AM
is a compatibility character, but I believe it is used quite regularly. How
do I know whether to recommend that users encode texts with U+0EDC LAO HO NO
and U+0EDD LAO HO MO? I've been thinking about this in relation to phonetic
transcriptions for linguists:
the IPA handbook suggests the use of some superscript characters, but these
are compatibility characters. Is that a good idea, or not?
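
As a rough illustration of why the label "compatibility character" alone
does not answer these questions, the sketch below lists the compatibility
tag that the Unicode Character Database records for each character mentioned
above. It assumes Python's standard unicodedata module; the helper
compat_tag() is hypothetical, and U+02B0 MODIFIER LETTER SMALL H is an
assumed stand-in for the IPA superscripts, since the text does not name
specific ones.

```python
import unicodedata

def compat_tag(ch):
    """Return the compatibility tag of ch (e.g. 'compat', 'noBreak', 'super').

    Returns None when ch has no decomposition or only a canonical one.
    Hypothetical helper, introduced only for this illustration.
    """
    mapping = unicodedata.decomposition(ch)
    if mapping.startswith("<"):
        return mapping[1:mapping.index(">")]
    return None

examples = [
    "\u00A0",  # NO-BREAK SPACE
    "\u0E33",  # THAI CHARACTER SARA AM
    "\u0EDC",  # LAO HO NO
    "\u0EDD",  # LAO HO MO
    "\u02B0",  # MODIFIER LETTER SMALL H (assumed example of an IPA superscript)
    "\u2FD2",  # Kangxi radical for "tooth"
]

for ch in examples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {compat_tag(ch)}")
```

The tag describes how a character relates to its mapping, not whether using
it is advisable: a character in everyday use and a character most people
should avoid can carry exactly the same tag.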

One problem I see for implementers is not knowing which compatibility
characters are good to use in general and which are not. We have heard on
many occasions that every character has a story. The compatibility
characters desperately need to have their stories told so that people will
know what to do with them. Otherwise, we will inevitably end up with cases
where people are encouraged to use characters that they would be better off
avoiding.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


