Re: Error in definition of "compatibility character"?

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Fri Oct 26 2001 - 19:55:02 EDT


-----BEGIN PGP SIGNED MESSAGE-----

Kenneth Whistler wrote:
> David Hopwood said:
>
> > I think the correct definition of a compatibility character is a
> > character with a compatibility decomposition that differs from its
> > canonical decomposition (i.e. NFKC(c) != NFC(c)). Am I right?
>
> Actually, what you mean here is NFKD(c) != NFD(c), ...

Yes, that is what I meant to say, although if I'm not mistaken these
definitions are equivalent:
NFKD(c) != NFD(c) <=> NFKC(c) != NFC(c).
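
One way to spot-check that equivalence is a minimal sketch along the following lines, assuming Python's standard unicodedata module and whatever Unicode version ships with the interpreter. The function names (differs_decomposed, differs_composed, find_disagreements) are made up for illustration and come from neither the standard nor the data files.

```python
import sys
import unicodedata


def differs_decomposed(ch: str) -> bool:
    """True if NFKD(ch) != NFD(ch)."""
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)


def differs_composed(ch: str) -> bool:
    """True if NFKC(ch) != NFC(ch)."""
    return unicodedata.normalize("NFKC", ch) != unicodedata.normalize("NFC", ch)


def find_disagreements(start: int = 0, stop: int = sys.maxunicode + 1) -> list:
    """Return the code points where the two tests give different answers."""
    disagreements = []
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points are not characters
        ch = chr(cp)
        if differs_decomposed(ch) != differs_composed(ch):
            disagreements.append(cp)
    return disagreements
```

An empty list from find_disagreements() would be consistent with the equivalence for that Unicode version; it is a check over single code points, not a proof.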

> which is
> implicitly what Mark Davis was agreeing with. There is no reason
> to get the *re*composition (and composition exclusions) of
> NFKC and NFC mixed into the pot, too.

Agreed.

> First of all, as Mark pointed out, there are two quite distinct
> usages of the term in the standard currently.
>
> 1. (decomposition) compatibility character
>
> That is what D21 is about, and is derived on the basis of
> the presence or absence of compatibility decompositions.
>
> 2. (legacy) compatibility character
>
> These are characters that were included in the standard for
> compatibility with other standards, for crossmapping, or
> for other legacy interoperability reasons. Sometimes they
> have compatibility mappings, sometimes they have canonical
> mappings (see, e.g., all the CJK compatibility ideographs),
> and sometimes they have no mappings to other Unicode characters.
>
> The text of the standard is being rewritten to make the distinction
> between these two uses of the term clear.

Is there any formal definition of a legacy compatibility character
in terms of the Unicode data files, or is it only possible to give a
list? (If the latter, perhaps it would be useful to add a "Legacy"
property to PropList-n.n.n.txt.)

> In my opinion, rather than just "fixing" the D21 definition
> of "compatibility character" to match one or the other
> of these, we need a further clarification of the distinctions,
> and if necessary new terminology to make it easier to know
> which of these sets we are talking about.

I'd suggest keeping "compatibility character" for NFKD(c) != NFD(c),
and calling the other kind just "legacy character". After all,
legacy characters don't have any formal relation to compatibility
equivalence.
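
To illustrate the gap between the two notions, here is a rough sketch, again assuming Python's unicodedata module. It reports what kind of mapping a character's own decomposition field carries and, separately, whether NFKD(c) != NFD(c). The helper names (mapping_kind, is_compatibility_character) and the sample characters are my own choices for illustration.

```python
import unicodedata


def mapping_kind(ch: str) -> str:
    """Classify the character's own decomposition field.

    unicodedata.decomposition() returns "" for no mapping, a string that
    starts with a "<tag>" for a compatibility mapping, and bare hex code
    points for a canonical mapping.
    """
    field = unicodedata.decomposition(ch)
    if not field:
        return "none"
    if field.startswith("<"):
        return "compatibility"
    return "canonical"


def is_compatibility_character(ch: str) -> bool:
    """The proposed definition: NFKD(c) != NFD(c)."""
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)


if __name__ == "__main__":
    # U+FB01 LATIN SMALL LIGATURE FI, U+F900 (a CJK compatibility ideograph),
    # U+2000 EN QUAD, and U+0041 LATIN CAPITAL LETTER A as a plain baseline.
    for ch in ("\ufb01", "\uf900", "\u2000", "A"):
        print(f"U+{ord(ch):04X}", mapping_kind(ch), is_compatibility_character(ch))
```

U+F900 has only a canonical mapping, so it fails the NFKD(c) != NFD(c) test even though it sits in a compatibility block. U+2000 also has only a canonical mapping in its own field, yet it passes the test because the character it maps to has a compatibility mapping in turn. That is why the definition is better stated in terms of the full normalization forms than in terms of the tag on a single data-file entry.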

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO9n3fTkCAxeYt5gVAQFQ7Af9EPc4spMIlt+4H4jX1Py0sa0n/iEzh4uE
LYF6iSE7MyCUomV/+tu5L6e+qapr7alNLgy0xR17jeWyJMhWPq3MhdZpBZ+DYx9C
WgbRH1bN9pofZyHrHqE+CXm2mQKnSDZjEffhtdxAE9mqH8Nsqmec7j2K9J3+zv5k
w4r9qPMW2fB9mkbmdi3AxALRkBDpb5VNi5ff7+Ix5e7rj4vspDMZKIM6SzlZ5p3w
KeNrl7u++Up4rb+RuDJ5+Nil6aSQuonUyOehjxoZqe6JDFxf2eK1QxNeed4WjoFt
csnS296qhkG0/pnZi+fD9/K/N8TJBdLuXZ4KrDdAZTM6LYyLG0+Saw==
=rPxW
-----END PGP SIGNATURE-----


