Re: Normalization forms

From: John Cowan (jcowan@reutershealth.com)
Date: Mon May 13 2002 - 17:27:16 EDT


Lars Marius Garshol wrote:

> - will string comparison methods based on NFC and NFD always give the
> same results?

By intention, yes.
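
As a minimal sketch of what that means in practice, assuming Python's
standard unicodedata module: the helper name equal_under is made up
for this illustration. Two spellings of the same accented letter
compare unequal as raw strings, but equal once both sides are
normalized to NFC or both to NFD.

```python
import unicodedata

def equal_under(form, a, b):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

raw_equal = precomposed == decomposed                    # code points differ
nfc_equal = equal_under("NFC", precomposed, decomposed)  # both become U+00E9
nfd_equal = equal_under("NFD", precomposed, decomposed)  # both become e + U+0301
```

The point is only that the two forms agree on equality; which one a
comparison routine picks does not change the answer.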

> - is it correct that methods based on NFKC and NFKD will give
> different results from ones based on NFC/NFD?

Yes.
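
A small sketch of the difference, again assuming Python's unicodedata
module. The "fi" ligature (U+FB01) has only a compatibility
decomposition, so the canonical forms keep it distinct from the two
letters "fi" while the compatibility forms fold them together.

```python
import unicodedata

ligature = "\ufb01"   # LATIN SMALL LIGATURE FI
plain = "fi"          # the two ordinary letters

def normalized_equal(form, a, b):
    """True if a and b are identical after normalization to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

canonical_equal = normalized_equal("NFC", ligature, plain)   # ligature survives NFC
compat_equal = normalized_equal("NFKC", ligature, plain)     # ligature becomes "fi"
```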

> - if NFC and NFD give the same results, why are both specified? Why
> would an implementation choose one over the other?

Originally, only NFD was given, as it is sufficient. However, text
converted from non-Unicode encodings is generally already in NFC,
so specifying NFC (which is conceptually NFD with a post-processing
pass to re-create certain precomposed characters) has certain practical
advantages. In particular, if you are doing "early normalization",
near the point of creation, then NFC allows easy step-down to
non-Unicode encodings.
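
To illustrate the step-down point, here is a hedged sketch using
Python's unicodedata module, with ISO 8859-1 ("latin-1") standing in
for a non-Unicode encoding. The sample string and the helper
can_encode are assumptions made for this example. The NFC form of the
text fits the legacy encoding because the accented letter is a single
precomposed character; the NFD form does not, because the combining
accent has no Latin-1 code.

```python
import unicodedata

def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

source = "Cafe\u0301"                        # "e" + COMBINING ACUTE ACCENT
nfd = unicodedata.normalize("NFD", source)   # stays fully decomposed
nfc = unicodedata.normalize("NFC", source)   # accent recomposed into U+00E9

# Composing the NFD form gives the same result as composing the original.
recomposed = unicodedata.normalize("NFC", nfd)

nfc_fits_latin1 = can_encode(nfc, "latin-1")
nfd_fits_latin1 = can_encode(nfd, "latin-1")
```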

> - NFKC/NFKD seem to lose significant information; in what contexts
> are they intended to be used?

Compatibility distinctions may or may not be important in particular
cases: often they represent distinctions that are merely historical.
One context where compatibility distinctions are typically unimportant
is in identifiers.
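
A sketch of the identifier case, assuming a symbol table keyed on the
NFKC form of each name. The function identifier_key and the table
layout are hypothetical, not taken from any particular language
implementation.

```python
import unicodedata

def identifier_key(name):
    """Fold compatibility variants so they map to one identifier."""
    return unicodedata.normalize("NFKC", name)

symbols = {}
symbols[identifier_key("\ufb01le")] = 1   # spelled with the "fi" ligature

# The plain spelling and a fullwidth letter resolve to ordinary ASCII keys.
same_slot = identifier_key("file") in symbols
fullwidth_x = identifier_key("\uff58")    # FULLWIDTH LATIN SMALL LETTER X
```

Looking names up through the same key function means that visually
equivalent spellings cannot name two different things.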

-- 
John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_


