Lars Marius Garshol scripsit:
> - will string comparison methods based on NFC and NFD always give the
> same results?
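By design, yes. NFC and NFD are both canonical forms, so two strings
that compare equal after normalizing to one also compare equal after
normalizing to the other.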
> - is it correct that methods based on NFKC and NFKD will give
> different results from ones based on NFC/NFD?
Yes.
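
A minimal sketch of both answers so far, assuming Python's standard
unicodedata module; the helper name equal_under and the sample strings
are illustrative only, not from the original question.

```python
import unicodedata

def equal_under(form, a, b):
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# A canonically equivalent pair: precomposed e-acute vs. e + combining acute.
composed = "\u00e9"
decomposed = "e\u0301"

# A pair that is only compatibility equivalent: the fi ligature vs. "fi".
ligature = "\ufb01"
plain = "fi"

# NFC and NFD agree with each other on the canonical pair.
canonical = {f: equal_under(f, composed, decomposed) for f in ("NFC", "NFD")}

# The K forms fold the ligature; the canonical forms do not.
compat = {f: equal_under(f, ligature, plain)
          for f in ("NFC", "NFD", "NFKC", "NFKD")}

print(canonical)
print(compat)
```
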
> - if NFC and NFD give the same results, why are both specified? Why
> would an implementation choose one over the other?
Originally, only NFD was specified, since it is sufficient on its own. However, text
converted from non-Unicode encodings is generally already in NFC,
so specifying NFC (which is conceptually NFD with a post-processing
pass to re-create certain precomposed characters) has certain practical
advantages. In particular, if you are doing "early normalization",
near the point of creation, then NFC allows easy step-down to
non-Unicode encodings.
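
To illustrate the step-down point, here is a rough sketch, again
assuming Python's unicodedata module. The function name step_down and
the choice of Latin-1 as the legacy encoding are my own assumptions for
the example.

```python
import unicodedata

def step_down(text, encoding="latin-1"):
    """Normalize to NFC, then encode to a legacy (non-Unicode) encoding."""
    return unicodedata.normalize("NFC", text).encode(encoding)

# "cafe" followed by a combining acute accent (already in NFD).
nfd = unicodedata.normalize("NFD", "cafe\u0301")
nfc = unicodedata.normalize("NFC", nfd)

# NFC re-creates the precomposed character, so the string gets shorter.
lengths = (len(nfd), len(nfc))

# The NFC string maps directly onto Latin-1 ...
legacy = step_down(nfd)

# ... while the NFD string does not, because Latin-1 has no
# combining acute accent.
try:
    nfd.encode("latin-1")
    nfd_encodable = True
except UnicodeEncodeError:
    nfd_encodable = False

print(lengths, legacy, nfd_encodable)
```
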
> - NFKC/NFKD seem to lose significant information; in what contexts
> are they intended to be used?
Compatibility distinctions may or may not be important in particular
cases: often they represent distinctions that are merely historical.
One context where compatibility distinctions are typically unimportant
is in identifiers.
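
As a hedged sketch of the identifier case, the following folds several
spellings of one name to a single key with NFKC. The function name
identifier_key and the sample names are hypothetical; a real language
or tool would define its own rules on top of this.

```python
import unicodedata

def identifier_key(name):
    """Fold compatibility variants so visually similar names match."""
    return unicodedata.normalize("NFKC", name)

names = [
    "file",                      # plain ASCII
    "\ufb01le",                  # starts with the fi ligature
    "\uff46\uff49\uff4c\uff45",  # fullwidth Latin letters
]

# All three spellings collapse to the same identifier key.
keys = {identifier_key(n) for n in names}
print(keys)
```

Whether such folding is desirable depends on the application; for
display text the ligature or fullwidth distinction may matter.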
-- 
John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_
This archive was generated by hypermail 2.1.2 : Mon May 13 2002 - 18:05:36 EDT