Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Mon, 11 Feb 2013 08:35:09 +0000

On Mon, 11 Feb 2013 02:45:27 +0100
Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> 2013/2/10 Richard Wordingham <richard.wordingham_at_ntlworld.com>:

> The term "pathological" could apply to these cases where a "naive"
> implementation may in fact break the expectations. How then can a
> collator become a "conforming" process if it has to differentiate
> canonically equivalent input strings?

There is a UCA collation option, 'normalization', which when set to
'off' allows such incorrect operation if strings are not FCD. (Both NFC
and NFD strings are FCD.) The UCA and LDML definitions *still* together
falsely claim that omitting normalisation will give the correct result
on FCD strings; counter-examples include, for default collation, <U+0F71
TIBETAN VOWEL SIGN AA, U+0F73 TIBETAN VOWEL SIGN II> and, for Danish
(still at CLDR Version 22.1), <U+0061 LATIN SMALL LETTER A, U+00E5 LATIN
SMALL LETTER A WITH RING ABOVE>.
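
To make the FCD part of that concrete, here is a minimal sketch in
Python, using only the standard unicodedata module. The helper names
is_fcd and codepoints are made up for this illustration, and the
results depend on the Unicode data shipped with the interpreter. It
only checks that both counter-examples pass an FCD test and prints
their NFD forms; it does not do any collation, so it shows what a
collator would see with and without normalisation, not the resulting
sort keys.

```python
import unicodedata


def is_fcd(s):
    """Return True if s passes the FCD test.

    For each character, take its full canonical decomposition. The
    leading combining class of that decomposition must be zero or not
    less than the trailing combining class of the previous character's
    decomposition. (Hypothetical helper, written for illustration.)
    """
    prev_trail = 0
    for ch in s:
        decomp = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(decomp[0])
        trail = unicodedata.combining(decomp[-1])
        if lead != 0 and lead < prev_trail:
            return False
        prev_trail = trail
    return True


def codepoints(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join("U+%04X" % ord(ch) for ch in s)


# The two counter-examples from the text above.
tibetan = "\u0F71\u0F73"
danish = "\u0061\u00E5"

for label, s in (("Tibetan", tibetan), ("Danish", danish)):
    nfd = unicodedata.normalize("NFD", s)
    print(label)
    print("  input:", codepoints(s))
    print("  FCD?  ", is_fcd(s))
    print("  NFD:  ", codepoints(nfd))
```

For contrast, a string such as <U+0073, U+0307, U+0323> (dot above
before dot below) fails the same test, because the second mark has a
lower combining class than the first.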

Richard.
Received on Mon Feb 11 2013 - 02:42:32 CST