Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

From: Mark Davis ☕ <mark_at_macchiato.com>
Date: Mon, 11 Feb 2013 10:26:25 +0100

The draft update to LDML for collation is at
http://unicode.org/repos/cldr/trunk/specs/ldml/tr35-collation.html.

Bugs or requests can be filed at http://unicode.org/cldr/trac/newticket .

Mark <https://plus.google.com/114199149796022210033>
*— Il meglio è l’inimico del bene —*

On Mon, Feb 11, 2013 at 9:35 AM, Richard Wordingham <
richard.wordingham_at_ntlworld.com> wrote:

> On Mon, 11 Feb 2013 02:45:27 +0100
> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>
> > 2013/2/10 Richard Wordingham <richard.wordingham_at_ntlworld.com>:
>
> > The term "pathological" could apply to these cases where a "naive"
> > implementation may in fact break the expectations. How then can a
> > collator become a "conforming" process if it has to differentiate
> > canonically equivalent input strings?
>
> There is a UCA collation option, 'normalization' set to 'off', which
> allows such incorrect operation if strings are not FCD. (Both NFC and
> NFD strings are FCD.) The UCA and LDML definitions *still* together
> falsely claim that omitting normalisation will give the correct result
> on FCD strings; counter-examples include default collation <U+0F71
> TIBETAN VOWEL SIGN AA, U+0F73 TIBETAN VOWEL SIGN II> and Danish (still
> at CLDR Version 22.1) <U+0061 LATIN SMALL LETTER A, U+00E5 LATIN SMALL
> LETTER A WITH RING ABOVE>.
>
> Richard.
>
>
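
To make the two counter-examples above concrete, here is a minimal sketch in Python that checks whether a string is FCD and shows what each example looks like after canonical decomposition. The helper name `is_fcd` is an assumption for illustration and is not taken from the UCA, LDML, or any library. The standard `unicodedata` module does not implement collation, so the sketch only demonstrates that both strings pass the FCD test while their NFD forms contain different adjacent sequences from the unnormalized text (<U+0F71, U+0F72> and "aa" respectively), which is where a tailoring with contractions could plausibly give a different result.

```python
import unicodedata


def is_fcd(text: str) -> bool:
    """Return True if text is FCD ("fast C or D" form).

    Hypothetical helper: for each code point, take its full canonical
    decomposition; the leading combining class must be 0 or must not be
    lower than the trailing combining class of the previous code point's
    decomposition.
    """
    prev_trail = 0
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(decomposed[0])
        if lead != 0 and lead < prev_trail:
            return False
        prev_trail = unicodedata.combining(decomposed[-1])
    return True


# The two counter-examples quoted in the message.
tibetan = "\u0f71\u0f73"  # TIBETAN VOWEL SIGN AA, TIBETAN VOWEL SIGN II
danish = "a\u00e5"        # LATIN SMALL LETTER A, A WITH RING ABOVE

for label, s in (("tibetan", tibetan), ("danish", danish)):
    nfd = unicodedata.normalize("NFD", s)
    code_points = " ".join(f"U+{ord(c):04X}" for c in nfd)
    print(label, "FCD:", is_fcd(s), "NFD:", code_points)
```

Both strings pass the check, yet a collator that skips normalization sees <U+0F71><U+0F73> and <a><å>, while one that normalizes first sees the decomposed sequences, so any contraction that matches only one of the two forms can change the sort keys.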
Received on Mon Feb 11 2013 - 03:30:55 CST