From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 09 2006 - 17:35:00 CDT
Philippe Verdy wrote on Tuesday, May 09, 2006 3:48 PM:
> A conforming application should then be free to reject texts containing
> codepoints that they still don't support in their builtin version of the
> UCD. If an application tolerates those texts, then they should not assume
> the stability of normalized forms, and so should better not apply any
> normalization, to keep the texts intact (this is a conforming behavior, as
> normalization of texts is not mandatory in conforming applications).
Please give an example of how normalising text with an undefined character
can corrupt the text.
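For reference, here is a minimal Python sketch of what a normaliser does with an unassigned code point under the data it actually has. It assumes the standard `unicodedata` module; the helper names `first_unassigned` and `survives_normalisation` are made up for illustration. With no decomposition and combining class 0 recorded for it, the code point should simply pass through all four normalisation forms unchanged.

```python
import sys
import unicodedata


def first_unassigned(start=0x0370):
    """Return the first code point >= start that this Python's UCD reports as Cn."""
    for cp in range(start, sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == "Cn":
            return cp
    raise ValueError("no unassigned code point found")


def survives_normalisation(text):
    """Map each normalisation form to whether it leaves `text` unchanged."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, text) == text for form in forms}


cp = first_unassigned()
sample = "ab" + chr(cp) + "cd"  # plain ASCII around the unassigned code point

print("UCD version:", unicodedata.unidata_version)
print(f"unassigned code point: U+{cp:04X}")
print(survives_normalisation(sample))
```

This only shows the behaviour under one version of the data; it says nothing about what a later version would do with the same string.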
> This impacts other Unicode algorithms, such as collation (the sort order
> of texts containing unallocated codepoints is NOT defined and NOT stable
> as long as those codepoints are not officially standardized),
I read the sort order of unallocated codepoints as being defined by
http://www.unicode.org/reports/tr10/#Derived_Collation_Elements .
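As I read that section, an unassigned code point CP gets two derived collation elements, [.AAAA.0020.0002][.BBBB.0000.0000], with AAAA = base + (CP >> 15) and BBBB = (CP & 0x7FFF) | 0x8000, where base is FBC0 for code points that are not CJK ideographs. A minimal Python sketch under that reading (the function name and the tuple layout are mine, not from the report):

```python
def implicit_collation_elements(cp, base=0xFBC0):
    """Derive the two collation elements for a code point with no DUCET entry.

    Each element is a (primary, secondary, tertiary) tuple. The default base
    of 0xFBC0 is the one I understand to apply to unassigned code points.
    """
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]


def primary_key(cp):
    """Primary weights only, which is enough to compare two such code points."""
    return [element[0] for element in implicit_collation_elements(cp)]


for cp in (0x0378, 0x50000):
    weights = " ".join(f"{w:04X}" for w in primary_key(cp))
    print(f"U+{cp:04X} -> primaries {weights}")
```

Under this scheme two code points without explicit weights compare in code point order, so the order is defined; whether it stays the same once the code points are assigned and given real weights is a separate matter.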
Collation is no more stable than the weights used. You have to specify
which version of the Default Unicode Collation Element Table (DUCET) you
used. I am not aware of anything that prohibits the weights being changed
as more is learnt about the collation order(s) for a script. I for one
would be surprised if Version 7.0.0 of the DUCET (or its equivalent) did not
sort Tamil words in the Tamil order. What is unstable is the application of
an old tailoring if undefined codepoints acquire decompositions or a non-zero
combining class.
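To illustrate the mechanism, here is a toy Python sketch of canonical reordering with a pluggable combining-class lookup. It is not a full normaliser and not a tailoring. The combining class of 230 for the "future" case is purely hypothetical, and U+0378 is only assumed to be unassigned; both lookups override it explicitly, so the result does not depend on the local data for that code point.

```python
import unicodedata


def canonical_reorder(text, ccc=unicodedata.combining):
    """Stable-sort each run of characters with non-zero combining class."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


X = "\u0378"  # assumed unassigned; overridden explicitly below


def old_ccc(ch):
    """Old data: X is unassigned, so it behaves as a starter (class 0)."""
    return 0 if ch == X else unicodedata.combining(ch)


def new_ccc(ch):
    """Hypothetical later data: X has become a combining mark of class 230."""
    return 230 if ch == X else unicodedata.combining(ch)


# a, acute (class 230), X, dot below (class 220)
text = "a\u0301" + X + "\u0323"

for label, lookup in (("old", old_ccc), ("new", new_ccc)):
    result = canonical_reorder(text, lookup)
    print(label, " ".join(f"U+{ord(c):04X}" for c in result))
```

With the old lookup X blocks reordering and the string is left alone; with the hypothetical new one the dot below moves ahead of the acute. A tailoring built on the old data would therefore see a different sequence from one built on the new data.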
Richard.