Re: PRI#86 Update

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 09 2006 - 17:35:00 CDT


    Philippe Verdy wrote on Tuesday, May 09, 2006 3:48 PM

    > A conforming application should then be free to reject texts containing
    > codepoints that it does not yet support in its built-in version of the
    > UCD. If an application tolerates such texts, then it should not assume
    > that normalized forms are stable, and so had better not apply any
    > normalization, to keep the texts intact (this is conforming behavior, as
    > normalization of texts is not mandatory in conforming applications).

    Please give an example of how normalising text with an undefined character
    can corrupt the text.
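    A minimal Python sketch of what current normalizers do with an undefined
    character: in the UCD bundled with Python's `unicodedata`, an unassigned
    code point has no decomposition mapping and canonical combining class 0,
    so every normalization form passes it through unchanged. U+40000 is used
    on the assumption that Plane 4 is still unassigned in that UCD version;
    `UNDEFINED` and `normalizes_intact` are illustrative names only.

```python
import unicodedata

# Assumption: U+40000 (Plane 4) is unassigned in the UCD bundled with the
# running Python; unicodedata.unidata_version reports which version that is.
UNDEFINED = "\U00040000"

def normalizes_intact(text: str) -> bool:
    """Return True if all four normalization forms leave `text` unchanged."""
    return all(unicodedata.normalize(form, text) == text
               for form in ("NFC", "NFD", "NFKC", "NFKD"))

print(unicodedata.unidata_version)
print(unicodedata.category(UNDEFINED))   # prints Cn (general category "unassigned")
print(unicodedata.combining(UNDEFINED))  # prints 0 (treated as a starter)

# The undefined character survives normalization untouched, while the
# defined characters around it are still composed as usual.
print(normalizes_intact("e" + UNDEFINED + "\u0301"))  # True
print(unicodedata.normalize("NFC", "e\u0301" + UNDEFINED) == "\u00e9" + UNDEFINED)  # True
```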

    > This impacts other Unicode algorithms, such as collation (the sort order
    > of texts containing unallocated codepoints is NOT defined and NOT stable
    > as long as those codepoints are not officially standardized),

    I read the sort order of unallocated codepoints as being defined by
    http://www.unicode.org/reports/tr10/#Derived_Collation_Elements .
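    A sketch of how those derived collation elements give an unassigned code
    point a definite place in the sort order. The constants follow my reading
    of UTS #10 for unassigned code points (base 0xFBC0); CJK ideographs, and
    in later UCA versions a few other scripts, use different bases, so treat
    this as an illustration rather than a complete implementation. The
    helper names `implicit_ces` and `sort_key` are hypothetical.

```python
# Base for *unassigned* code points per UTS #10 "Derived Collation Elements";
# other ranges (e.g. CJK ideographs) use different bases.
UNASSIGNED_BASE = 0xFBC0

def implicit_ces(cp: int) -> list:
    """Two collation elements [.AAAA.0020.0002][.BBBB.0000.0000] for cp."""
    aaaa = UNASSIGNED_BASE + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]

def sort_key(ces: list) -> tuple:
    """UCA-style sort key: non-zero weights level by level, 0 as level separator."""
    key = []
    for level in range(3):
        if level:
            key.append(0)
        key.extend(ce[level] for ce in ces if ce[level])
    return tuple(key)

cps = [0x50000, 0x40001, 0x40000]
ordered = sorted(cps, key=lambda cp: sort_key(implicit_ces(cp)))
print([hex(cp) for cp in ordered])  # ['0x40000', '0x40001', '0x50000']
```

    Because AAAA carries the high bits and BBBB the low 15 bits, unassigned
    code points compare among themselves in code-point order.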

    Collation is no more stable than the weights used. You have to specify
    which version of the Default Unicode Collation Element Table (DUCET) you
    used. I am not aware of anything that prohibits the weights being changed
    as more is learnt about the collation order(s) for a script. I for one
    would be surprised if Version 7.0.0 of the DUCET (or its equivalent) did not
    sort Tamil words in the Tamil order. What is unstable is the application of
    an old tailoring if undefined codepoints later acquire decompositions or
    non-zero combining classes.
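    A Python sketch, using invented data, of why a newly acquired non-zero
    combining class matters: text that is in NFD under the older UCD need not
    be in NFD under the newer one, and since the UCA normalizes to NFD before
    looking up collation elements, an old tailoring may then see a different
    sequence. The combining class 230 for U+40000 is purely hypothetical; the
    Unicode stability policy fixes combining classes only for characters that
    are already assigned. `canonical_order` is an illustrative helper, not a
    library function.

```python
import unicodedata
from typing import Optional

# Hypothetical: suppose a later UCD assigns the currently unassigned
# code point U+40000 as a combining mark with canonical combining class 230.
UNDEFINED = "\U00040000"
HYPOTHETICAL_FUTURE_CCC = {UNDEFINED: 230}

def canonical_order(text: str, overrides: Optional[dict] = None) -> str:
    """Canonical Ordering Algorithm on already-decomposed text.

    Each maximal run of non-starters (ccc != 0) is stably sorted by ccc;
    `overrides` substitutes combining classes for a hypothetical UCD.
    """
    overrides = overrides or {}

    def ccc(ch: str) -> int:
        return overrides.get(ch, unicodedata.combining(ch))

    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)

# a + COMBINING ACUTE ACCENT (ccc 230) + U+40000 + COMBINING GRAVE ACCENT BELOW (ccc 220)
text = "a\u0301" + UNDEFINED + "\u0316"

today = canonical_order(text)
future = canonical_order(text, HYPOTHETICAL_FUTURE_CCC)
print(today == text)                          # True: U+40000 is a starter today
print(future == "a\u0316\u0301" + UNDEFINED)  # True: the marks get reordered
```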

    Richard.



    This archive was generated by hypermail 2.1.5 : Tue May 09 2006 - 17:39:54 CDT