Re: U+0F81 - Unicode 4.0 normalization error (missing exclusion for "Tibetan Vowel Sign Reversed II")

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Mon May 12 2003 - 13:02:07 EDT


    Philippe Verdy wrote:
    >>After some tests I have seen that one character defined in the test file is
    >>excluded from canonical recomposition:
    >>
    >>This normalization test chart:
    >>http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
    >>lists:
    >>
    >>0F81; 0F71 0F80; 0F71 0F80; 0F71 0F80; 0F71 0F80 # (◌ཱྀ; ◌ཱ◌ྀ; ◌ཱ◌ྀ; ◌ཱ◌ྀ;
    >>◌ཱ◌ྀ; ) TIBETAN VOWEL SIGN REVERSED II
    >>
    >>However I don't know why it is not listed in
    >>http://www.unicode.org/Public/4.0-Update/CompositionExclusions-4.0.0.txt

    This is because CompositionExclusions.txt does not list all exclusions, only those that cannot be
    determined algorithmically. The Full_Composition_Exclusion property in DerivedNormalizationProps.txt
    lists them all, including U+0F81. See UAX #15 as well as UCD.html and the headers of the
    property files.
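
For illustration, here is a minimal Python sketch of the algorithmically determinable part, using the standard unicodedata module. It assumes the derivable exclusions are the singleton decompositions and the non-starter decompositions, as UAX #15 describes. The helper names canonical_decomposition and is_derivable_exclusion are hypothetical, not part of any library, and Python ships newer Unicode data than 4.0, although the U+0F81 entry should be the same.

```python
import unicodedata

def canonical_decomposition(ch):
    """Return the canonical decomposition mapping of ch as a list of
    characters, or [] if ch has none (or only a compatibility mapping)."""
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return []  # no mapping, or a tagged (compatibility) mapping
    return [chr(int(cp, 16)) for cp in mapping.split()]

def is_derivable_exclusion(ch):
    """Hypothetical helper: True if ch is excluded from recomposition for a
    reason that can be computed from UnicodeData.txt alone."""
    parts = canonical_decomposition(ch)
    if not parts:
        return False
    if len(parts) == 1:
        return True  # singleton decomposition
    # Non-starter decomposition: the character itself, or the first
    # character of its decomposition, has a non-zero combining class.
    return unicodedata.combining(ch) != 0 or unicodedata.combining(parts[0]) != 0

ch = "\u0f81"  # TIBETAN VOWEL SIGN REVERSED II
print(unicodedata.decomposition(ch))          # canonical mapping of U+0F81
print(unicodedata.combining("\u0f71"))        # combining class of the first part
print(is_derivable_exclusion(ch))
print(unicodedata.normalize("NFC", ch) == "\u0f71\u0f80")
```

The decomposition of U+0F81 begins with U+0F71, which is not a starter, so the exclusion follows from the character data and does not need an explicit entry in CompositionExclusions.txt. The last line checks that NFC leaves the sequence decomposed, matching the NormalizationTest.txt line quoted above.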

    Best regards,
    markus

    PS: The book is not published yet, but the Unicode 4 data files have been final for about a month now.


