From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 12 2003 - 09:53:03 EDT
I don't know if you got this message on the list, as I don't see it in the
archive. So this is a repost...
Sorry if you already got it...
----- Original Message -----
From: "Philippe Verdy" <verdy_p@wanadoo.fr>
To: <unicode@unicode.org>
Cc: "Mark Davis" <mark.davis@us.ibm.com>; "Martin Dürst" <duerst@w3.org>
Sent: Monday, May 12, 2003 2:29 AM
Subject: Unicode 4.0 normalization error (missing exclusion for "Tibetan
Vowel Sign Reversed II")
> After some tests I have seen that one character defined in the test file is
> excluded from canonical recomposition:
>
> This normalization test chart:
> http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
> lists:
>
> 0F81; 0F71 0F80; 0F71 0F80; 0F71 0F80; 0F71 0F80 # (◌ཱྀ; ◌ཱ◌ྀ; ◌ཱ◌ྀ; ◌ཱ◌ྀ; ◌ཱ◌ྀ; ) TIBETAN VOWEL SIGN REVERSED II
>
> However I don't know why it is not listed in
> http://www.unicode.org/Public/4.0-Update/CompositionExclusions-4.0.0.txt
>
> It should list all the Tibetan decompositions (the others are already
> included in the normalization tests chart):
> 0F43; 0F42 0FB7; TIBETAN LETTER GHA
> 0F4D; 0F4C 0FB7; TIBETAN LETTER DDHA
> 0F52; 0F51 0FB7; TIBETAN LETTER DHA
> 0F57; 0F56 0FB7; TIBETAN LETTER BHA
> 0F5C; 0F5B 0FB7; TIBETAN LETTER DZHA
> 0F69; 0F40 0FB5; TIBETAN LETTER KSSA
> 0F73; 0F71 0F72; TIBETAN VOWEL SIGN II
> 0F75; 0F71 0F74; TIBETAN VOWEL SIGN UU
> 0F76; 0FB2 0F80; TIBETAN VOWEL SIGN VOCALIC R
> 0F78; 0FB3 0F80; TIBETAN VOWEL SIGN VOCALIC L
> 0F81; 0F71 0F80; TIBETAN VOWEL SIGN REVERSED II
> 0F93; 0F92 0FB7; TIBETAN SUBJOINED LETTER GHA
> 0F9D; 0F9C 0FB7; TIBETAN SUBJOINED LETTER DDHA
> 0FA2; 0FA1 0FB7; TIBETAN SUBJOINED LETTER DHA
> 0FA7; 0FA6 0FB7; TIBETAN SUBJOINED LETTER BHA
> 0FAC; 0FAB 0FB7; TIBETAN SUBJOINED LETTER DZHA
> 0FB9; 0F90 0FB5; TIBETAN SUBJOINED LETTER KSSA
>
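The listed pairs can be checked against a real UCD with a short script. This is only a sketch: it uses Python's standard `unicodedata` module, which reflects the Unicode version bundled with the running interpreter and not necessarily 4.0.0. The names `LISTED`, `mismatched` and `recomposed` are made up for this illustration.

```python
import unicodedata

# Composite -> canonical decomposition, copied from the list quoted above.
LISTED = {
    0x0F43: "0F42 0FB7", 0x0F4D: "0F4C 0FB7", 0x0F52: "0F51 0FB7",
    0x0F57: "0F56 0FB7", 0x0F5C: "0F5B 0FB7", 0x0F69: "0F40 0FB5",
    0x0F73: "0F71 0F72", 0x0F75: "0F71 0F74", 0x0F76: "0FB2 0F80",
    0x0F78: "0FB3 0F80", 0x0F81: "0F71 0F80", 0x0F93: "0F92 0FB7",
    0x0F9D: "0F9C 0FB7", 0x0FA2: "0FA1 0FB7", 0x0FA7: "0FA6 0FB7",
    0x0FAC: "0FAB 0FB7", 0x0FB9: "0F90 0FB5",
}

def to_text(code_points):
    """Turn a string such as '0F71 0F80' into the characters it names."""
    return "".join(chr(int(cp, 16)) for cp in code_points.split())

# Entries whose decomposition in this UCD differs from the list above.
mismatched = [f"{cp:04X}" for cp, decomp in LISTED.items()
              if unicodedata.decomposition(chr(cp)) != decomp]

# Entries that NFC recomposes, i.e. that are NOT excluded from composition.
recomposed = [f"{cp:04X}" for cp, decomp in LISTED.items()
              if unicodedata.normalize("NFC", to_text(decomp)) == chr(cp)]

print("decomposition mismatches:", mismatched)
print("recomposed under NFC:", recomposed)
```

Any code point reported in `recomposed` is one that the normalizer does not treat as excluded. This tests the normalizer's behaviour only; it does not show which data file the exclusion came from.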
> I think this is an inconsistency, and CompositionExclusions-4.0.0.txt needs to
> be corrected to include this character...
> I did not find a corrigendum for this case.
>
> As the UCD and CompositionExclusions are normative and the composition
> tests chart is mostly informative, I think this will create bugs depending
> on which file is used to generate NFC/NFD conversion tables.
>
> But the standard also mandates testing the generated normalizer with this
> test file (in Normative Annex 9 this test is mandated, but the test file is
> described as "for convenience")... So we'll have an error for this Tibetan
> character when testing the normalizer according to the normative UCD and
> exclusions...
>
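The conformance test mentioned here can be sketched in a few lines. The following is a minimal illustration, not the reference procedure: it assumes the five-column layout of NormalizationTest.txt (c1 to c5, with `#` comments and `@Part` headers) and applies the invariants stated in that file's header. Python's `unicodedata.normalize` stands in for the generated normalizer under test. The helper names `decode`, `check_line` and `run`, and the two header lines in `SAMPLE`, are hypothetical.

```python
import unicodedata

def decode(field):
    """Turn a field such as '0F71 0F80' into the characters it names."""
    return "".join(chr(int(cp, 16)) for cp in field.split())

def check_line(line):
    """Return the names of the normalization forms that fail for one data line."""
    fields = line.split("#")[0].split(";")[:5]
    c1, c2, c3, c4, c5 = (decode(f) for f in fields)
    norm = unicodedata.normalize  # stand-in for the normalizer being tested
    checks = {
        "NFC": all(norm("NFC", c) == c2 for c in (c1, c2, c3))
               and all(norm("NFC", c) == c4 for c in (c4, c5)),
        "NFD": all(norm("NFD", c) == c3 for c in (c1, c2, c3))
               and all(norm("NFD", c) == c5 for c in (c4, c5)),
        "NFKC": all(norm("NFKC", c) == c4 for c in (c1, c2, c3, c4, c5)),
        "NFKD": all(norm("NFKD", c) == c5 for c in (c1, c2, c3, c4, c5)),
    }
    return [name for name, ok in checks.items() if not ok]

def run(lines):
    """Check every data line; skip blank lines, comments and @Part headers."""
    failures = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "@")):
            continue
        failed = check_line(line)
        if failed:
            failures[line] = failed
    return failures

SAMPLE = [
    "# illustrative comment line",
    "@Part1",
    "0F81; 0F71 0F80; 0F71 0F80; 0F71 0F80; 0F71 0F80 # TIBETAN VOWEL SIGN REVERSED II",
]
failures = run(SAMPLE)
print(failures or "no failures")
```

To run it against the whole file, pass the lines of a locally downloaded copy to `run`. The test file and the normalizer's data need to come from the same Unicode version for the result to mean anything.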
> This is the only character I found in all the new UCD 4.0.0 that exhibits
> this problem.
>
> This should be corrected while the new standard is in "prepublication"
> state, before the book is published. If it is already printed, this could be
> done by publishing an online alert before the book is distributed, or by
> adding a corrigendum sheet in the printed book, because TR15 is extremely
> critical and now a full part of the standard as UAX#15...
>
> --Philippe.
>