Aaron Cannon asked:
> Hi all, from the latest version of the standard, on line 16977 of the
> normalization tests, I am a bit confused by the NFC form. It appears
> incorrect to me. Here's the line, sans comment:
>
> 0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE
> 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300
> 0315 0062;
>
> Just looking at column 2, which according to the comments at the top
> is the NFC form:
>
> 0061 05AE 0305 0300 0315 0062:
>
> This, however, does not appear to be in NFC form.
>
> The first character, and the second or third characters do not
> compose. However, the first and fourth (0061 and 0300) do, composing
> to 00E0.
>
> Since there are no further compositions, the normalized form should be
> 00E0 05AE 0305 0315 0062
>
> What am I missing?
>
Input is:

  Code points: 0061 0305 0315 0300 05AE 0062
  Ccc:            0  230  232  230  228    0

Output of canonical reordering is:

  Code points: 0061 05AE 0305 0300 0315 0062
  Ccc:            0  228  230  230  232    0
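
As a rough illustration of the reordering step, here is a minimal Python sketch. It assumes the input is already fully decomposed (true for this test line), and the helper name canonical_reorder is hypothetical. It relies only on the standard unicodedata.combining() lookup and on the fact that Python's sorted() is stable, so marks with equal ccc keep their relative order.

```python
import unicodedata

def canonical_reorder(text):
    """Stable-sort each run of non-starters (ccc != 0) by ccc.

    Hypothetical helper; assumes `text` is already decomposed.
    """
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of combining marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

source = "\u0061\u0305\u0315\u0300\u05AE\u0062"
reordered = canonical_reorder(source)
print(" ".join(f"{ord(c):04X}" for c in reordered))
# 0061 05AE 0305 0300 0315 0062
print(" ".join(str(unicodedata.combining(c)) for c in reordered))
# 0 228 230 230 232 0
```

Note that 0305 stays ahead of 0300: both have ccc=230, and the stable sort preserves their input order. That ordering is what matters in the composition step.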
Next step is to start from 0061 and test each successive combining
mark, looking for composition candidates.

0061 does not compose with 05AE.

0061 does not compose with 0305.

0061 *could* compose with 0300 (00E0 = 0061 + 0300), *but*
0300 is *blocked* from 0061 by the intervening combining
mark 0305 with the *same* ccc value as 0300. So the
composition does not occur.

0061 does not compose with 0315.

The next character is 0062, ccc=0, a starter, so we are done.
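
The walk above can be sketched in Python as follows. This is a simplified illustration, not a full normalizer: the function names are hypothetical, the input is assumed to be already canonically reordered, and the primary-composite lookup is approximated by asking unicodedata.normalize("NFC", ...) whether a two-character pair collapses to one character. The blocking test follows D115: a mark is blocked from the last starter if an intervening character has ccc equal to 0 or greater than or equal to the mark's own ccc.

```python
import unicodedata

def primary_composite(starter, ch):
    """Stand-in lookup (an assumption of this sketch): returns the single
    composed character for the pair, or None if the pair does not compose."""
    pair = unicodedata.normalize("NFC", starter + ch)
    return pair if len(pair) == 1 else None

def compose(reordered):
    """Composition pass over canonically reordered text (sketch)."""
    out = []
    starter_pos = None  # index in `out` of the last starter seen
    last_ccc = 0        # ccc of the last character appended to `out`
    for ch in reordered:
        ccc = unicodedata.combining(ch)
        if starter_pos is not None:
            adjacent = starter_pos == len(out) - 1
            # In reordered text only the preceding kept mark needs checking:
            # it has the highest ccc of everything since the starter.
            blocked = not adjacent and last_ccc >= ccc
            if not blocked:
                composite = primary_composite(out[starter_pos], ch)
                if composite is not None:
                    out[starter_pos] = composite
                    continue  # the mark is consumed; last_ccc is unchanged
        if ccc == 0:
            starter_pos = len(out)
        last_ccc = ccc
        out.append(ch)
    return "".join(out)

reordered = "\u0061\u05AE\u0305\u0300\u0315\u0062"
result = compose(reordered)
print(" ".join(f"{ord(c):04X}" for c in result))
# 0061 05AE 0305 0300 0315 0062
```

If 0305 is removed from the sequence, nothing with ccc >= 230 sits between 0061 and 0300, so compose("\u0061\u05AE\u0300\u0315\u0062") yields 00E0 05AE 0315 0062.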
For the relevant definitions, see:
http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G50628
and scroll down a couple pages to D115 on p. 139.
Test cases like this are included in NormalizationTest.txt precisely
to ensure that implementations are correctly detecting these
sequences where composition is blocked.
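
For anyone who wants to check an existing implementation against this particular line, here is a small sketch using Python's standard unicodedata module. The helper names are hypothetical. The column meanings (source, NFC, NFD, NFKC, NFKD) are those given in the comments at the top of NormalizationTest.txt; the sketch covers only the basic "normalize column 1" checks, not the file's full set of invariants.

```python
import unicodedata

# The test line quoted above, without its trailing comment.
LINE = ("0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;"
        "0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;"
        "0061 05AE 0305 0300 0315 0062;")

def parse_columns(line):
    """Turn the first five ';'-separated fields into Python strings."""
    fields = line.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in field.split())
            for field in fields]

def check_line(line):
    """Compare each normalization form of column 1 with its expected column."""
    source, nfc, nfd, nfkc, nfkd = parse_columns(line)
    expected = {"NFC": nfc, "NFD": nfd, "NFKC": nfkc, "NFKD": nfkd}
    return {form: unicodedata.normalize(form, source) == want
            for form, want in expected.items()}

results = check_line(LINE)
print(results)
```

An implementation that ignores blocking would fail the NFC and NFKC checks here by producing 00E0 in place of 0061 ... 0300.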
--Ken
Received on Thu Oct 23 2014 - 13:16:31 CDT