Aaron Cannon asked:
> Hi all, from the latest version of the standard, on line 16977 of the
> normalization tests, I am a bit confused by the NFC form. It appears
> incorrect to me. Here's the line, sans comment:
> 0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;
> Just looking at column 2, which according to the comments at the top
> is the NFC form:
> 0061 05AE 0305 0300 0315 0062:
> This, however, does not appear to be in NFC form.
> The first character, and the second or third characters do not
> compose. However, the first and fourth (0061 and 0300) do, composing
> to 00E0.
> Since there are no further compositions, the normalized form should be
> 00E0 05AE 0305 0315 0062
> What am I missing?
Input is:

Code points: 0061 0305 0315 0300 05AE 0062
ccc:            0  230  232  230  228    0

Output of canonical reordering is:

Code points: 0061 05AE 0305 0300 0315 0062
ccc:            0  228  230  230  232    0
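As a rough illustration of the reordering step, here is a minimal Python sketch. It assumes the input is already fully decomposed (true here: none of these six characters has a canonical decomposition) and takes the ccc values from Python's `unicodedata.combining()`; the names `ccc` and `canonical_order` are made up for this example.

```python
import unicodedata
from typing import List


def ccc(cp: int) -> int:
    """Canonical_Combining_Class of a code point, via Python's unicodedata."""
    return unicodedata.combining(chr(cp))


def canonical_order(code_points: List[int]) -> List[int]:
    """Stable-sort each run of non-starters (ccc != 0) by ccc; starters stay put.

    Assumes the input is already canonically decomposed.
    """
    out: List[int] = []
    run: List[int] = []  # non-starters seen since the last starter
    for cp in code_points:
        if ccc(cp) == 0:
            # A starter closes the run. Python's sort is stable, so marks
            # with equal ccc (here 0305 and 0300) keep their relative order.
            out.extend(sorted(run, key=ccc))
            run = []
            out.append(cp)
        else:
            run.append(cp)
    out.extend(sorted(run, key=ccc))  # flush a trailing run, if any
    return out


source = [0x0061, 0x0305, 0x0315, 0x0300, 0x05AE, 0x0062]
reordered = canonical_order(source)
print(" ".join(f"{cp:04X}" for cp in reordered))  # 0061 05AE 0305 0300 0315 0062
print(" ".join(str(ccc(cp)) for cp in reordered))  # 0 228 230 230 232 0
```

Stability is the point: 0305 and 0300 both have ccc 230, so they keep their original relative order, which is exactly what leaves 0305 sitting between 0061 and 0300 when composition starts.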
The next step is to start from 0061 and test each following combining
mark in turn, looking for composition candidates.
0061 does not compose with 05AE.
0061 does not compose with 0305.
0061 *could* compose with 0300 (00E0 = 0061 + 0300), *but*
0300 is *blocked* from 0061 by the intervening combining
mark 0305 with the *same* ccc value as 0300. So the
composition does not occur.
0061 does not compose with 0315.
The next character is 0062, ccc=0, a starter, so we are done.
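The walk above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the helper `primary_composite` is hypothetical and takes a shortcut (assumption) by asking Python's own NFC whether a starter/mark pair collapses to one character, instead of reading the composition data files directly. The blocking test relies on the input being canonically ordered, so only the last uncomposed mark needs checking.

```python
import unicodedata
from typing import List, Optional


def primary_composite(starter: str, mark: str) -> Optional[str]:
    # Shortcut (assumption): let Python's NFC decide whether the pair composes,
    # rather than reading UnicodeData.txt and CompositionExclusions.txt.
    pair = unicodedata.normalize("NFC", starter + mark)
    return pair if len(pair) == 1 else None


def compose(nfd: str, trace: Optional[List[str]] = None) -> str:
    """Canonical composition over a decomposed, canonically ordered string."""
    out: List[str] = []
    starter_pos: Optional[int] = None  # index in out of the current starter
    last_ccc: Optional[int] = None     # ccc of the last char kept after it
    for ch in nfd:
        ccc = unicodedata.combining(ch)
        if starter_pos is not None:
            # Every uncomposed ccc=0 char becomes the new starter, so in an
            # ordered run only the last kept mark matters: ch is blocked if
            # that mark's ccc is >= ccc(ch).
            blocked = last_ccc is not None and last_ccc >= ccc
            composite = None if blocked else primary_composite(out[starter_pos], ch)
            if trace is not None:
                status = "blocked" if blocked else ("composes" if composite else "no composite")
                trace.append(f"{ord(out[starter_pos]):04X} + {ord(ch):04X}: {status}")
            if composite is not None:
                out[starter_pos] = composite  # ch is absorbed; last_ccc unchanged
                continue
        if ccc == 0:
            starter_pos, last_ccc = len(out), None
        else:
            last_ccc = ccc
        out.append(ch)
    return "".join(out)


steps: List[str] = []
nfd = "\u0061\u05AE\u0305\u0300\u0315\u0062"
result = compose(nfd, steps)
print("\n".join(steps))
print(" ".join(f"{ord(c):04X}" for c in result))  # 0061 05AE 0305 0300 0315 0062
```

For contrast, dropping the 0305 gives `compose("\u0061\u05AE\u0300")`, where 05AE (ccc 228) does not block 0300 (ccc 230), so the result is 00E0 05AE.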
For the relevant definitions, see:
and scroll down a couple pages to D115 on p. 139.
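For readers who want the definition in executable form, here is a sketch of a literal reading of D115: C is blocked from A if and only if ccc(A) = 0 and some character B between them has either ccc(B) = 0 or ccc(B) >= ccc(C). Unlike a composition loop, it does not assume canonical ordering; the function name `is_blocked` and its index-based signature are made up for this example.

```python
import unicodedata


def is_blocked(seq: str, a: int, c: int) -> bool:
    """Literal reading of D115: is seq[c] blocked from seq[a]? Requires a < c."""
    ccc = unicodedata.combining
    # The definition only applies when A is a starter (ccc == 0).
    if ccc(seq[a]) != 0:
        return False
    # Blocked if some B strictly between A and C has ccc(B) == 0 or ccc(B) >= ccc(C).
    return any(ccc(b) == 0 or ccc(b) >= ccc(seq[c]) for b in seq[a + 1:c])


nfd = "\u0061\u05AE\u0305\u0300\u0315\u0062"
for c in range(1, len(nfd)):
    print(f"{ord(nfd[c]):04X} blocked from 0061: {is_blocked(nfd, 0, c)}")
```

Only 0300 (blocked by 0305, same ccc 230) and 0062 (ccc 0, so any intervening mark blocks it) come out as blocked, matching the walk-through.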
Test cases like this are included in NormalizationTest.txt precisely
to ensure that implementations are correctly detecting these
sequences where composition is blocked.
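A minimal check of this kind of test line might look like the sketch below, using Python's `unicodedata.normalize` as the implementation under test. It only verifies the relationship discussed in this thread (column 2 is the NFC form of column 1); the file's header lists further invariants that a full harness would also check. The helper names `parse_field` and `nfc_column_ok` are made up for this example.

```python
import unicodedata


def parse_field(field: str) -> str:
    """Turn a NormalizationTest.txt field such as '0061 0300' into a string."""
    return "".join(chr(int(h, 16)) for h in field.split())


def nfc_column_ok(line: str) -> bool:
    """Check column 2 (NFC) against NFC(column 1) for one test line."""
    data = line.split("#", 1)[0]  # drop any trailing comment
    fields = [parse_field(f) for f in data.split(";")[:5]]
    return unicodedata.normalize("NFC", fields[0]) == fields[1]


line = ("0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;"
        "0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;"
        "0061 05AE 0305 0300 0315 0062;")
print(nfc_column_ok(line))  # True

# The form proposed in the question composes through the blocked 0300:
naive = parse_field("00E0 05AE 0305 0315 0062")
print(naive == unicodedata.normalize("NFC", parse_field(line.split(";")[0])))  # False
```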
Unicode mailing list
Received on Thu Oct 23 2014 - 13:16:31 CDT