From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Apr 09 2008 - 13:34:52 CDT
Jim Allan said:
> According to Unicode specifications from Unicode version 1.0 up to the
> current version of Unicode, the character U+026A LATIN LETTER SMALL
> CAPITAL I (ɪ) capitalizes as U+0197 LATIN CAPITAL LETTER I WITH STROKE (Ɨ).
That is an incorrect statement of the Unicode specification.
"Capitalizes as" is defined by the Simple_Uppercase_Mapping field
(field #12) in UnicodeData.txt.
For Unicode 5.1, we have:
0197;LATIN CAPITAL LETTER I WITH STROKE;Lu;...;;0268;
0268;LATIN SMALL LETTER I WITH STROKE;Ll;...;0197;;0197
026A;LATIN LETTER SMALL CAPITAL I;Ll;...;;;
What that says is that U+0268 (*not* U+026A) has a Simple_Uppercase_Mapping
to U+0197.
And those case mapping values have remained unchanged since Unicode 2.0,
when they were first made available in machine-readable form in
UnicodeData.txt.
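To make the field lookup concrete, here is a minimal Python sketch that pulls the simple case mapping fields out of the three abbreviated lines quoted above. The middle fields are still elided as "..." and are not reconstructed. It relies on one assumption: a full UnicodeData.txt record has exactly 15 fields (numbered 0 through 14), so fields 12, 13 and 14 are always the last three. The names QUOTED_LINES and simple_case_mappings are hypothetical illustration helpers. This is not a UCD parser.

```python
# Sketch only: read the simple case mapping fields from UnicodeData.txt-style
# lines. The lines are the abbreviated ones quoted in this message, with the
# middle fields elided as "...". Assumption: a full record has 15 fields
# (0-14), so fields 12, 13 and 14 are the last three, and indexing from the
# end works for both full and abbreviated lines.

QUOTED_LINES = [
    "0197;LATIN CAPITAL LETTER I WITH STROKE;Lu;...;;0268;",
    "0268;LATIN SMALL LETTER I WITH STROKE;Ll;...;0197;;0197",
    "026A;LATIN LETTER SMALL CAPITAL I;Ll;...;;;",
]


def simple_case_mappings(line: str) -> dict:
    """Return the code point, name and three simple case mapping fields."""
    fields = line.split(";")
    upper, lower, title = fields[-3], fields[-2], fields[-1]
    return {
        "code": fields[0],
        "name": fields[1],
        # An empty field means the character maps to itself.
        "upper": upper or None,  # field 12: Simple_Uppercase_Mapping
        "lower": lower or None,  # field 13: Simple_Lowercase_Mapping
        "title": title or None,  # field 14: Simple_Titlecase_Mapping
    }


for line in QUOTED_LINES:
    m = simple_case_mappings(line)
    print(m["code"], "uppercase ->", m["upper"] or "(none, maps to itself)")
```

Run against these lines, only U+0268 reports an uppercase mapping (to 0197). U+026A has none.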
> See the official Unicode charts for the IPA Extension at
> http://www.unicode.org/charts/PDF/U0250.pdf .
>
> Under U+026A ɪ LATIN SMALL LETTER CAPITAL I the charts state:
> “→ 0197 Ɨ Latin capital letter i with stroke”.
A cross-reference annotation in the Unicode names list does not
define a case mapping, and never has. It may refer to a
character which is in a case mapping relation to the character
which is annotated, but that is only one of many possible meanings
of a cross-reference. Most commonly the cross-reference simply
means: "may appear confusingly similar to character XXXX". See
"Cross References" on p. 566 of TUS 5.0.
>
> Under U+0268 ɨ LATIN SMALL LETTER I WITH STROKE the charts state:
> “• ISO 6438 gives lowercase of 0197 Ɨ as 026A ɪ not 0268 ɨ”.
That was done to recognize the fact that ISO 6438 specifies a
different case mapping than the Unicode Standard does.
> But the Unicode case folding table at
> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt has long disagreed.
No, actually. It has long agreed, and is completely consistent
with the Simple_Uppercase_Mapping value for U+0268 (and the
Simple_Lowercase_Mapping for U+0197).
> To summarize, the position in the casefolding table is:
> U+026A ɪ LATIN SMALL LETTER CAPITAL I does not case
> U+0268 ɨ LATIN SMALL LETTER I WITH STROKE uppercases to U+0197 LATIN
> CAPITAL LETTER I WITH STROKE (Ɨ).
That is correct.
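The following minimal Python sketch checks that consistency for the characters under discussion. The single CaseFolding.txt entry is written in that file's documented "<code>; <status>; <mapping>; # <name>" format. U+0268 and U+026A are deliberately absent because they are not folded. The two small mapping tables hold only the values quoted above from UnicodeData.txt. The helper names parse_simple_folding and fold are hypothetical, and the sketch handles only the C and S statuses.

```python
# Sketch only: compare simple case folding with the simple case mappings
# quoted from UnicodeData.txt for U+0197 / U+0268 / U+026A.

# One entry in CaseFolding.txt's "<code>; <status>; <mapping>; # <name>" format.
CASEFOLDING_LINES = [
    "0197; C; 0268; # LATIN CAPITAL LETTER I WITH STROKE",
]

# Simple mappings quoted above (fields 12 and 13); absent key = maps to itself.
SIMPLE_UPPER = {0x0268: 0x0197}
SIMPLE_LOWER = {0x0197: 0x0268}


def parse_simple_folding(lines):
    """Map code point -> folded code point for C (common) and S (simple) entries."""
    folding = {}
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 3 or fields[1] not in ("C", "S"):
            continue  # skip blank lines and F/T entries
        folding[int(fields[0], 16)] = int(fields[2], 16)
    return folding


def fold(cp, folding):
    """Simple case folding: code points without an entry fold to themselves."""
    return folding.get(cp, cp)


folding = parse_simple_folding(CASEFOLDING_LINES)
for cp in (0x0197, 0x0268, 0x026A):
    print(f"U+{cp:04X} folds to U+{fold(cp, folding):04X}")

# The folding agrees with both simple mappings quoted from UnicodeData.txt.
assert fold(0x0197, folding) == SIMPLE_LOWER[0x0197]
assert SIMPLE_UPPER[fold(0x0197, folding)] == 0x0197
```

U+0197 and U+0268 end up in the same folding class. U+026A stays on its own, which matches the first summary quoted above.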
>
> The position in the Unicode printed material is:
> U+0268 ɨ LATIN SMALL LETTER I WITH STROKE does not appear in the table
> so therefore does not case.
> U+026A ɪ LATIN SMALL LETTER CAPITAL I uppercases to U+0197 LATIN CAPITAL
> LETTER I WITH STROKE (Ɨ).
And that is a misinterpretation of the Unicode names list.
--Ken