From: Kent Karlsson (kentk@md.chalmers.se)
Date: Tue Nov 19 2002 - 11:45:46 EST
Not a stupid question at all.
The reason SpecialCasing.txt changes the case mapping
for dotted uppercase I is as follows:
Take any two strings that are *canonically equivalent*,
one in Normalization Form C (maximally composed) and one
in Normalization Form D (decomposed). Now map the two
strings to lowercase. You would still expect the
respective results to be canonically equivalent. For that
to hold, the precomposed dotted uppercase I must map
to lowercase as an i with a combining dot above.
That is because lowercasing the decomposed version, an
"I" followed by a combining dot above, does not remove
the combining dot above. Removing it would have been a
viable alternative, but that is only done for Turkish
and Azeri, for which a dot above is also introduced
(precomposed) when uppercasing an "i". See towards the
end of SpecialCasing.txt.
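
The argument above can be checked with a small sketch. It assumes
Python 3, whose str.lower() applies the unconditional mappings from
SpecialCasing.txt (CPython does); the helper name
canonically_equivalent and the variable names are only illustrative.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Treat two strings as canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# U+0130 in composed form, and its decomposition "I" + U+0307
composed = "\u0130"
decomposed = unicodedata.normalize("NFD", composed)

# Lowercase both forms
lower_composed = composed.lower()      # full mapping: 0069 0307
lower_decomposed = decomposed.lower()  # "I" -> "i", the combining dot stays

print([hex(ord(c)) for c in lower_composed])
print([hex(ord(c)) for c in lower_decomposed])

# With the SpecialCasing mapping, the results stay canonically equivalent
print(canonically_equivalent(lower_composed, lower_decomposed))

# A plain one-to-one mapping U+0130 -> U+0069 would break the equivalence
print(canonically_equivalent("\u0069", lower_decomposed))
```

The last comparison shows what would happen if the precomposed letter
mapped to a bare "i" outside the Turkic locales: the two lowercase
results would no longer be canonically equivalent.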
/kent k
Teri Griopich wrote:
> There is a file named "SpecialCasing.txt" which can be found
> at the following URL:
> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
>
> I quote the following two lines from the file
> SpecialCasing.txt:
> # Preserve canonical equivalence for I with dot. Turkic is handled below.
> 0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
>
> The file "SpecialCasing.txt" says the lowercase letter of
> U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is "0069 0307",
> unless the locale under consideration is Turkish or Azeri.
>
> However, the Case Mapping Charts
> (http://www.unicode.org/charts/case/) say U+0069 LATIN SMALL
> LETTER I is the lowercase letter of U+0130 LATIN CAPITAL
> LETTER I WITH DOT ABOVE.
>
> I am confused???
> Thanks in advance,
>
> Teri