RE: Confused by the difference between Case Mapping Charts and SpecialCasing.txt (U+0130)

From: Kent Karlsson (kentk@md.chalmers.se)
Date: Tue Nov 19 2002 - 11:45:46 EST

Next message: Theodore H. Smith: "ATSUI text length parameters"

Previous message: Andy White: "RE: Errors in the Indic FAQ"
In reply to: Teri Griopich: "Confused by the difference between Case Mapping Charts and SpecialCasing.txt (U+0130)"

Not a stupid question at all.

The reason SpecialCasing.txt changes the lowercase mapping
for dotted uppercase I (U+0130) is as follows:

Take any two strings that are *canonically equivalent*,
one in Normalization Form C (maximally composed) and one in
Normalization Form D (decomposed). Now map the two strings
to lowercase. You would still expect the respective
results to be canonically equivalent. For that to
hold, the precomposed dotted uppercase I must map
to lowercase as an i followed by a combining dot above
(0069 0307). That is because, in the decomposed version
(0049 0307), lowercasing the "I" does not remove the
combining dot above. Removing the dot would have been a
viable alternative, but that is only done for Turkish and
Azeri, for which a dot above is also introduced (precomposed,
as U+0130) when uppercasing an "i". See towards the end of
SpecialCasing.txt.
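
The argument above can be checked with a small sketch. This is only
an illustration, not part of SpecialCasing.txt: it assumes Python 3,
whose str.lower() applies the unconditional full case mappings, and
the names SIMPLE_LOWER, simple_lower and canonically_equivalent are
made up for the example. The one-to-one mapping from the charts
(U+0130 to U+0069) is hard-coded.

```python
import unicodedata

# One-to-one lowercase mapping as shown in the case charts (hard-coded
# here for illustration): U+0130 -> U+0069.
SIMPLE_LOWER = {"\u0130": "\u0069"}


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def simple_lower(s: str) -> str:
    """Lowercase character by character, using the simple mapping for U+0130."""
    return "".join(SIMPLE_LOWER.get(ch, ch.lower()) for ch in s)


composed = "\u0130"                                  # precomposed dotted I
decomposed = unicodedata.normalize("NFD", composed)  # U+0049 U+0307

# Full mapping (SpecialCasing.txt): both forms lowercase to U+0069 U+0307.
full_ok = canonically_equivalent(composed.lower(), decomposed.lower())

# Simple mapping: the composed form loses its dot, the decomposed one keeps it.
simple_ok = canonically_equivalent(simple_lower(composed), simple_lower(decomposed))

print("full mapping preserves equivalence:  ", full_ok)
print("simple mapping preserves equivalence:", simple_ok)
```

With the full mapping both inputs end up as the same two code
points; with the simple mapping one result is a bare "i" and the
other is "i" plus U+0307, so the equivalence is lost.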

/kent k

Teri Griopich wrote:

> There is a file named "SpecialCasing.txt" which can be found
> at the following URL:
> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
>
> I quote the following two lines from the file
> SpecialCasing.txt:
> # Preserve canonical equivalence for I with dot. Turkic is handled below.
> 0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
>
> The file "SpecialCasing.txt" says the lowercase letter of
> U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is "0069 0307",
> unless the locale under consideration is Turkish or Azeri.
>
> However, the Case Mapping Charts
> (http://www.unicode.org/charts/case/) says U+0069 LATIN SMALL
> LETTER I is the lowercase letter of U+0130 LATIN CAPITAL
> LETTER I WITH DOT ABOVE.
>
> I am confused???

> Thanks in advance,
>
> Teri

