RE: Confused by the difference between Case Mapping Charts and SpecialCasing.txt (U+0130)

From: Kent Karlsson
Date: Tue Nov 19 2002 - 11:45:46 EST


    Not a stupid question at all.

    The reason SpecialCasing.txt changes the case mapping
    for dotted uppercase I is as follows:

    Take any two strings that are *canonically equivalent*,
    one in Normalization Form C (maximally composed) and one
    in Normalization Form D (decomposed). Now map the two strings
    to lowercase. You would still expect the respective
    results to be canonically equivalent. For that to
    hold, the precomposed dotted uppercase I must map
    to lowercase as an i with a combining dot above.
    That is because, in the decomposed version ("I" followed
    by a combining dot above), lowercasing the "I" does not
    remove the combining dot above. Removing it would have
    been a viable alternative, but that is done only for
    Turkish and Azeri, for which a dot above is also
    introduced (precomposed) when uppercasing an "i".
    See towards the end of
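
    The argument above can be checked with a small sketch. It
    assumes Python 3, whose str.lower() applies the unconditional
    full case mappings (including the U+0130 entry); the helper
    name canonically_equivalent is made up for this illustration.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "\u0130"      # LATIN CAPITAL LETTER I WITH DOT ABOVE (NFC form)
decomposed = "I\u0307"   # "I" + COMBINING DOT ABOVE (NFD form)

# The two inputs are canonically equivalent before case mapping.
assert canonically_equivalent(composed, decomposed)

# Full lowercase mapping (assumed to follow SpecialCasing.txt).
lower_composed = composed.lower()
lower_decomposed = decomposed.lower()

# Lowercasing "I" leaves the combining dot in place, so the results
# stay equivalent only if U+0130 also keeps its dot.
full_mapping_ok = canonically_equivalent(lower_composed, lower_decomposed)

# Simple one-to-one mapping from the charts: U+0130 -> U+0069.
simple_lower_composed = "\u0069"
simple_mapping_ok = canonically_equivalent(simple_lower_composed, lower_decomposed)

print("full mapping preserves equivalence:  ", full_mapping_ok)
print("simple mapping preserves equivalence:", simple_mapping_ok)
```

    With the one-to-one mapping, the composed input loses its dot
    while the decomposed input keeps it, so the two lowercase
    results are no longer canonically equivalent.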

                    /kent k

    Teri Griopich wrote:

    > There is a file named "SpecialCasing.txt" which can be found
    > at the following URL:
    > I quote the following two lines from the file
    > SpecialCasing.txt:
    > # Preserve canonical equivalence for I with dot. Turkic is handled below.
    > 0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
    > The file "SpecialCasing.txt" says the lowercase letter of
    > U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is "0069 0307",
    > unless the locale under consideration is Turkish or Azeri.
    > However, the Case Mapping Charts (
    > charts/case/) says U+0069 LATIN SMALL LETTER I is the
    > lowercase letter of U+0130 LATIN CAPITAL LETTER I WITH DOT
    > ABOVE.
    > I am confused???

    > Thanks in advance,
    > Teri

    This archive was generated by hypermail 2.1.5 : Tue Nov 19 2002 - 12:40:32 EST