From: Addison Phillips (addison@yahoo-inc.com)
Date: Thu Oct 19 2006 - 18:05:32 CST
Hi Andrew,
Andrew Miller wrote:
> There appear to be a number of differences in the case mappings defined
> in UnicodeData.txt and SpecialCasing.txt
This is as it should be. Right at the top of SpecialCasing.txt it says:
# This file is a supplement to the UnicodeData file.
# It contains additional information about the casing of Unicode characters.
# (For compatibility, the UnicodeData.txt file only contains case mappings for
# characters where they are 1-1, and does not have locale-specific mappings.)
# For more information, see the discussion of Case Mappings in the Unicode Standard.
In other words, SpecialCasing.txt is where you will find the case
mappings that don't fit the 1-1 model: those that produce more code
points than the source character, plus the locale-specific ones.
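To make the layering concrete, here is a minimal Python sketch. It assumes
the field order code; lower; title; upper; (condition_list;) # comment and
uses two unconditional entries copied from SpecialCasing.txt. The helper
names (parse_unconditional, simple_upper, full_upper) are made up for
illustration, and simple_upper is only a stand-in for a real lookup in
UnicodeData.txt.

```python
# Two unconditional entries copied from SpecialCasing.txt.
# Field order: code; lower; title; upper; (condition_list;) # comment
SAMPLE = """\
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
"""


def hex_to_str(field):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_unconditional(text):
    """Hypothetical helper: map each character to its full case mappings."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) > 4 and fields[4]:
            continue  # has a condition list: not safe to apply everywhere
        code, lower, title, upper = fields[:4]
        table[hex_to_str(code)] = {
            "lower": hex_to_str(lower),
            "title": hex_to_str(title),
            "upper": hex_to_str(upper),
        }
    return table


def simple_upper(ch):
    """Stand-in (an assumption) for the 1-1 mapping in UnicodeData.txt."""
    mapped = ch.upper()
    return mapped if len(mapped) == 1 else ch


def full_upper(text, table):
    """Prefer the SpecialCasing entry, else fall back to the 1-1 mapping."""
    return "".join(
        table[ch]["upper"] if ch in table else simple_upper(ch) for ch in text
    )


TABLE = parse_unconditional(SAMPLE)
print(full_upper("stra\u00dfe", TABLE))
```

The point of the sketch is only the lookup order: an unconditional
SpecialCasing entry wins, and the 1-1 mapping is the fallback.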
>
> For example, U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) has a
> lowercase mapping of U+0069 in UnicodeData.txt and a mapping of U+0069
> U+0307 in SpecialCasing.txt.
>
> All of the Greek YPOGEGRAMMENI letters in SpecialCasing.txt have
> different uppercase mappings to those specified in UnicodeData.txt.
>
> Can I just ignore the UnicodeData.txt mappings for these characters, and
> just use the ones defined in SpecialCasing.txt instead?
>
Not entirely. The bottom part of SpecialCasing.txt contains
locale-specific mappings. These should be applied only for the listed
languages/locales and nowhere else. For example:
# When uppercasing, i turns into a dotted capital I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
You wouldn't want the letter "i" to become İ (U+0130) under "normal"
(i.e. non-Turkish/non-Azerbaijani) circumstances.
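As a rough Python sketch of how that might look in code (assumptions: only
entries whose condition list is a single lowercase language tag are
handled, context conditions are skipped, and the names parse_language_upper
and upper_for_language are hypothetical), using the two lines quoted above:

```python
# The two language-specific entries quoted above.
LOCALE_SAMPLE = """\
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
"""


def hex_to_str(field):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_language_upper(text):
    """Hypothetical helper: build {(language, char): uppercase string}."""
    rules = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 5 or not fields[4]:
            continue  # unconditional entry, not language-specific
        conditions = fields[4].split()
        # Simplification: accept a lone language tag, skip context rules.
        if len(conditions) != 1 or not conditions[0].islower():
            continue
        rules[(conditions[0], hex_to_str(fields[0]))] = hex_to_str(fields[3])
    return rules


def upper_for_language(text, lang, rules):
    """Apply a language-specific rule only when the language matches."""
    primary = lang.split("-")[0].lower()  # "tr-TR" -> "tr"
    return "".join(rules.get((primary, ch), ch.upper()) for ch in text)


RULES = parse_language_upper(LOCALE_SAMPLE)
print(upper_for_language("izmir", "tr", RULES))  # dotted capital I
print(upper_for_language("izmir", "en", RULES))  # plain "IZMIR"
```

Python's own str.upper() is language-neutral, so it serves as the
"everywhere else" behaviour here; the language rules are layered on top
only when the caller asks for tr or az.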
Hope that helps.
Addison
--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.