From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 19 2009 - 14:04:35 CST
Martin v. Löwis noted:
> Currently, UCD.html says about Simple_Titlecase_Mapping
>
> Note: The simple titlecase may be omitted in the data file if the
> titlecase is the same as the uppercase.
>
> I think this note disagrees with the current UnicodeData.txt.
>
> For example, UnicodeData has
>
> 01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH
> CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z
> HACEK;;01C4;01C6;
>
> So we have:
> - upper case: U+01C4
> - lower case: U+01C6
> - title case: omitted, hence the same as uppercase, hence U+01C4
That inference is incorrect. The Simple_Titlecase_Mapping of
U+01C5 is U+01C5.
Please note the convention for default values: (<code point>),
listed at the property itself. That means that if a value is
not present, the code point itself is taken as the value of
the property for that entry.
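
To make the default-value convention concrete, here is a minimal sketch in Python. It assumes the standard semicolon-separated layout of UnicodeData.txt, with the simple uppercase, lowercase, and titlecase mappings in the last three fields (0-based positions 12, 13, 14). The function name simple_case_mappings is hypothetical, and the sample record is the one quoted above, joined onto a single line.

```python
# 0-based field positions in a UnicodeData.txt record (assumed layout:
# 15 semicolon-separated fields, case mappings in the last three).
UPPER, LOWER, TITLE = 12, 13, 14


def simple_case_mappings(record):
    """Return (code point, upper, lower, title) for one record.

    An empty mapping field takes the default value <code point>,
    i.e. the code point of the record itself.
    """
    fields = record.strip().split(";")
    cp = int(fields[0], 16)

    def value(index):
        # Missing or empty field: fall back to the code point itself.
        if index < len(fields) and fields[index]:
            return int(fields[index], 16)
        return cp

    return cp, value(UPPER), value(LOWER), value(TITLE)


# The record as quoted in this thread, with its empty titlecase field.
RECORD_01C5 = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;"
)

if __name__ == "__main__":
    cp, upper, lower, title = simple_case_mappings(RECORD_01C5)
    print(f"U+{cp:04X}: upper U+{upper:04X}, lower U+{lower:04X}, title U+{title:04X}")
```

Under this reading, the empty titlecase field of the quoted record resolves to U+01C5, not to the uppercase value U+01C4.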
>
> I think this is surprising: U+01C5 is already a titlecase letter,
> so its simple titlecase should be U+01C5.
It is.
>
> To fix this, I think one would either have to
> a) change UCD.html, to adjust the Note to
> The simple titlecase is omitted in the data file if the titlecase is
> the same as the code point itself,
> or
There was a subtle change in the documentation for Simple_Uppercase_Mapping,
Simple_Lowercase_Mapping, and Simple_Titlecase_Mapping between
Unicode 5.0 and Unicode 5.1. The UCD.html documentation used
to say "may be omitted" in the note for all three properties.
The problem was that the field is *always* omitted for Simple_Uppercase_Mapping
and Simple_Lowercase_Mapping when the value is the same as the code point
itself, but the same is not true of Simple_Titlecase_Mapping, because of
the existence of the compatibility titlecase letters in the standard.
So for Simple_Uppercase_Mapping and Simple_Lowercase_Mapping,
the UCD.html for Unicode 5.1 was updated: "may be omitted" -->
"is omitted". The text in the note for Simple_Titlecase_Mapping
was left as it was.
> b) change UnicodeData.txt to explicitly list the titlecase mapping
> for titlecase characters as the character itself.
I don't think that would help, because the value is already
correct.
What might help would be updating the text of the note
in the Proposed Update for UAX #44 (which is superseding
UCD.html) in the future.
--Ken