From: Martin v. Löwis (martin@v.loewis.de)
Date: Sat Jan 17 2009 - 12:21:15 CST
Currently, UCD.html says the following about Simple_Titlecase_Mapping:
  Note: The simple titlecase may be omitted in the data file if the
  titlecase is the same as the uppercase.
I think this note disagrees with the current UnicodeData.txt.
For example, UnicodeData has
01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;
So we have:
- upper case: U+01C4
- lower case: U+01C6
- title case: omitted, hence the same as uppercase, hence U+01C4
I think this is surprising: U+01C5 is already a titlecase letter,
so its simple titlecase should be U+01C5.
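
To make the disagreement concrete, here is a minimal Python sketch that splits the record quoted above and applies both defaulting rules to the empty titlecase field. The function names (parse_case_fields, title_per_note, title_per_proposal) are made up for illustration and are not part of any Unicode tool; the field positions (12, 13, 14) are the uppercase, lowercase and titlecase columns of UnicodeData.txt.

```python
# Record copied from the message; UnicodeData.txt fields are separated by ';'.
RECORD = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;"
    "01C4;01C6;"
)


def parse_case_fields(record):
    """Return the code point and the three simple case mapping fields."""
    fields = record.split(";")
    return {
        "code": fields[0],
        "upper": fields[12],   # Simple_Uppercase_Mapping
        "lower": fields[13],   # Simple_Lowercase_Mapping
        "title": fields[14],   # Simple_Titlecase_Mapping (empty here)
    }


def title_per_note(f):
    """Rule from the current Note: an omitted titlecase equals the uppercase."""
    return f["title"] or f["upper"] or f["code"]


def title_per_proposal(f):
    """Rule from proposal (a): an omitted titlecase equals the code point."""
    return f["title"] or f["code"]


f = parse_case_fields(RECORD)
print("per current Note:", title_per_note(f))
print("per proposal (a):", title_per_proposal(f))
```

Under the current Note the sketch yields 01C4 for U+01C5; under the adjusted wording it yields 01C5.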
To fix this, I think one would either have to
a) change UCD.html, adjusting the Note to read
   "The simple titlecase is omitted in the data file if the titlecase is
   the same as the code point itself",
or
b) change UnicodeData.txt to explicitly list the titlecase mapping
   for titlecase characters as the character itself.
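
As a rough illustration of option (b), the following Python sketch fills in the titlecase field of a record whose General_Category is Lt and whose titlecase field is empty. The helper name make_titlecase_explicit is hypothetical, and the sketch only handles a single record line, not the whole file.

```python
# Record copied from the message; the last field (titlecase) is empty.
RECORD = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;"
    "01C4;01C6;"
)


def make_titlecase_explicit(record):
    """Option (b): list the code point itself as the titlecase of Lt characters."""
    fields = record.split(";")
    # Field 2 is General_Category, field 14 is Simple_Titlecase_Mapping.
    if fields[2] == "Lt" and fields[14] == "":
        fields[14] = fields[0]
    return ";".join(fields)


print(make_titlecase_explicit(RECORD))
```

Records of other categories pass through unchanged, so the existing Note would continue to apply to them.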
What do you think?
Regards,
Martin