Hello, in UAX #44 I read:
Simple_Titlecase_Mapping ...
Note: If this field is null, then the Simple_Titlecase_Mapping
is the same as the Simple_Uppercase_Mapping for this character.
So a parser has to be aware of this, automatically falling back to
the uppercase mapping (index 12) when there is no explicit
titlecase mapping (index 14).
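A minimal sketch of that fallback, in Python rather than awk, might look like the following. The function name simple_titlecase is made up for illustration, and the third record is hand-edited (its titlecase field is blanked) only to exercise the fallback; it is not a line from the real file.

```python
# Sketch only: field numbers are the 0-based ones used in UAX #44
# (12 = Simple_Uppercase_Mapping, 14 = Simple_Titlecase_Mapping).

def simple_titlecase(record):
    """Return the effective simple titlecase mapping of one
    UnicodeData.txt record, as a hex string ('' if there is none)."""
    fields = record.rstrip("\n").split(";")
    upper = fields[12]
    title = fields[14]
    # A null titlecase field means "same as the uppercase mapping".
    return title if title else upper


# Explicit titlecase mapping that merely repeats the uppercase mapping.
redundant = "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041"
# Explicit titlecase mapping that really differs from the uppercase one.
differing = ("01C6;LATIN SMALL LETTER DZ WITH CARON;Ll;0;L;"
             "<compat> 0064 017E;;;;N;LATIN SMALL LETTER D Z HACEK;;01C4;;01C5")
# Hypothetical record: the first one with its titlecase field removed.
stripped = "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;"

for rec in (redundant, differing, stripped):
    print(rec.split(";")[0], "->", simple_titlecase(rec))
```

With such a fallback in place, the first and the third record yield the same result, so the explicit field in the first one carries no extra information.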
Given this, the following surprised me:
?0[steffen_at_sherwood unicode]$ <UnicodeData.txt awk 'BEGIN{FS=";"}\
{if (length($15) && $15 == $13) print}' |wc -l
1051
?0[steffen_at_sherwood unicode]$ <UnicodeData.txt awk 'BEGIN{FS=";"}\
{if (length($15) && $15 != $13) print}' |wc -l
12
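(I.e., the redundant titlecase mapping is defined explicitly 1051 times,
and only 12 entries actually differ from the uppercase mapping.)
The redundant fields could be cleared like this: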
$ <UnicodeData.txt >UnicodeData.txt.new \
awk 'BEGIN{FS=";"; OFS=";"}\
{if (length($15) && $15 == $13) $15=""; print}'
--steffen
Received on Fri Aug 02 2013 - 10:22:46 CDT