Just an observation

From: Steffen <sdaoden_at_gmail.com>
Date: Fri, 02 Aug 2013 17:16:21 +0200

Hello, in UAX #44 I read:

  Simple_Titlecase_Mapping ...
    Note: If this field is null, then the Simple_Titlecase_Mapping
    is the same as the Simple_Uppercase_Mapping for this character.

So a parser has to be aware of this, automatically falling back to
the uppercase mapping (field index 12, counting from 0) when there
is no explicit titlecase mapping (field index 14).  In awk, which
counts fields from 1, these are $13 and $15.
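
A minimal sketch of that fallback in awk, assuming the usual 15-field,
semicolon-separated layout of UnicodeData.txt; the function name
title_of is made up for illustration:

```shell
# title_of (hypothetical helper): read UnicodeData.txt records on stdin and
# print "codepoint;titlecase" for each one, using the uppercase field ($13)
# whenever the titlecase field ($15) is empty.
title_of() {
  awk 'BEGIN { FS = ";" }
       {
         t = length($15) ? $15 : $13   # fallback described in UAX #44
         print $1 ";" t
       }'
}

# Usage (assumes UnicodeData.txt is in the current directory):
#   title_of <UnicodeData.txt
```

For characters with neither mapping the second column is simply empty.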

Given this, the following surprised me:

  ?0[steffen_at_sherwood unicode]$ <UnicodeData.txt awk 'BEGIN{FS=";"}\
    {if (length($15) && $15 == $13) print}' |wc -l
      1051
  ?0[steffen_at_sherwood unicode]$ <UnicodeData.txt awk 'BEGIN{FS=";"}\
    {if (length($15) && $15 != $13) print}' |wc -l
        12

(I.e., the redundant mapping is explicitly defined 1051 times.)
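
Both counts can also be taken in a single pass.  This is only a
sketch, again assuming the 15-field layout; count_title is a made-up
name, and the fields are compared as strings so that hex values which
happen to look like numbers (say 1E02) are not compared numerically:

```shell
# count_title (hypothetical helper): for all records on stdin that carry an
# explicit titlecase mapping ($15), count how many repeat the uppercase
# mapping ($13) and how many differ from it.
count_title() {
  awk 'BEGIN { FS = ";" }
       length($15) {
         # Appending "" forces a string comparison of the two fields.
         if (($15 "") == ($13 "")) same++
         else diff++
       }
       END { printf "%d redundant, %d distinct\n", same, diff }'
}

# Usage (assumes UnicodeData.txt is in the current directory):
#   count_title <UnicodeData.txt
```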

  $ <UnicodeData.txt >UnicodeData.txt.new \
    awk 'BEGIN{FS=";"; OFS=";"}\
    {if (length($15) && $15 == $13) $15=""; print}'
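
Since a parser falls back to the uppercase mapping anyway, the
stripped file should yield the same effective titlecase mapping for
every code point as the original.  A rough sketch of such a check,
under the same 15-field assumption; check_stripped is a made-up name:

```shell
# check_stripped OLD NEW (hypothetical helper): compare the effective
# titlecase mapping (titlecase field, else uppercase field) of every code
# point in OLD against NEW.  Prints mismatches, exits non-zero if any.
check_stripped() {
  awk 'BEGIN { FS = ";" }
       { t = length($15) ? $15 : $13 }
       NR == FNR { want[$1] = t; next }       # first file: remember mapping
       (want[$1] "") != (t "") {              # second file: compare as strings
         bad++
         print "mismatch at " $1
       }
       END { exit (bad ? 1 : 0) }' "$1" "$2"
}

# Usage:
#   check_stripped UnicodeData.txt UnicodeData.txt.new && echo identical
```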

--steffen
Received on Fri Aug 02 2013 - 10:22:46 CDT
