UCD 3.2.0

From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Fri Apr 05 2002 - 02:06:50 EST


Hi all,

I'd like to make a few remarks about the UCD files.

I ran into the following when checking out the 3.2.0 release:

 o In PropertyValueAliases-3.2.0.txt line 79:
            ccc; 202; ATBL ; Attached_Below_Left
    whereas in UnicodeData-3.2.0.html I read:
            200: Below left attached
            202: Below attached
    What is the correct value for "attached below left", 200 or 202?

 o In SpecialCasing-3.2.0.txt lines 234 and 235 are missing the closing
    semicolon. This problem also appeared in 3.1.1.

 o Typo in UnicodeCharacterDatabase-3.2.0.html:
    "DerivedNormalizationProperties" should be "DerivedNormalizationProps".

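A problem like the missing closing semicolon could be caught mechanically.
The following is only a rough sketch, assuming a file in which every field
(including the last one) is terminated by ';' and '#' starts a comment. The
function name and the sample lines are made up for illustration; they are
not the actual contents of SpecialCasing-3.2.0.txt.

```python
def find_unterminated(lines):
    """Return 1-based numbers of data lines whose last field lacks a ';'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if data and not data.endswith(";"):   # skip blank and comment-only lines
            bad.append(number)
    return bad


# Hypothetical sample; the third line deliberately lacks the terminator.
sample = [
    "# comment header",
    "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S",
    "03A3; 03C2; 03A3; 03A3; FINAL # made-up line, terminator missing",
    "",
]

print(find_unterminated(sample))  # prints [3]
```
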
Minor points that I find a bit annoying:

 o Many of the UCD files have a comment header with lines longer than 80
    characters. Viewing these files with a pager on an 80-column terminal
    window gives ugly output due to the forced line wrapping.

 o All UCD files except CaseFolding-3.2.0.txt and SpecialCasing-3.2.0.txt
    *separate* columns by semicolons. In the two exceptions the semicolon
    *terminates* a column. Why not keep it the same for all UCD files?

 o UnicodeData-3.2.0.txt still uses this notation:
            1234;<Blah, First>;Lo;0;L;;;;;N;;;;;
            5678;<Blah, Last>;Lo;0;L;;;;;N;;;;;
    instead of
            1234..5678;<Blah, First>..<Blah, Last>;Lo;0;L;;;;;N;;;;;
    Since all other UCD files use the latter notation, why not change this
    one too? IMHO backward compatibility with existing UCD file parsers
    shouldn't be an issue in this particular case.
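
To illustrate what the last two points mean for a parser, here is a minimal
sketch, not a reference implementation. It assumes ';' delimited fields,
'#' comments, and range pairs whose names end in ", First>" and ", Last>"
as in the example above. The names split_fields and collapse_ranges and the
sample lines are hypothetical.

```python
def split_fields(line, terminated=False):
    """Split one UCD data line into stripped fields.

    terminated=False: ';' separates fields (the common convention).
    terminated=True:  ';' ends every field, so one trailing ';' is dropped.
    The caller has to know which convention the file follows, because a
    trailing ';' may also mean "empty last field" in a separator-style file.
    """
    data = line.split("#", 1)[0].strip()
    if terminated and data.endswith(";"):
        data = data[:-1]
    return [field.strip() for field in data.split(";")]


def collapse_ranges(lines):
    """Yield (first, last, name, other_fields), merging First/Last pairs.

    A First line that is never followed by a Last line is silently dropped
    in this sketch; a real parser would probably report it.
    """
    pending = None  # code point of a First line waiting for its Last line
    for line in lines:
        fields = split_fields(line)
        if not fields[0]:
            continue  # blank line
        code, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            pending = code
        elif name.endswith(", Last>") and pending is not None:
            # "<Blah, Last>" -> "Blah"
            yield pending, code, name[1:-len(", Last>")], fields[2:]
            pending = None
        else:
            yield code, code, name, fields[2:]


# Hypothetical sample in the current UnicodeData notation.
sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "1234;<Blah, First>;Lo;0;L;;;;;N;;;;;",
    "5678;<Blah, Last>;Lo;0;L;;;;;N;;;;;",
]

for first, last, name, rest in collapse_ranges(sample):
    code = f"{first:04X}" if first == last else f"{first:04X}..{last:04X}"
    print(";".join([code, name] + rest))
```

With a single range notation in the file itself, the pending-state
bookkeeping in collapse_ranges would not be needed at all.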

Regards,
Theo


