Re: UCD.html and simple titlecase

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 19 2009 - 14:04:35 CST


    Martin v. Löwis noted:

    > Currently, UCD.html says about Simple_Titlecase_Mapping
    >
    > Note: The simple titlecase may be omitted in the data file if the
    > titlecase is the same as the uppercase.
    >
    > I think this note disagrees with the current UnicodeData.txt.
    >
    > For example, UnicodeData has
    >
    > 01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;
    >
    > So we have:
    > - upper case: U+01C4
    > - lower case: U+01C6
    > - title case: omitted, hence the same as uppercase, hence U+01C4

    That inference is incorrect. The Simple_Titlecase_Mapping of
    U+01C5 is U+01C5.

    Please note the convention for default values: (<code point>),
    listed at the property itself. That means that if a value is
    not present, the code point itself is taken as the value of
    the property for that entry.
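
As a minimal sketch of that default-value convention (an illustration, not part of the UCD), the following Python reads the three simple case mapping fields from one UnicodeData.txt line and substitutes the code point itself for any empty field. The function name `simple_case_mappings` is hypothetical, and the field positions 12, 13, and 14 (counting from 0) are assumed from the UnicodeData.txt field order.

```python
# The UnicodeData.txt entry for U+01C5 as quoted in this thread,
# joined onto a single line. The last field (titlecase) is empty.
LINE_01C5 = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;"
)


def simple_case_mappings(line):
    """Return the simple upper/lower/title mappings for one data line.

    Hypothetical helper: an empty field is read as "the code point
    itself", following the (<code point>) default described above.
    """
    fields = line.rstrip("\n").split(";")
    code_point = int(fields[0], 16)

    def value(field):
        # Empty field: fall back to the code point itself.
        return int(field, 16) if field else code_point

    return {
        "upper": value(fields[12]),
        "lower": value(fields[13]),
        "title": value(fields[14]),
    }


if __name__ == "__main__":
    for name, cp in simple_case_mappings(LINE_01C5).items():
        print(f"{name}: U+{cp:04X}")
```

For the U+01C5 line, the empty titlecase field resolves to U+01C5 itself, not to the uppercase value U+01C4.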

    >
    > I think this is surprising: U+01C5 is already a titlecase letter,
    > so its simple titlecase should be U+01C5.

    It is.

    >
    > To fix this, I think one would either have to
    > a) change UCD.html, to adjust the Note to
    > The simple titlecase is omitted in the data file if the titlecase is
    > the same as the code point itself,
    > or

    There was a subtle change in the documentation for Simple_Uppercase_Mapping,
    Simple_Lowercase_Mapping, and Simple_Titlecase_Mapping between
    Unicode 5.0 and Unicode 5.1. The UCD.html documentation used
    to say "may be omitted" in the note for all three properties.
    The problem was that it is *always* omitted for the Simple_Uppercase_Mapping
    and Simple_Lowercase_Mapping, but the same is not true of
    Simple_Titlecase_Mapping, because of the existence of the
    compatibility titlecase letters in the standard.
    So for Simple_Uppercase_Mapping and Simple_Lowercase_Mapping,
    the UCD.html for Unicode 5.1 was updated: "may be omitted" -->
    "is omitted". The text in the note for Simple_Titlecase_Mapping
    was left as it was.

    > b) change UnicodeData.txt to explicitly list the titlecase mapping
    > for titlecase characters as the character itself.

    I don't think that would help, because the value is already
    correct.

    What might help is updating the text of the note in a future
    Proposed Update for UAX #44 (which is superseding UCD.html).

    --Ken


