Re: Wanted: synonyms for Age

From: karl williamson (public@khwilliamson.com)
Date: Mon Jul 27 2009 - 12:59:28 CDT

    Eric Muller wrote:
    > karl williamson wrote:
    >> I'm trying to come up with an alias to propose to the UTC for the
    >> misleadingly named Age property. People tend to think from the name
    >> that Age=3.2 means that the code point dates to version 3.2, when in
    >> fact it means it dates to at least 3.2.
    >>
    >
    > I am not entirely sure what distinction you make, but Age=3.2 means that the
    > character was present in version 3.2 and in no earlier version.
    >
    > Eric.
    >
    >

    Apparently that is what Asmus and others think as well, and it certainly
    is the data that comes in DerivedAge.txt, and if that were truly the
    case, I wouldn't have any problem with the term "Age". But let me quote
    from the header of that file:
    # Caution: When using the Age *property*, all assigned code points
    # in each version are included, not just the newly assigned code points.
    # For more information, see http://www.unicode.org/reports/tr18/

    And, if you look at tr18, it says:

    "
    Caution: The DerivedAge data file in the UCD provides the deltas between
    versions, for compactness. However, when using the property all
    characters included in that version are included. Thus \p{age=3.0}
    includes the letter a, which was included in Unicode 1.0. To get
    characters that are new in a particular version, subtract off the
    previous version as described in 1.3 Subtraction and Intersection. For
    example: [\p{age=3.1} -- \p{age=3.0}]
    "

    So either you guys are wrong, or the documentation is wrong in at least
    two places. I have to assume that the documentation is right until
    shown otherwise; and if it is correct, I think that proves my point. If
    experienced people who work with Unicode all the time don't understand
    what this property is, then something is wrong, and at a minimum a new
    alias is needed to clarify things.
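
    To make the two readings concrete, here is a minimal Python sketch. It
    assumes the delta layout of DerivedAge.txt (code point or range, a
    semicolon, then the version). The sample lines are a small hand-picked
    illustration in that layout, not a copy of the real file, and the names
    parse_deltas and age_property are made up for this example.

```python
# Hand-picked sample in the DerivedAge.txt layout (illustrative, not the real file).
SAMPLE = """\
0061..007A    ; 1.1
20AC          ; 2.1
01F6..01F9    ; 3.0
03F4..03F5    ; 3.1
"""


def version_key(version):
    """Turn '3.1' into (3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def parse_deltas(text):
    """Map each version to the code points first assigned in that version."""
    deltas = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        code_points, version = (field.strip() for field in line.split(";"))
        first, _, last = code_points.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        deltas.setdefault(version_key(version), set()).update(range(lo, hi + 1))
    return deltas


def age_property(deltas, version):
    """Cumulative reading from tr18: everything assigned in `version` or earlier."""
    limit = version_key(version)
    result = set()
    for assigned_in, code_points in deltas.items():
        if assigned_in <= limit:
            result |= code_points
    return result


deltas = parse_deltas(SAMPLE)
age_3_0 = age_property(deltas, "3.0")
age_3_1 = age_property(deltas, "3.1")

# The tr18 subtraction [\p{age=3.1} -- \p{age=3.0}] recovers the delta again.
new_in_3_1 = age_3_1 - age_3_0
```

    Under the cumulative reading, the letter a is in age_3_0 even though the
    file lists it under an earlier version; under the delta reading it would
    not be.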

    I also don't think that, in these days of abundant cheap storage, the
    Consortium should be worrying about compactness. I believe every
    property that is exposed in the UCD should have a fully derived version
    available, probably in the extracted directory. In 5.2 Beta, the only
    properties and property values that the user has to derive (except for
    defaults) are Age, gc=LC, gc=C, gc=L, gc=M, gc=N, gc=P, gc=S, and gc=Z.
    There should be files in the extracted directory that show the derived
    values for all of them. There are bound to be mistakes made when
    programmers re-derive them; and there is duplicated work as well. This
    Age property is a case in point. I wonder how many implementations
    there are out there that have it wrong.
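
    As a rough sketch of the kind of re-derivation each implementer currently
    has to do for the grouped General_Category values, the following Python
    uses the standard unicodedata module. The GROUPS table and the in_group
    helper are hypothetical names for this example, and the results reflect
    whichever Unicode version the running Python bundles, not 5.2 Beta
    specifically.

```python
import unicodedata

# Grouped General_Category values expanded into their two-letter members.
GROUPS = {
    "LC": {"Lu", "Ll", "Lt"},
    "L": {"Lu", "Ll", "Lt", "Lm", "Lo"},
    "M": {"Mn", "Mc", "Me"},
    "N": {"Nd", "Nl", "No"},
    "P": {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"},
    "S": {"Sm", "Sc", "Sk", "So"},
    "Z": {"Zs", "Zl", "Zp"},
    "C": {"Cc", "Cf", "Cs", "Co", "Cn"},
}


def in_group(char, group):
    """Return True if `char` has a General_Category belonging to `group`."""
    return unicodedata.category(char) in GROUPS[group]
```

    A slip in a table like this one (for example, putting Lm into LC) is
    exactly the sort of mistake a published derived file would prevent.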

    Unicode has made mistakes in the past with the UCD (the 4 code points
    that were Attached_Below_Left instead of Attached_Below in one of the
    Version 3 releases, and the incomplete DerivedLineBreak.txt which was
    missing H3 in 4.1 spring to my mind), but at least it is subjected to
    public review, and I would hope that the discipline of having to get it
    to work under XML would catch most errors. (I did, though, find some
    omissions in the 5.2 Beta PropertyValueAliases.txt file.)


