From: karl williamson (public@khwilliamson.com)
Date: Mon Jul 27 2009 - 12:59:28 CDT
Eric Muller wrote:
> karl williamson wrote:
>> I'm trying to come up with an alias to propose to the UTC for the
>> misleadingly named Age property. People tend to think from the name
>> that Age=3.2 means that the code point dates to version 3.2, when in
>> fact it means it dates to at least 3.2.
>>
>
> I am not entirely sure what distinction you make, but Age=3.2 means that the
> character was present in version 3.2 and in no earlier version.
>
> Eric.
>
>
Apparently that is what Asmus and others think as well, and it certainly
is the data that comes in DerivedAge.txt, and if that were truly the
case, I wouldn't have any problem with the term "Age". But let me quote
from the header of that file:
# Caution: When using the Age *property*, all assigned code points
# in each version are included, not just the newly assigned code points.
# For more information, see http://www.unicode.org/reports/tr18/
And, if you look at tr18, it says:
"
Caution: The DerivedAge data file in the UCD provides the deltas between
versions, for compactness. However, when using the property all
characters included in that version are included. Thus \p{age=3.0}
includes the letter a, which was included in Unicode 1.0. To get
characters that are new in a particular version, subtract off the
previous version as described in 1.3 Subtraction and Intersection. For
example: [\p{age=3.1} -- \p{age=3.0}]
"
So either you guys are wrong, or the documentation is wrong in at least
two places. I have to assume that the documentation is right until
shown otherwise; and if it is correct, I think that proves my point. If
experienced people who work with Unicode all the time don't understand
what this property is, then something is wrong, and at a minimum a new
alias is needed to clarify things.
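To make the two readings concrete, here is a minimal Python sketch of the rule quoted from tr18 above, assuming the data file lists only the version in which each code point was first assigned. The sample lines are written in the DerivedAge.txt layout but are only illustrative, and the function names (parse_deltas, age_property, new_in) are hypothetical, not part of any library.

```python
# Illustrative lines in the DerivedAge.txt layout (code points ; first version).
SAMPLE = """\
0061..007A    ; 1.1 # illustrative entries only
20AC          ; 2.1
01F6..01F9    ; 3.0
10300..1031E  ; 3.1
0220          ; 3.2
"""


def parse_version(text):
    """Turn '3.2' into (3, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_deltas(data):
    """Return (start, end, version) tuples from DerivedAge-style lines."""
    deltas = []
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        cp_field, version_field = (f.strip() for f in line.split(";"))
        first, _, last = cp_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        deltas.append((start, end, parse_version(version_field)))
    return deltas


def age_property(deltas, version):
    """Cumulative reading: every code point assigned in `version` or earlier."""
    wanted = parse_version(version)
    return {
        cp
        for start, end, first_seen in deltas
        if first_seen <= wanted
        for cp in range(start, end + 1)
    }


def new_in(deltas, version, previous):
    """Delta reading: [\\p{age=version} -- \\p{age=previous}]."""
    return age_property(deltas, version) - age_property(deltas, previous)
```

Under this sketch, age_property(parse_deltas(SAMPLE), "3.0") contains the letter a, while new_in(..., "3.1", "3.0") contains only the range first listed under 3.1. An implementation that matches \p{age=3.0} against the raw file entries alone would miss the letter a.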
I also don't think that, in these days of abundant cheap storage, the
Consortium should be worrying about compactness. I believe every
property that is exposed in the UCD should have a fully derived version
available, probably in the extracted directory. In 5.2 Beta, the only
properties and property values that the user has to derive (except for
defaults) are Age, gc=LC, gc=C, gc=L, gc=M, gc=N, gc=P, gc=S, and gc=Z.
There should be files in the extracted directory that show the derived
values for all of them. There are bound to be mistakes made when
programmers re-derive them; and there is duplicated work as well. This
Age property is a case in point. I wonder how many implementations
there are out there that have it wrong.
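As an illustration of the re-derivation each programmer currently has to do for the grouped General_Category values, here is a rough Python sketch. It relies on the standard unicodedata module, which reports only the two-letter categories; the GROUPS table and the in_group helper are my own hypothetical names, and the results depend on the Unicode version bundled with the Python in use.

```python
import unicodedata

# Grouped General_Category values expressed as unions of two-letter values.
GROUPS = {
    "LC": {"Lu", "Ll", "Lt"},
    "L": {"Lu", "Ll", "Lt", "Lm", "Lo"},
    "M": {"Mn", "Mc", "Me"},
    "N": {"Nd", "Nl", "No"},
    "P": {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"},
    "S": {"Sm", "Sc", "Sk", "So"},
    "Z": {"Zs", "Zl", "Zp"},
    "C": {"Cc", "Cf", "Cs", "Co", "Cn"},
}


def in_group(char, group):
    """True if `char` falls under the grouped value, e.g. gc=L or gc=LC."""
    return unicodedata.category(char) in GROUPS[group]
```

For example, in_group("a", "LC") is true, while a modifier letter such as U+02B0 is in gc=L but not in gc=LC. Leaving one member out of any of these sets is exactly the kind of slip a published derived file would prevent.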
Unicode has made mistakes in the past with the UCD (the 4 code points
that were Attached_Below_Left instead of Attached_Below in one of the
Version 3 releases, and the incomplete DerivedLineBreak.txt which was
missing H3 in 4.1 spring to my mind), but at least it is subjected to
public review, and I would hope that the discipline of having to get it
to work under XML would catch most errors. (I did, though, find some
omissions in the 5.2 Beta PropertyValueAliases.txt file.)