Comparing Raw Values of the Age Property
Philippe Verdy via Unicode
unicode at unicode.org
Tue May 23 08:27:51 CDT 2017
2017-05-23 8:43 GMT+02:00 Asmus Freytag via Unicode <unicode at unicode.org>:
> On 5/22/2017 3:49 PM, Richard Wordingham via Unicode wrote:
>> One of the objectives is to use a current version of the UCD to
>> determine, for example, which characters were in Version x.y. One
>> needs that for a regular expression such as [:Age=3.0:], which
>> also matches all characters that have survived since Version 1.1.
>> Another is to record for which versions of the standard a character had
>> some particular value of a property.
> I would tend to side with those who claim that "version number" is
> something that's defined by common industry practice, and therefore not
> something that Unicode needs to define - but is allowed to use. Just like
> Unicode doesn't define what an integer is, or hexadecimal number system or
> a whole host of other concepts that are used in defining in turn what
> Unicode is.
> As Markus implied, version numbers are a positional number system where
> the positions in turn are integers in decimal notation, separated by dots.
Not all version numbers follow this scheme of dot-separated integers. Some
use dates (separated by hyphens, as in the ISO format), additional letters
(a, b, c...), or labels (alpha, beta, RC), sometimes in the middle of other
fields; these labels are not always easy to compare. They are generally
treated as case-insensitive and tend to avoid non-Latin letters (so Greek
letters are spelled out in Latin), and they cannot always be parsed and
combined into a single integer.
For comparing/sorting, it's best to compare case-insensitively and use only
primary differences in UCA. But the UCA algorithm should be tweaked with a
preparsing step that locates the numbers, so that they can be compared by
numeric value.
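
As a rough illustration of that preparsing step, here is a minimal Python
sketch. It is not a UCA implementation: str.casefold() stands in for a
primary-strength comparison, and the name version_key as well as the rule
that a numeric field sorts before a letter field at the same position are
assumptions made only for this example.

```python
import re

# Runs of ASCII digits or ASCII letters; dots, hyphens and other
# separators are simply skipped.
_FIELD = re.compile(r"[0-9]+|[A-Za-z]+")

def version_key(version):
    """Build a sort key from a version string (hypothetical helper).

    Numeric fields compare by integer value, letter fields compare
    case-insensitively. Each field becomes a 3-tuple so that numbers
    and labels never get compared directly with each other.
    """
    key = []
    for field in _FIELD.findall(version):
        if field.isdigit():
            key.append((0, int(field), ""))          # numeric field
        else:
            key.append((1, 0, field.casefold()))     # label field
    return key

versions = ["1.10", "1.9", "3.0-RC1", "1.2", "2017-05-23"]
print(sorted(versions, key=version_key))
```

How labels such as alpha, beta or RC should rank against a plain release
is a policy question that this sketch deliberately leaves open.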
In rare cases you may find Roman numerals (I, II, III, IV, V, IX, X),
which cannot simply be sorted like other Latin letters.
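
To make the problem concrete, a small Python sketch: sorting the numerals
as plain letters puts IX before V, while converting them to integers first
gives the expected order. The helper name roman_to_int is hypothetical, and
it does no validation (any character outside IVXLCDM raises KeyError).

```python
_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral to an integer (hypothetical helper)."""
    total = 0
    previous = 0
    # Read right to left: a smaller value before a larger one is
    # subtracted (IV = 4, IX = 9), otherwise it is added.
    for letter in reversed(numeral.upper()):
        value = _ROMAN[letter]
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total

labels = ["IX", "V", "II", "X", "IV", "I", "III"]
print(sorted(labels))                     # letter order
print(sorted(labels, key=roman_to_int))   # numeric order
```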