From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Jul 26 2010 - 13:06:11 CDT
On 7/26/2010 6:55 AM, John Burger wrote:
> Mark Davis ☕ wrote:
>
>> From just a quick scan, it appears that they are currently all
>> contiguous within their respective groups. If we were to impose a
>> stability policy, it would be a constraint on the general_category:
>> we would not assign general_category=decimal_number to any character
>> unless it was part of a contiguous range of 10 such characters with
>> ascending values from 0..9.
While that is true for the properties, it's not true for the encoding of
characters that are *used* as decimal digits. Martin gave the most widely
used counterexample.
>
>
> Whether such a policy makes sense, I'm not clear on why it would be
> called a "stability" policy - the analogy to the existing such
> policies seems strained at best.
>
There are two parts to this.
One, and I think this is the more important part, is to have an encoding
policy of not splitting up runs of decimal digits - which would include
reserving a spot for a zero, in case, *over the lifetime of Unicode*,
some script changes its use from numbers 1-9 to decimal digits.
The other is a guarantee of what it means for a character to have the
decimal digit property.
My suggestions for handling this differ a bit from what has been
discussed so far.
The first I would address by suitable language in the WG2 Principles and
Procedures document. This is where policies on encoding are maintained.
True, these policies do allow exceptions, and exceptions (note Han!) do
exist; if a similar case of mixed-use characters came along, it would
have to be dealt with accordingly. What the P&P would do is
remove the wrong notion that it is OK to scatter runs of known decimal
digits when encoding new scripts.
The second I would address not by a stability policy, but by clarity of
definition of the property. Language such as:
"A character is given the decimal digit property, if and only if, it is
used in a decimal place-value notation and all 10 digits are encoded
in a single unbroken run starting with the digit of value 0, in
ascending order of magnitude".
or equivalent would be quite sufficient. That language happens to be a
much clearer statement of the *implicit* definition used in assigning
this property than the language found in UAX#44 or Unicode Section 4.6.
Having that language where the property is documented is much more
useful and visible than in a stability policy.
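The encoding half of that wording can be checked mechanically. What follows
is only a rough sketch in Python, assuming the standard unicodedata module
and whatever version of the Unicode data the interpreter ships with; the
helper names decimal_digit_runs and run_is_well_formed are made up for
illustration. It cannot test whether a character is actually *used* in a
decimal place-value notation, only whether characters with
general_category=decimal_number sit in unbroken runs of ten, ascending from 0.

```python
import sys
import unicodedata


def decimal_digit_runs():
    """Collect maximal runs of consecutive code points with gc=Nd."""
    runs = []
    current = []
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == "Nd":
            current.append(cp)
        elif current:
            # A non-Nd code point ends the run collected so far.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def run_is_well_formed(run):
    """True if the run is made of blocks of 10 with values 0..9 ascending.

    A maximal run may hold several adjacent digit sets back to back,
    so the expected value at position i is i modulo 10.
    """
    if len(run) % 10 != 0:
        return False
    return all(
        unicodedata.decimal(chr(cp), None) == i % 10
        for i, cp in enumerate(run)
    )


if __name__ == "__main__":
    # Report any run that does not match the proposed wording.
    for run in decimal_digit_runs():
        if not run_is_well_formed(run):
            print("irregular run: U+%04X..U+%04X" % (run[0], run[-1]))
```

Any run this reports would be a place where the property assignment and the
proposed definition disagree, at least for the data version being tested.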
A./