From: Mark Davis ☕ (mark@macchiato.com)
Date: Fri Apr 16 2010 - 12:41:55 CDT
Yes, the characters with Nd are by design those that can be used with
"normal" big-endian positional decimal syntax, whereby a sequence of such
digits {N0, N1, N2, ... Nn} has the numeric value (...((N0 * 10 + N1) * 10 +
N2) * 10 ... + Nn).
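
As an illustration only, the fold described above can be sketched in Python
with the standard unicodedata module. The function name nd_value is
hypothetical, and the result depends on the Unicode version bundled with the
Python build in use.

```python
import unicodedata


def nd_value(digits: str) -> int:
    """Fold a string of Gc=Nd characters into an integer.

    The most significant digit comes first, so each step multiplies the
    running value by 10 and adds the next digit: (N0 * 10 + N1) * 10 + ...
    """
    value = 0
    for ch in digits:
        # Reject anything that is numeric but not a positional decimal digit.
        if unicodedata.category(ch) != "Nd":
            raise ValueError(f"{ch!r} is not a decimal digit (Gc=Nd)")
        value = value * 10 + unicodedata.decimal(ch)
    return value
```

For example, nd_value("\u09ea\u09ed") (BENGALI DIGIT FOUR, BENGALI DIGIT
SEVEN) evaluates the same way as nd_value("47"), while a character such as
U+00BD VULGAR FRACTION ONE HALF (Gc=No) is rejected.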
Numeric characters that are peculiar in some fashion, and cannot simply be
interpreted in the above fashion, are marked as numbers, but not Nd. Here is
the list, grouped by General Category:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{n}&g=gc
and by General Category then Numeric Value:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{n}&g=gc+nv
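
A rough local approximation of the first listing can be sketched with
Python's unicodedata module. This is not the CLDR utility itself: the helper
name numbers_by_category is hypothetical, and the output reflects whichever
Unicode version the Python build ships with.

```python
import sys
import unicodedata
from collections import defaultdict


def numbers_by_category():
    """Group every numeric code point by General Category (Nd, Nl, No)."""
    groups = defaultdict(list)
    for cp in range(sys.maxunicode + 1):
        cat = unicodedata.category(chr(cp))
        if cat in ("Nd", "Nl", "No"):
            groups[cat].append(cp)
    return groups


if __name__ == "__main__":
    for cat, cps in sorted(numbers_by_category().items()):
        print(cat, len(cps), "code points")
```

Looking up a particular code point in the result (for instance U+0C78 or
U+A830, both mentioned later in this thread) shows which group it falls in.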
Note that for security, implementations may want to put a further
restriction on sequences of Nd, so as to not mix scripts. Other measures may
also be needed, to detect problems like ৪৭, which looks like 89 (if you have
the font), but is actually 47 written in Bengali digits. For more info, see
UTS #36 (see proposed version at
http://www.unicode.org/reports/tr36/proposed.html).
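
One possible way to sketch the "do not mix" restriction in Python follows.
The standard unicodedata module has no script property, so this sketch
substitutes a proxy: it assumes every Nd digit set is encoded as a contiguous
run 0-9, and identifies each set by the code point of its zero. The function
names are hypothetical, and this is not the mechanism specified by UTS #36.

```python
import unicodedata


def digit_zeros(text: str) -> set:
    """Return the zero code point of each Nd digit set that occurs in text.

    Assumption: digits of one set are contiguous and ascending, so
    ord(ch) - decimal(ch) lands on that set's zero.
    """
    zeros = set()
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return zeros


def mixes_digit_sets(text: str) -> bool:
    """True if text draws decimal digits from more than one digit set."""
    return len(digit_zeros(text)) > 1
```

A string such as "4\u09ed" (ASCII 4 followed by BENGALI DIGIT SEVEN) is
flagged, whereas a number written entirely in Bengali digits is not. That
second case is exactly the whole-string confusable described above, so a
check like this covers only the mixing problem.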
If there are any discrepancies in the properties of the above characters,
those can be brought to the attention of the UTC using the reporting form
(the next meeting is in May).
Mark
— Il meglio è l’inimico del bene —
On Fri, Apr 16, 2010 at 09:14, karl williamson <public@khwilliamson.com> wrote:
> Thanks for your response.
>
>
> Shriramana Sharma wrote:
>
>> On 2010-Apr-12 22:39, karl williamson wrote:
>>
>>> Can anyone tell me: Are there other scripts where Gc=Nd characters can
>>> behave with other than the positional meanings of the digits 0-9? The
>>> only technical note that has "number" in the title is the one that
>>> Shriramana mentioned, so I'm assuming not.
>>>
>>
>> How about Telugu? IIRC the original proposal for the Telugu fractions
>> submitted by Nagarjuna Venna has examples for the Telugu digits being used
>> as modifiers for the fractions or something.
>>
>
> I looked this up, and found a paper by N. Venna, and it looks like what
> Unicode adopted was things like U+0C78: TELUGU FRACTION DIGIT ZERO FOR ODD
> POWERS OF FOUR. But their category is No, not Nd.
>
>
>
>> And for Devanagari? The above same for the "generic North Indic fractions"
>> proposed by Anshuman Pandey.
>>
>
> I looked this up as well, and these fractions, e.g. U+A830: NORTH INDIC
> FRACTION ONE QUARTER, also have general category No.
>
> Apparently there are no other cases of non-positional notation digits
> having general category=Nd.
>
>
>
>