From: karl williamson (public@khwilliamson.com)
Date: Mon Jul 26 2010 - 21:24:05 CDT
Asmus Freytag wrote:
> On 7/25/2010 6:05 PM, Martin J. Dürst wrote:
>>
>>
>> On 2010/07/26 4:37, Asmus Freytag wrote:
>>
>>> PPS: a very hypothetical tough case would be a script where letters
>>> serve both as letters and as decimal place-value digits, and with modern
>>> living practice.
>>
>> Well, there actually is such a script, namely Han. The digits (一、
>> 二、三、四、五、六、七、八、九、〇) are used both as letters and as
>> decimal place-value digits, and they are scattered widely, and of
>> course there is a lot of modern living practice.
> Martin,
>
> you found the hidden clue and solved it, first prize :)
>
> They do not show up as gc=Nd, nor as numeric types Digit or Decimal.
>
> The situation is worse than you indicate, because the same characters
> are also used as elements in a system that doesn't use place-value, but
> uses special characters to show powers of 10.
>
> However, as I indicated in my original post, in situations like that,
> there are usually some changes in practice that took place. Much of the
> living modern practice in these countries involves ASCII digits. While
> the ideographic numbers are definitely still used in certain contexts,
> I've not seen them in input fields and would frankly doubt that they
> exist there. I would fully expect that they are supported as number
> format for output, at least in some implementations, and, of course,
> that input methods convert ASCII digits into them. In other words, I
> wonder whether automatic conversion goes only one-way for these numbers.
> I would suspect it, for the general case, but I don't actually know for
> sure.
>
> For someone in Karl's situation, it would be interesting to learn
> whether and to what extent he should bother supporting these numbers in
> his language extension.
I don't think I would support these numbers, since we couldn't tell
unambiguously what was intended.
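
The property values mentioned above can be checked with a short sketch. It assumes Python's standard unicodedata module as the tool, which is not something discussed in this thread; HAN_DIGITS and digit_properties are hypothetical names used only for illustration.

```python
import unicodedata

# The ten ideographic digits listed earlier in the thread.
HAN_DIGITS = "一二三四五六七八九〇"

def digit_properties(ch):
    """Return (General_Category, decimal value or None, digit value or None)."""
    return (
        unicodedata.category(ch),
        unicodedata.decimal(ch, None),  # None unless Numeric_Type=Decimal
        unicodedata.digit(ch, None),    # None unless Numeric_Type is Decimal or Digit
    )

for ch in HAN_DIGITS:
    print(f"U+{ord(ch):04X}", digit_properties(ch))
```

None of these characters should report a category of Nd or a decimal or digit value, so a parser keyed on gc=Nd skips them without any special casing.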
Another issue that I brought up a while back on this list is Tamil
numbers. Western practice seems to have taken hold enough that Unicode
gave the Tamil digits Gc=Nd, but IIRC from the responses I got back then,
they can also appear in an older style alongside other characters meaning
10, 100, and 1000. In implementing this, encountering any of those other
characters while parsing such a number would disqualify it.
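
A minimal sketch of that disqualification rule, assuming Python's unicodedata module and taking the "other characters" to be U+0BF0 TAMIL NUMBER TEN, U+0BF1 TAMIL NUMBER ONE HUNDRED and U+0BF2 TAMIL NUMBER ONE THOUSAND. The names parse_decimal and TAMIL_MULTIPLIERS are hypothetical, and this is an illustration rather than the actual implementation.

```python
import unicodedata

# Assumed set of old-style Tamil multiplier signs: ten, one hundred, one thousand.
TAMIL_MULTIPLIERS = {"\u0BF0", "\u0BF1", "\u0BF2"}

def parse_decimal(text):
    """Return the integer value of a place-value digit string, or None.

    The string is rejected if it is empty, contains a Tamil multiplier
    sign, or contains any character without a decimal digit value.
    """
    if not text:
        return None
    value = 0
    for ch in text:
        if ch in TAMIL_MULTIPLIERS:
            return None  # older non-place-value style: disqualify
        d = unicodedata.decimal(ch, None)
        if d is None:
            return None  # not a decimal digit at all
        value = value * 10 + d
    return value

print(parse_decimal("\u0BE7\u0BE8\u0BE9"))  # Tamil digits one, two, three
print(parse_decimal("\u0BE7\u0BF0"))        # Tamil digit one followed by NUMBER TEN
```

The explicit multiplier check is redundant here, since those signs have no decimal value and would be rejected anyway, but it keeps the rule visible. The sketch does not check that all digits come from the same script.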
>
> A./
>