From: John Burger (john@mitre.org)
Date: Tue Jul 27 2010 - 09:00:22 CDT
karl williamson wrote:
> Asmus Freytag wrote:
>> The situation is worse than you indicate, because the same characters
>> are also used as elements in a system that doesn't use place-value,
>> but uses special characters to show powers of 10.
>>
>
> I would think I wouldn't support these numbers, since we couldn't be
> unambiguously sure of what was intended.
>
> Another issue that I brought up a while back on this list is Tamil
> numbers, where western practice seems to have infiltrated enough that
> Unicode gave them Gc=Nd, but IIRC from the responses I got back then,
> they can appear in older style with other characters meaning 10, 100,
> 1000. In implementing this, if any of the other characters were
> encountered in parsing such a number, it would disqualify it.
I think you could treat the Han digits the same way: In some of the
Chinese news corpora I work with, the ten Han digits are frequently
used Western-style, especially for years, phone numbers, and other
identifiers.
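
A rough sketch of that rule in Python, in case it helps: read a run of
digits positionally and give up as soon as a power-of-ten character
shows up. The function name parse_positional and the two tables are
made up for illustration, and the list of power markers is an
assumption that is certainly incomplete. Mixing scripts inside one
number is not checked.

```python
import unicodedata

# The ten Han digits used Western-style (hypothetical table for this sketch).
HAN_DIGITS = {ch: i for i, ch in enumerate("〇一二三四五六七八九")}

# Characters that stand for powers of ten rather than a digit 0-9:
# a few Han ones plus TAMIL NUMBER TEN / ONE HUNDRED / ONE THOUSAND.
# Assumed to be incomplete.
POWER_MARKERS = set("十百千万萬億亿") | {"\u0BF0", "\u0BF1", "\u0BF2"}


def parse_positional(text):
    """Return the place-value reading of text, or None if disqualified."""
    if not text:
        return None
    value = 0
    for ch in text:
        if ch in POWER_MARKERS:
            # Older-style notation: we cannot be sure what was intended.
            return None
        digit = HAN_DIGITS.get(ch)
        if digit is None:
            # Covers Gc=Nd characters such as ASCII and Tamil digits.
            digit = unicodedata.decimal(ch, None)
        if digit is None:
            return None
        value = value * 10 + digit
    return value
```

With this, a year written as 二〇一〇 comes back as 2010, while 二千十
is rejected because it contains 千 and 十.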
- John D. Burger
MITRE