RE: processing numeric strings

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Fri Mar 15 2002 - 13:02:14 EST


There are two quite separate issues here:

1) How can one format a numeric value as a string in the localised
representation?

2) How can one parse a string in the localised representation into a numeric
value?

I believe that the standard C/C++ library provides no support whatever for
either operation.

On MS Windows, there is support for the first operation. The
misleadingly-named GetNumberFormat takes a numeric string in Western format
and produces a numeric string in localised format. One of its parameters is
a locale ID. The documentation isn't clear on whether it will actually
produce output in the alternate number system for the locale.
Experimentation is needed. I've found that the related function,
GetDateFormat, when working with the Hebrew calendar, will produce year,
month and day numbers in the non-positional Hebrew number system, so it's
possible that GetNumberFormat will do the same.

For the second operation, Windows provides VarI4FromStr and VarR8FromStr. I
think these functions are more limited in what they will accept. I have
found that VarDateFromStr cannot parse a date in the Hebrew calendar - so it
seems likely that these other functions cannot handle numbers formatted in a
localised representation.

If you find a good way of formatting or parsing using localised
representation, I'd be very interested in hearing about it. Please CC me off
the list, if this is deemed OT.

Thanks

- rick cameron

-----Original Message-----
From: Peter_Constable@sil.org [mailto:Peter_Constable@sil.org]
Sent: Friday, 15 March 2002 9:18
To: unicode@unicode.org
Subject: processing numeric strings

I've got a question I asked about on a couple of other lists, but didn't
get much response, so I thought I'd try here.

One of our developers has asked me for input on a certain problem: "Do I
need to be able to work with numbers represented using digits/numbering
systems other than the European ("Arabic") decimal-based system, and if so
how do I know what to do with things like U+0BF1 TAMIL NUMBER ONE
HUNDRED?"

I could easily answer the first part: Yes. At least, I know that some
users of the software will want to work with Thai or Arabic ("Indic")
digits, and I'm pretty sure I know of users who will want to work with the
Ethiopic numbering system. Things like Thai aren't hard to deal with since
the numbering system works the same way as our decimal numbering system;
it's just that different characters are used for the digits. But things
like Ethiopic (or Tamil) are more involved since the numbering system
works on different principles.
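
To make the distinction concrete, here is a minimal Python sketch (not
something from any of the libraries discussed here; the function name
parse_decimal_digits is made up for illustration). It relies only on the
Unicode character properties exposed by Python's standard unicodedata
module, and assumes the input is a plain run of digits with no separators
or sign. Thai and Arabic-Indic digits have a decimal-digit value and can be
handled positionally; U+0BF1 has a numeric value but is not a decimal
digit, so the positional approach rejects it.

```python
import unicodedata


def parse_decimal_digits(text):
    """Parse a run of positional decimal digits from any script into an int.

    Hypothetical helper for illustration only: no sign, no separators.
    """
    value = 0
    for ch in text:
        # unicodedata.decimal() raises ValueError for any character that is
        # not a decimal digit, which includes U+0BF1.
        value = value * 10 + unicodedata.decimal(ch)
    return value


thai = "\u0e51\u0e52\u0e53"      # THAI DIGIT ONE, TWO, THREE
arabic_indic = "\u0664\u0662"    # ARABIC-INDIC DIGIT FOUR, TWO

print(parse_decimal_digits(thai))          # 123
print(parse_decimal_digits(arabic_indic))  # 42

# TAMIL NUMBER ONE HUNDRED carries a numeric value, but it is not a digit
# that can be slotted into a positional number.
print(unicodedata.numeric("\u0bf1"))       # 100.0
```

Systems like Ethiopic or traditional Tamil would need their own composition
rules on top of the per-character numeric values; this sketch does not
attempt that.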

How are people dealing with presenting or interpreting numeric strings
using systems such as Tamil, Ethiopic or even Thai, etc. digits, not all
of which use numbering systems that work the same as the decimal system
used in the west? (Note, this is *not* primarily about formatting issues
such as decimal or group separators, though those are obviously also
involved.)

I'm not a C programmer so I don't know -- do C or C++ libraries provide
functions for converting integer or other numeric data types into strings
that allow one to select what script / numbering system to use?

I'm pretty sure VB functions like Format$ or CLng only handle issues like
decimal separators but not this; have I missed something?

I've looked in MSDN Win32 documentation, and I see that GetLocaleInfo can
tell you the locale-specific equivalents for digits 0 to 9, but that
doesn't give you a way to present integers using them or to interpret a
string of these as a numeric value, and I haven't seen anything else in
Win32 that does that. Is there anything that perhaps I've just not found?
Does COM provide anything?
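
For the simple positional case, the missing step is small. The following
Python sketch assumes you already have the ten native digits as a string,
ordered zero to nine (which is roughly what GetLocaleInfo reports); the
function name format_with_native_digits and the THAI_DIGITS constant are
invented for this example and are not part of any API. It only substitutes
digit characters, so it does nothing about separators and cannot handle
non-positional systems such as Ethiopic.

```python
def format_with_native_digits(number, native_digits):
    """Render an integer using a locale's ten native digits.

    native_digits is assumed to hold the digits for 0..9 in order.
    Hypothetical helper for illustration only.
    """
    if len(native_digits) != 10:
        raise ValueError("native_digits must contain exactly ten characters")
    # Map each ASCII digit to its native counterpart; a leading minus sign
    # is left untouched.
    table = str.maketrans("0123456789", native_digits)
    return str(number).translate(table)


# Thai digits zero through nine, U+0E50..U+0E59.
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))

print(format_with_native_digits(2002, THAI_DIGITS))
```

Going the other way (native digits back to an integer) is the same
substitution in reverse, followed by ordinary decimal parsing.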

(I got one response from the other two lists I mentioned which did tell me
what I suspected: the .Net framework doesn't provide any support for this
kind of thing.)

If the answers to all the above are negative (i.e. this functionality is
built neither into compilers / libraries nor into Windows), are there any
open source implementations that deal with this? Does ICU handle this?

If people are resorting to writing their own algorithms, where are you
getting information about how different numbering systems work?

Maybe my question should be to ask whether this *is* an issue for anyone
else. The one response I got from the two other lists was from Michael
Kaplan, who suggested that since software hasn't done this in the past,
people have used decimal digits 0-9, with the result that there isn't
really a current need for things like Tamil digits, and systems like
Arabic are trivial to deal with. Are others finding there isn't much need
for dealing with numbering systems that work differently from the Arabic
decimal numbers?

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


