Re: digits and numbers

From: Rick McGowan (rmcgowan@apple.com)
Date: Fri Jun 11 1999 - 13:11:28 EDT


viranga@mds.rmit.edu.au asked a number of things about digits and numbers...

> I'm writing an isDigit() function for a C++ string class and I'm
> preprocessing the UnicodeData.txt file to provide the information.
> The simplest thing for me to do is just check whether the General
> Category is Nd (Number, Decimal Digit).

That is the correct thing to do. There are things which are "numeric" but
aren't useful as decimal digits for most purposes, such as programming
languages. If you're doing an isDigit function, probably best to stick with
things which are decimal digits only.
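
As a rough illustration of the Nd-only approach, here is a minimal C++ sketch. It assumes the preprocessing step reads UnicodeData.txt line by line, with the code point (hex) in the first semicolon-separated field and the General Category in the third. The names loadDecimalDigits and isDigit are hypothetical, not taken from the original string class, and the sketch ignores the "First>/Last>" range entries in the file, which is harmless here only on the assumption that no Nd characters are listed that way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: scans text in UnicodeData.txt layout and collects
// every code point whose General Category (third field) is exactly "Nd".
std::set<std::uint32_t> loadDecimalDigits(std::istream& in) {
    std::set<std::uint32_t> digits;
    std::string line;
    while (std::getline(in, line)) {
        // Locate the first three field separators; skip malformed lines.
        std::size_t first = line.find(';');
        if (first == std::string::npos) continue;
        std::size_t second = line.find(';', first + 1);
        if (second == std::string::npos) continue;
        std::size_t third = line.find(';', second + 1);
        if (third == std::string::npos) continue;

        std::string category = line.substr(second + 1, third - second - 1);
        if (category == "Nd") {
            // First field is the code point in hexadecimal.
            unsigned long cp = std::stoul(line.substr(0, first), nullptr, 16);
            digits.insert(static_cast<std::uint32_t>(cp));
        }
    }
    return digits;
}

// Hypothetical isDigit: true only for code points recorded as Nd.
bool isDigit(const std::set<std::uint32_t>& digits, std::uint32_t cp) {
    return digits.count(cp) != 0;
}
```

With a table built this way, isDigit(digits, 0x0030) would be true for DIGIT ZERO, while code points whose category is Lo, Nl or No (ideographs, Roman numerals, circled symbols) would return false. A std::set is used only for brevity; a sorted range table would be more compact.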

> What's concerning me is that the codes for the Chinese characters
> (yi, er, san, si, wu, liu, qi, ba, jiu, ling...) are in a character
> range and the General Category for this range is Lo (Letter, Other).
> So my function would return false for these codes.

Yes, you should return false for those. They would only be useful to a
process which was prepared to deal with the complexities of formatting
Chinese style numbers, in which case you would need to deal with the
characters for hundred, thousand, ten-thousand and so forth.

> The circled ideographs, 3280 to 3289, are classified as numbers "No".
> It seems that they are considered numbers unless they are decomposed.

I would not bother making these digits. They're just symbols.

> *** Can characters belong to more than one general category?

No.

> *** Can characters change their General Category, upon composition,
> even if they are homogeneous to one category when decomposed?

Possibly. Do you have an example?

> I noticed that the Hangzhou style numerals (3021 to 3029) were
> classified as Nl (Number, Letter) even tho' they seem to be decimal
> (without the zero but I assume they use DIGIT ZERO). I don't want
> to include Category "Nl" because that would erroneously return true
> for roman numerals.

Again, I advise leaving those alone. Only use decimal digits in your
isDigit function.
        
> *** Is there a reason for this special treatment of Hangzhou numerals?

They aren't decimal digits.

> *** And, ummm... this is just curiosity
> - what are the Tibetan half digits used for?

Hmmm... The book contains an explanation of the Tibetan half digits, but
the version 3.0 book will contain a better explanation... According to my
understanding, these half-digits are used only as the last (right-most) digit
of a number, and they effectively cause "0.5" to be subtracted from the
number in which they appear. If you have the Tibetan digit 4 followed by the
Tibetan HALF-DIGIT 2, that reads as 42 less one-half, i.e. forty-one and
one-half, or 41.5. It only makes sense to use the half digits as the last digit of a
number. I understand they're used in traditional contexts, such as astrology
and market prices. I don't recommend dealing with them in your isDigit
function.
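
For the curious, the arithmetic described above can be sketched in C++ as follows. This is only an illustration of that understanding, not a statement of how Tibetan numbers must be processed. It assumes the Tibetan digits zero through nine sit at U+0F20..U+0F29 and the half digits at U+0F2A..U+0F33 (half one through half nine, then half zero); the name tibetanValue is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: computes the value of a run of Tibetan digits where
// the last one may be a half digit, under the "subtract 0.5" reading.
// Returns false if the sequence is empty, contains anything else, or has a
// half digit anywhere but the final position.
bool tibetanValue(const std::vector<std::uint32_t>& cps, double& out) {
    if (cps.empty()) return false;
    double value = 0.0;
    for (std::size_t i = 0; i < cps.size(); ++i) {
        const std::uint32_t cp = cps[i];
        const bool isLast = (i + 1 == cps.size());
        if (cp >= 0x0F20 && cp <= 0x0F29) {
            // Ordinary digit: plain positional decimal.
            value = value * 10 + static_cast<int>(cp - 0x0F20);
        } else if (cp >= 0x0F2A && cp <= 0x0F33 && isLast) {
            // Half digit: U+0F2A is half one ... U+0F32 half nine, U+0F33 half zero.
            const int digit = (cp == 0x0F33) ? 0 : static_cast<int>(cp - 0x0F2A) + 1;
            value = value * 10 + digit - 0.5;
        } else {
            return false;
        }
    }
    out = value;
    return true;
}
```

Given TIBETAN DIGIT FOUR (U+0F24) followed by TIBETAN DIGIT HALF TWO (U+0F2B), the sketch computes 4 * 10 + 2 - 0.5 = 41.5, matching the example above.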

        Rick



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT