Hello,
Apologies if these questions are too trivial for this list. I've
looked at section 4.6 of The Unicode Standard, Version 2.0; I guess
I'm just trying to get a clearer notion of what numbers and digits are.
The names of some abstract characters contain the word "DIGIT" even though
they are not classified as decimal digits (Nd).
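One way to list the characters whose names contain "DIGIT" but whose General Category is not Nd is to scan UnicodeData.txt field by field. This is a minimal sketch, assuming the standard semicolon-separated layout (field 1 is the character name, field 2 the General Category); splitFields() and namedDigitButNotNd() are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split one UnicodeData.txt line on ';', keeping empty fields
// (including a trailing one).
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type semi = line.find(';', start);
        if (semi == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, semi - start));
        start = semi + 1;
    }
}

// True when the name (field 1) contains "DIGIT" but the
// General Category (field 2) is not Nd.
bool namedDigitButNotNd(const std::string& line) {
    std::vector<std::string> f = splitFields(line);
    if (f.size() < 3) return false;
    return f[1].find("DIGIT") != std::string::npos && f[2] != "Nd";
}
```

Given the line for U+0F2A TIBETAN DIGIT HALF ONE (category No) it returns true; given the line for U+0031 DIGIT ONE (category Nd) it returns false.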
I'm writing an isDigit() function for a C++ string class and I'm
preprocessing the UnicodeData.txt file to provide the information.
The version I'm using is UNICODE 2.1 CHARACTER DATABASE (update 2.1.9).
I need to determine whether a code point corresponds to a digit.
The simplest thing for me to do is just check whether the General
Category is Nd (Number, Decimal Digit).
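A minimal sketch of the Nd-only approach, assuming the file is read line by line from a std::istream and the result is kept in a std::set of code points during preprocessing. loadDecimalDigits() and this free-function isDigit() are hypothetical names, not part of any existing string class; a real implementation would probably use a more compact table than std::set.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split one UnicodeData.txt line on ';', keeping empty fields.
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type semi = line.find(';', start);
        if (semi == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, semi - start));
        start = semi + 1;
    }
}

// Collect every code point whose General Category (field 2) is Nd.
// Field 0 holds the code point in hexadecimal.
std::set<std::uint32_t> loadDecimalDigits(std::istream& data) {
    std::set<std::uint32_t> digits;
    std::string line;
    while (std::getline(data, line)) {
        std::vector<std::string> f = splitFields(line);
        if (f.size() > 2 && f[2] == "Nd") {
            digits.insert(static_cast<std::uint32_t>(std::stoul(f[0], nullptr, 16)));
        }
    }
    return digits;
}

// Hypothetical free-function stand-in for the string class's isDigit().
bool isDigit(std::uint32_t cp, const std::set<std::uint32_t>& digits) {
    return digits.count(cp) != 0;
}
```

Because the set is built once during preprocessing, each isDigit() call is then a single lookup.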
What concerns me is that the code points for the Chinese numerals
(yi, er, san, si, wu, liu, qi, ba, jiu, ling...) fall within a character
range whose General Category is Lo (Letter, Other).
So my function would return false for them. I guess I can
sort of live with that, even though to my mind they seem to be digits.
*** Are these Chinese characters not digits or possibly not numbers?
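UnicodeData.txt does not list every CJK ideograph individually; it gives a pair of lines whose names end in ", First>" and ", Last>", and every code point in between shares that pair's properties. This is a sketch of a category lookup that expands such pairs, assuming the same file layout; CategoryTable, loadCategories() and generalCategory() are hypothetical names. It shows why an Nd test returns false for U+4E00 (yi) and the other ideographs in the range: the lookup yields Lo.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split one UnicodeData.txt line on ';', keeping empty fields.
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type semi = line.find(';', start);
        if (semi == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, semi - start));
        start = semi + 1;
    }
}

bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

struct CategoryTable {
    std::map<std::uint32_t, std::string> single;  // one line per code point
    // "<..., First>"/"<..., Last>" pairs: first code point -> (last, category)
    std::map<std::uint32_t, std::pair<std::uint32_t, std::string>> ranges;
};

CategoryTable loadCategories(std::istream& data) {
    CategoryTable table;
    std::string line;
    std::uint32_t rangeStart = 0;
    bool inRange = false;
    while (std::getline(data, line)) {
        std::vector<std::string> f = splitFields(line);
        if (f.size() < 3) continue;
        std::uint32_t cp = static_cast<std::uint32_t>(std::stoul(f[0], nullptr, 16));
        if (endsWith(f[1], ", First>")) {
            rangeStart = cp;  // remember the start until the Last line
            inRange = true;
        } else if (inRange && endsWith(f[1], ", Last>")) {
            table.ranges[rangeStart] = std::make_pair(cp, f[2]);
            inRange = false;
        } else {
            table.single[cp] = f[2];
        }
    }
    return table;
}

// General Category of cp: individual lines first, then ranges.
std::string generalCategory(const CategoryTable& t, std::uint32_t cp) {
    auto it = t.single.find(cp);
    if (it != t.single.end()) return it->second;
    auto r = t.ranges.upper_bound(cp);  // first range starting after cp
    if (r != t.ranges.begin()) {
        --r;  // last range starting at or before cp
        if (cp <= r->second.first) return r->second.second;
    }
    return "Cn";  // not listed anywhere: treat as unassigned
}
```

In this sketch, code points that appear nowhere in the file are reported as Cn (unassigned).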
The circled ideographs, U+3280 to U+3289, are classified as numbers
(No, Number, Other). It seems that they are considered numbers unless
they are decomposed.
*** Can characters belong to more than one general category?
*** Can a character's General Category change upon composition,
even if the characters it decomposes into all belong to one category?
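In the file format, each UnicodeData.txt line has a single General Category field (field 2), alongside the decomposition (field 5) and the numeric value (field 8). This is a sketch that pulls those fields out of one line, assuming the standard 15-field layout; Entry, parseEntry() and decompositionCodes() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Split one UnicodeData.txt line on ';', keeping empty fields.
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type semi = line.find(';', start);
        if (semi == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, semi - start));
        start = semi + 1;
    }
}

// The fields this sketch cares about; all other fields are ignored.
struct Entry {
    std::uint32_t code;         // field 0, hexadecimal
    std::string name;           // field 1
    std::string category;       // field 2, one General Category
    std::string decomposition;  // field 5, e.g. "<circle> 4E00"
    std::string numericValue;   // field 8, empty when there is none
};

Entry parseEntry(const std::string& line) {
    std::vector<std::string> f = splitFields(line);
    f.resize(15);  // lines have 15 fields (0-14); pad short input
    return Entry{static_cast<std::uint32_t>(std::stoul(f[0], nullptr, 16)),
                 f[1], f[2], f[5], f[8]};
}

// Code points listed in the decomposition, skipping a "<tag>" prefix
// such as <circle> or <compat>.
std::vector<std::uint32_t> decompositionCodes(const Entry& e) {
    std::vector<std::uint32_t> codes;
    std::istringstream in(e.decomposition);
    std::string token;
    while (in >> token) {
        if (token[0] == '<') continue;
        codes.push_back(static_cast<std::uint32_t>(std::stoul(token, nullptr, 16)));
    }
    return codes;
}
```

For a line such as `3280;CIRCLED IDEOGRAPH ONE;No;0;L;<circle> 4E00;;;1;N;;;;;` (the exact text is worth checking against the 2.1.9 file), this yields category No, numeric value 1, and the single decomposition target U+4E00, whose own category is Lo.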
I noticed that the Hangzhou-style numerals (U+3021 to U+3029) are
classified as Nl (Number, Letter), even though they seem to be decimal
(there is no zero among them, but I assume DIGIT ZERO is used). I don't
want to include Category "Nl" because that would erroneously return true
for Roman numerals.
*** Is there a reason for this special treatment of Hangzhou numerals?
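If the Hangzhou numerals should count as digits while Roman numerals should not, one option is to accept Nd plus an explicit, hand-maintained list of extra ranges rather than all of Nl. This is a local policy choice, not something UnicodeData.txt defines; isDigitWithPolicy() and kExtraDigitRanges are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical policy: ranges outside Nd that this isDigit() also accepts.
struct Range {
    std::uint32_t first;
    std::uint32_t last;
};

const Range kExtraDigitRanges[] = {
    {0x3021, 0x3029},  // HANGZHOU NUMERAL ONE .. NINE (category Nl)
};

bool isDigitWithPolicy(std::uint32_t cp, const std::string& generalCategory) {
    if (generalCategory == "Nd") return true;
    for (const Range& r : kExtraDigitRanges) {
        if (cp >= r.first && cp <= r.last) return true;
    }
    // Other Nl characters, e.g. ROMAN NUMERAL ONE (U+2160), stay false.
    return false;
}
```

The trade-off is that the list has to be reviewed by hand whenever a new version of the database adds characters.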
*** And, ummm... this is just curiosity: what are the Tibetan half
digits used for?
Regards,
Viranga (viranga@mds.rmit.edu.au)
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT