From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Jan 13 2006 - 15:11:20 CST
On Fri, 13 Jan 2006, Rick McGowan wrote:
> Kit Peters asked,
>
>> Can someone provide me a definitive list of all Unicode digits?
>
> You can make one yourself. Download the files from the latest UCD and look
> for "DIGIT". What you want, for starters, is probably the set of
> everything that has a value in the "decimal digit" field of the
> UnicodeData.txt file.
However, the more general concept of digit covers some other characters
too, such as superscript digits, which are counted as digits (they have a
digit value but no decimal digit value) and may need special treatment. See
http://www.unicode.org/Public/UNIDATA/UCD.html#Numeric_Type
Technically, you would look at the 8th field of each entry (line): if it
is nonempty, the character is a digit. (The field is labeled "(7)" in the
UCD.html document, but that's because it does not count the first field,
the Unicode number; it is also index 7 when fields are counted from zero.)
In Perl (assuming you have a local copy of UnicodeData.txt):
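use strict;
use warnings;
my $dbfile = 'UnicodeData.txt';
open(my $db, '<', $dbfile) || die "Can't open database file $dbfile: $!";
while (<$db>) {
    my @entry = split(';', $_);
    # Compare against '' rather than testing truth: a digit value of "0" is false in Perl.
    if ($entry[7] ne '') {
        print $entry[0], " ", $entry[1], "\n";
    }
}
close($db);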
(The results, when using the current database, are at
http://www.cs.tut.fi/~jkorpela/unicode/digits.txt )
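If you need to tell decimal digits apart from other digits such as the
superscripts, you can look at the neighboring fields as well. The following
is only a sketch: numeric_type is a name I made up for illustration, and the
sample lines are written in the UnicodeData.txt format from memory, so check
them against your local copy. It assumes that fields 6, 7 and 8 (counting
from zero) hold the decimal digit, digit and numeric values.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one UnicodeData.txt line by its numeric fields.
# Fields 6, 7 and 8 (counting from zero) are assumed to hold the decimal digit,
# digit and numeric values; a character with a decimal digit value has all three.
sub numeric_type {
    my ($line) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields, so indexes 6..8 exist
    # for any well-formed 15-field line.
    my @entry = split(/;/, $line, -1);
    return 'decimal' if $entry[6] ne '';
    return 'digit'   if $entry[7] ne '';
    return 'numeric' if $entry[8] ne '';
    return 'none';
}

# Sample lines in UnicodeData.txt format (assumed to match the real file).
my @samples = (
    '0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;',
    '00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;',
    '00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;',
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
);

for my $line (@samples) {
    my ($code, $name) = split(/;/, $line);
    printf "%s %-28s %s\n", $code, $name, numeric_type($line);
}
```

With these samples, DIGIT ZERO is classified as decimal, SUPERSCRIPT TWO as
digit, the fraction as numeric, and the letter as none.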
Depending on the programming environment, you might have a built-in
function for determining whether a character is a digit. The function may
or may not be up to date, i.e. correspond to the newest Unicode version.
Beware, however, that the isDigit function in java.lang.Character
tests for _decimal_ digits only (in the Unicode sense), so it does not
accept superscript digits, for example.
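The same kind of check can be made in Perl itself. The sketch below assumes
a Perl installation that ships the Unicode::UCD module; it prints the Unicode
version that Perl's own tables correspond to and compares the \p{Nd} test
(decimal digits only) with the broader digit field for three characters I
picked as examples.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Which Unicode version do this Perl's character tables correspond to?
print "Unicode version: ", Unicode::UCD::UnicodeVersion(), "\n";

# DIGIT THREE, SUPERSCRIPT TWO, ARABIC-INDIC DIGIT THREE
for my $cp (0x0033, 0x00B2, 0x0663) {
    my $info = charinfo($cp);
    # \p{Nd} matches decimal digits only; the "digit" field is broader.
    my $is_decimal = chr($cp) =~ /\p{Nd}/ ? 'yes' : 'no';
    my $is_digit   = $info->{digit} ne '' ? 'yes' : 'no';
    printf "U+%04X %-26s decimal digit: %-3s digit: %s\n",
        $cp, $info->{name}, $is_decimal, $is_digit;
}
```

SUPERSCRIPT TWO should come out as a digit but not as a decimal digit, which
is the distinction that a decimal-only test such as Java's misses.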
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/