From: Jim Allan (jallan@smrtytrek.com)
Date: Thu Nov 27 2003 - 13:56:24 EST
Arcane Jill wrote:
> It has been explained to me that the "decimal digit" property has the
> following meaning: "Decimal numbers are those using in decimal-radix
> number systems. In particular, the sequence of the ONE character
> followed by the TWO character is interpreted as having the value of 
> twelve". 
I don't agree with that explanation.
If I use isdigit() in C or a corresponding function in another language 
to check a character, I only expect to find out whether that character 
is or is not a decimal digit. I won't know whether it is being used as 
part of a decimal-radix number or not.
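A per-character test of this kind might look like the following Python 
sketch. It assumes that "decimal digit" means Unicode general category 
Nd; the function name is_decimal_digit is hypothetical. It says nothing 
about how the surrounding characters are being used.

```python
import unicodedata

def is_decimal_digit(ch: str) -> bool:
    """Return True if the single character ch has general category Nd."""
    return unicodedata.category(ch) == "Nd"

# ASCII digit, Arabic-Indic digit three, Roman numeral eight, a letter.
for ch in ["7", "\u0663", "\u2167", "x"]:
    print(repr(ch), is_decimal_digit(ch))
```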
> I mean, it's quite clearly ignored in sentences like "My phone number is
> 0044-1727-6000000", or "The codepoint of the space character is U+0020".
>
Many languages and applications allow use of a filter template such as 
"9999-999-9999999" or "####-####-#######" in which the figure "9" or "#" 
in the template must be filled by a decimal digit in the data.
Allowing *only* decimal digits (and additional template characters) in a 
field is often useful. Keycodes and product codes often contain 
particular positions that must be alphabetic and other positions where 
only decimal digits are allowed.
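A template filter of this sort can be sketched roughly as follows. The 
function name matches_template is hypothetical, and so is the use of 
"A" for an alphabetic position; only "9" and "#" as digit placeholders 
come from the templates above. Every other template character is 
treated as a literal that must appear unchanged in the data.

```python
import unicodedata

def matches_template(data: str, template: str) -> bool:
    """Check data against a positional template, one character per slot."""
    if len(data) != len(template):
        return False
    for ch, slot in zip(data, template):
        if slot in "9#":
            # Digit slot: the data character must be a decimal digit (Nd).
            if unicodedata.category(ch) != "Nd":
                return False
        elif slot == "A":
            # Assumed convention: "A" marks an alphabetic position.
            if not ch.isalpha():
                return False
        elif ch != slot:
            # Anything else in the template is a literal separator.
            return False
    return True

print(matches_template("0044-172-6000000", "9999-999-9999999"))
print(matches_template("0044-17X-6000000", "####-###-#######"))
```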
> What possible use could any mechanical algorithm make of the "decimal
> digit" property that it could not equally well make of the "digit" or
> "numeric" properties? 
We hardly want to allow Roman numeral characters in a field that we are 
going to evaluate as though it were decimal. If we are interpreting a 
field as a radix 10 number it is reasonable to validate the field as 
containing only radix 10 characters (and allowed numeric separators) 
preceded or followed by spaces.
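As a rough illustration of that validation, the Python sketch below 
accepts a field only if, after surrounding spaces are stripped, it 
consists of decimal digits and allowed separators. The separator set 
and the name is_decimal_field are assumptions for the example. A Roman 
numeral character has a numeric value but is not a decimal digit, so it 
is rejected.

```python
import unicodedata

# Assumption: which separators are allowed is up to the application.
ALLOWED_SEPARATORS = {",", "."}

def is_decimal_field(field: str) -> bool:
    """Accept digits (category Nd) plus separators, with optional outer spaces."""
    body = field.strip(" ")
    seen_digit = False
    for ch in body:
        if unicodedata.category(ch) == "Nd":
            seen_digit = True
        elif ch not in ALLOWED_SEPARATORS:
            return False
    # Require at least one digit so that "" or "," alone is not accepted.
    return seen_digit

roman_twelve = "\u216b"  # ROMAN NUMERAL TWELVE, category Nl
print(unicodedata.numeric(roman_twelve), is_decimal_field(roman_twelve))
print(is_decimal_field("  1,234 "))
```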
Generally a check on whether a character is a decimal digit is part of 
validation, whether validation of previously stored data or of data as 
it is being input.  Of course, we will normally want a tighter 
validation. We probably won't want to allow a number that is composed of 
mixed Latin, Arabic and Hindu digits, even though it can be evaluated.
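One way to sketch the tighter check in Python, whose standard library 
has no script property, is to compare digit sets instead of scripts. 
This relies on decimal digits being encoded in contiguous runs of ten, 
so subtracting a digit's value from its code point gives the zero of 
its set. That is stricter than a script test (fullwidth digits, for 
example, count as a different set from ASCII digits), and the function 
name uses_single_digit_set is hypothetical.

```python
import unicodedata

def uses_single_digit_set(text: str) -> bool:
    """True if every decimal digit in text comes from the same run of ten."""
    zeros = {
        ord(ch) - unicodedata.decimal(ch)  # code point of this set's zero
        for ch in text
        if unicodedata.category(ch) == "Nd"
    }
    return len(zeros) <= 1

print(uses_single_digit_set("123"))
# ASCII one, Arabic-Indic two, Devanagari three mixed in one number.
print(uses_single_digit_set("1\u0662\u0969"))
```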
On the other hand, in a multi-lingual and multi-script environment it 
would be useful to ignore scripts in evaluating numbers just as one 
often ignores case in evaluating strings. Note that data is often 
supplied from a client in text format, say tab-delimited, with numbers  
in text format.  It would be useful to verify such data by checking that 
the numbers are proper decimal numbers regardless of script before 
actually reading the data into another database where they might (or 
might not) be converted to binary format.
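Script-independent evaluation of such text fields could be sketched as 
below. The function name decimal_value and the sample tab-delimited row 
are made up for illustration. Each character contributes its decimal 
digit value whatever its script, and anything that is not a decimal 
digit causes the field to be rejected.

```python
import unicodedata

def decimal_value(text: str) -> int:
    """Evaluate a run of decimal digits from any script as a base-10 integer."""
    if not text:
        raise ValueError("empty field")
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-decimal characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value

# Hypothetical client row: a name, then two numeric columns in text form.
row = "widget\t\u0664\u0662\t17".split("\t")
print([decimal_value(field) for field in row[1:]])
```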
Checking for decimal numbers is also useful in parsing addresses, which 
is a necessity for address-validation and address-correction software.
Jim Allan