From: Jim Allan (jallan@smrtytrek.com)
Date: Thu Nov 27 2003 - 13:56:24 EST
Arcane Jill wrote:
> It has been explained to me that the "decimal digit" property has the
> following meaning: "Decimal numbers are those using in decimal-radix
> number systems. In particular, the sequence of the ONE character
> followed by the TWO character is interpreted as having the value of
> twelve".
I don't agree with that explanation.
If I use isdigit() in C, or a corresponding function in another
language, to check a character, I expect to find out only whether or not
that character is a decimal digit. I won't know whether it is being used
as part of a decimal-radix number.
> I mean, it's quite clearly ignored in sentences like "My phone number is
> 0044-1727-6000000", or "The codepoint of the space character is U+0020".
>
Many languages and applications allow use of a filter template such as
"9999-999-9999999" or "####-####-#######" in which the figure "9" or "#"
in the template must be filled by a decimal digit in the data.
Allowing *only* decimal digits (and additional template characters) in a
field is often useful. Keycodes and product codes often contain
particular positions that must be alphabetic and other positions where
only decimal digits are allowed.
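A minimal sketch of such a template filter, in Python. The function
names are hypothetical, and I am assuming that "9" and "#" both stand for
one decimal digit while every other template character must match the
data literally. A decimal digit is taken here to be any character in
Unicode general category Nd.

```python
import unicodedata


def is_decimal_digit(ch):
    """True when ch has Unicode general category Nd (decimal digit)."""
    return unicodedata.category(ch) == "Nd"


def matches_template(template, data):
    """Check data against a template in which '9' or '#' means 'one decimal digit'.

    Assumption for this sketch: all other template characters are literals.
    """
    if len(template) != len(data):
        return False
    for t, d in zip(template, data):
        if t in "9#":
            if not is_decimal_digit(d):
                return False
        elif t != d:
            return False
    return True
```

With this sketch, matches_template("####-####-#######",
"0044-1727-6000000") is accepted, while the same data with a letter or a
Roman numeral character in a digit position is rejected.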
> What possible use could any mechanical algorithm make of the "decimal
> digit" property that it could not equally well make of the "digit" or
> "numeric" properties?
We hardly want to allow Roman numeral characters in a field that we are
going to evaluate as though it were decimal. If we are interpreting a
field as a radix 10 number it is reasonable to validate the field as
containing only radix 10 characters (and allowed numeric separators)
preceded or followed by spaces.
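To make the difference between the three properties concrete, here is a
rough Python sketch using the standard unicodedata module. The helper
names and the default separator set are my own assumptions, not anything
the standard prescribes.

```python
import unicodedata


def numeric_properties(ch):
    """Return (decimal, digit, numeric) values for ch, with None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


def is_decimal_field(text, separators=",."):
    """True if text, ignoring surrounding spaces, holds only decimal digits
    and the allowed separators, with at least one digit present.

    The separator set is an assumption chosen for illustration.
    """
    body = text.strip(" ")
    has_digit = False
    for ch in body:
        if unicodedata.decimal(ch, None) is not None:
            has_digit = True
        elif ch not in separators:
            return False
    return has_digit


# "7"      (DIGIT SEVEN)          -> (7, 7, 7.0)
# "\u00b2" (SUPERSCRIPT TWO)      -> (None, 2, 2.0)
# "\u216b" (ROMAN NUMERAL TWELVE) -> (None, None, 12.0)
```

Only the first of the three characters passes a decimal digit check, so a
field containing the superscript or the Roman numeral is rejected before
anyone tries to evaluate it as radix 10.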
Generally a check on whether a character is a decimal digit is part of
validation, whether validation of previously stored data or of data as
it is being input. Of course, we will normally want tighter validation:
we probably won't want to allow a number composed of mixed Latin, Arabic
and Hindu digits, even though it can be evaluated.
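One possible way to do the tighter check, sketched in Python. It relies
on the assumption that each script's decimal digits are encoded as a
contiguous run from zero to nine, so subtracting a digit's value from its
code point gives the code point of that script's zero. The function names
are hypothetical.

```python
import unicodedata


def digit_set_zero(ch):
    """Code point of the zero belonging to ch's digit set, or None if ch
    is not a decimal digit.

    Assumes each set of decimal digits is a contiguous run 0..9.
    """
    value = unicodedata.decimal(ch, None)
    if value is None:
        return None
    return ord(ch) - value


def uses_single_digit_set(text):
    """True if text is non-empty and all its characters are decimal digits
    drawn from one and the same digit set."""
    zeros = {digit_set_zero(ch) for ch in text}
    return None not in zeros and len(zeros) == 1
```

A number written wholly in one script's digits passes; one that mixes,
say, a Latin digit with an Arabic-Indic digit does not.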
On the other hand, in a multi-lingual and multi-script environment it
would be useful to ignore scripts in evaluating numbers just as one
often ignores case in evaluating strings. Note that data is often
supplied from a client in text format, say tab-delimited, with numbers
in text format. It would be useful to verify such data by checking that
the numbers are proper decimal numbers regardless of script before
actually reading the data into another database where they might (or
might not) be converted to binary format.
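A sketch of that kind of script-independent check on tab-delimited text,
again in Python. The function names are hypothetical, and I am assuming
unsigned integers with no separators to keep the example short.

```python
import unicodedata


def decimal_value(text):
    """Evaluate text as an unsigned radix 10 integer regardless of script.

    Returns None if text is empty or contains any character that is not
    a decimal digit.
    """
    if not text:
        return None
    total = 0
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            return None
        total = total * 10 + digit
    return total


def parse_row(line):
    """Split one tab-delimited line and evaluate each field, giving None
    for fields that are not proper decimal numbers."""
    return [decimal_value(field.strip()) for field in line.split("\t")]
```

Here "12" in Latin, Arabic-Indic or Devanagari digits all evaluate to
twelve, while the single character ROMAN NUMERAL TWELVE is reported as
invalid rather than silently accepted.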
Checking for decimal numbers is also useful in parsing addresses, which
is a necessity for address validation and address correction software.
Jim Allan