From: Jim Allan (jallan@smrtytrek.com)
Date: Fri Aug 15 2003 - 16:12:36 EDT
Jill Ramonsky posted:
> What I mean is, it seems (to me) that there is a HUGE semantic difference
> between the hexadecimal digit thirteen, and the letter D. 
Yes.
There is also a HUGE semantic difference between D meaning the letter D 
and Roman numeral D meaning 500.
But see http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf:
<< *Roman Numerals.* The Roman numerals can be composed of sequences of 
the appropriate Latin letters. Upper- and lowercase variants of the 
Roman numerals through 12, plus L, C, D, and M, have been encoded for 
compatibility with East Asian standards. >>
When the Unicode manual begins to talk about anything being encoded for 
compatibility, it usually means that it was *only* encoded for 
compatibility and would otherwise probably not have been encoded in 
Unicode at all, because it is not needed.
Note that the chart at http://www.unicode.org/charts/PDF/U2150.pdf 
indicates compatibility decomposition of these characters to the regular 
Latin letters.
The letter _d_, though here lowercase, is also the symbol for _deci-_ in 
metric abbreviations. See 
http://www.geocities.com/Athens/Thebes/5118/metric.htm.
_D_ also often means "digital" as in _D/A_ "digital to analog" or 
_D-AMPS_ "Digital Advanced Mobile Phone System".
_D_ is listed at 
http://www.geocities.com/malaysiastamp/info/abbreviationd.html as 
meaning both "document" and "Pneumatic Post. Scott catalog number prefix 
to identify stamps other than standard postage".
If Unicode were to distinguish some of these uses (and similar special 
uses for all letters in all scripts) by encoding them separately, what 
purpose would be served? Readers would still see only _D_ or _d_, as 
indeed they ought to, since that is what they should see according to 
normal orthography and spelling.
Most users would not enter the new proper characters in any case. Even 
now most fonts don't support the special Roman numeral characters, and 
there is no need to support them. The standard Roman letter glyphs are 
what are normally used.
Unicode doesn't attempt to distinguish meanings of symbols except when 
forced to by compatibility with older character sets or in a few cases 
where the same character in appearance is used sometimes as a "letter" 
and sometimes as "punctuation" so that applications can determine the 
proper beginnings and endings of words.
The semantics of the symbols are otherwise not Unicode's concern. Unicode 
should not define whether 302D is a hex number, a product identifier, a 
section identifier in a document, or something else entirely. 
Encoding "D" with a different code won't help a reader of printed text 
(or even displayed text) to know what is meant. A copy typist may not 
know what is meant either.
> I notice that there are Unicode properties "Hex_Digit" and "ASCII_Hex_Digit"
> which some Unicode characters possess. I may have missed it, but what I
> don't see in the charts is a mapping from characters having these property
> to the digit value that they represent. Is it assumed that the number of
> characters having the "Hex_Digit" properties is so small that implementation
> is trivial? That everyone knows it? Or have I just missed the mapping by
> looking in the wrong place? 
See http://www.unicode.org/Public/UNIDATA/PropList.txt:
<<
0030..0039    ; ASCII_Hex_Digit # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..0046    ; ASCII_Hex_Digit # L&   [6] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER F
0061..0066    ; ASCII_Hex_Digit # L&   [6] LATIN SMALL LETTER A..LATIN SMALL LETTER F
# Total code points: 22
 >>
The property ASCII_Hex_Digit is a convenience that allows applications to 
identify one common use of "A", "B", "C", "D", "E" and "F", in accordance 
with the defined properties set out in some programming languages.
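To sketch what the questioner is after (my own example, nothing defined 
in the standard): because the set is only those 22 code points, the 
character-to-value mapping is small enough to write by hand:

    # Hypothetical helper mapping ASCII_Hex_Digit characters to values.
    def ascii_hex_digit_value(ch):
        if '0' <= ch <= '9':
            return ord(ch) - ord('0')
        if 'A' <= ch <= 'F':
            return ord(ch) - ord('A') + 10
        if 'a' <= ch <= 'f':
            return ord(ch) - ord('a') + 10
        return None  # not an ASCII_Hex_Digit

    print(ascii_hex_digit_value('D'))   # 13

Which is presumably why no separate mapping table is published: the 
mapping is trivial.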
In fact it has also become common, when using bases greater than 16, to 
extend this convention so that one can have such a number as AW3Z in 
base-36 notation.
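For example, Python's built-in int() already understands this extended 
convention (the value shown is just for illustration):

    # Letters A..Z serve as digits 10..35 once the base goes past 16.
    print(int('AW3Z', 36))   # 508175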
To indicate hex numbers, a subscripted base indicator, a leading "&H", 
the word "hex", or some other indicator of meaning is far more useful 
to humans than a double encoding of the same characters according to 
meaning.
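That is, the base has to be signalled out of band anyway before software 
can treat the characters as digits; a small sketch in Python:

    s = '302D'
    # The string is just four characters until a base is supplied.
    print(int(s, 16))        # 12333
    print(int('0x' + s, 0))  # 12333; the '0x' prefix itself names the base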
If you can't normally see the difference in text then Unicode normally 
shouldn't encode any difference.
Jim Allan