From: Jim Allan (jallan@smrtytrek.com)
Date: Mon Nov 10 2003 - 14:02:09 EST
Jim Ramonsky posted:
> I am not the one who has not thought it through. There _is_ no
> difference between decimal 7 and hex 7. They are the same digit. File777
> sorts before File999 in _ALL_ radices.
Exactly.
So strings that mix hex and decimal numbers will not sort or compare
properly using a natural sort *string* comparison, even if clones of the
alpha characters with numeric values are created.
Why then use a natural sort at all?
If you want a natural sort using a mixed alpha and numeric string which
may use multiple bases, a reasonable procedure might be to use the
Unicode subscript numbers as base markers.
Upon reaching one of these the parser evaluates the subscript digits
to create a decimal number and then goes backward until it comes to the
first character that is not a digit in the base identified by that
decimal number. Then it can simply zero-extend for sort or comparison.
Or a binary value can be used for sort or comparison if required.
This works for all bases up to base 36 (digits 0-9 plus letters A-Z).
Such a system would be understood on sight by humans.
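A rough sketch of that procedure in Python, to make the steps concrete.
The function name subscript_key and the (0, text) / (1, number) key
layout are my own choices for illustration, not an existing API. It
assumes the subscript digits U+2080 to U+2089 and uses an integer value
instead of zero-extension. Unmarked digit runs are left as plain text
here. Note that the backward scan needs a separator before the number:
in "FileA₁₆" the "e" would be swallowed as a hex digit.

```python
SUBSCRIPTS = "₀₁₂₃₄₅₆₇₈₉"  # U+2080 .. U+2089
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def subscript_key(name):
    """Build a sort key from a name that marks number bases with subscripts."""
    key = []
    text = ""  # characters seen since the last base marker
    i = 0
    while i < len(name):
        if name[i] not in SUBSCRIPTS:
            text += name[i]
            i += 1
            continue
        # Evaluate the run of subscript digits as a decimal base.
        base = 0
        while i < len(name) and name[i] in SUBSCRIPTS:
            base = base * 10 + SUBSCRIPTS.index(name[i])
            i += 1
        if not 2 <= base <= 36:
            raise ValueError(f"unsupported base {base}")
        # Go backward to the first character that is not a digit in that base.
        valid = set(DIGITS[:base])
        k = len(text)
        while k > 0 and text[k - 1].upper() in valid:
            k -= 1
        if k == len(text):
            raise ValueError("base marker with no digits before it")
        key.append((0, text[:k]))             # literal text chunk
        key.append((1, int(text[k:], base)))  # numeric value of the digits
        text = ""
    if text:
        key.append((0, text))
    return key
```

With this key, sorted(["File-FF₁₆", "File-100₁₀", "File-11₂"],
key=subscript_key) orders the names by value (3, 100, 255) rather than
by their characters.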
Or again, if hex numbers are the only issue, use some normal
hex-indication flag in the string so that both humans and the customized
natural sort will know that the number is hex and where the number
begins and ends, e.g. File-0x15A-19, File-0xB23A5-25,
File-0x123ABCD-Extra, in which the center portion, between the two
hyphens, would be recognized as hex by the "0x" prefix.
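For illustration, a customized natural sort of that kind might look
something like the following Python sketch. The name natural_key and the
regular expression are assumptions of mine, not anything standard: a run
of hex digits after "0x" is read as hex, and any other run of decimal
digits is read as decimal.

```python
import re

# "0x" followed by hex digits, or else a plain run of decimal digits.
TOKEN = re.compile(r"0[xX]([0-9A-Fa-f]+)|([0-9]+)")


def natural_key(name):
    """Build a natural-sort key that understands 0x-prefixed hex numbers."""
    key = []
    pos = 0
    for m in TOKEN.finditer(name):
        key.append((0, name[pos:m.start()]))   # text before the number
        if m.group(1) is not None:
            key.append((1, int(m.group(1), 16)))  # hex number
        else:
            key.append((1, int(m.group(2), 10)))  # decimal number
        pos = m.end()
    key.append((0, name[pos:]))                # trailing text, possibly empty
    return key


names = ["File-0x123ABCD-Extra", "File-0xB23A5-25", "File-0x15A-19"]
print(sorted(names, key=natural_key))
```

A plain string sort would put File-0x123ABCD-Extra first. With this key
the three names come out in order of their hex values: 0x15A, 0xB23A5,
0x123ABCD.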
Using symbols that the computer automatically distinguishes while human
beings do not is a *dangerous* solution to any problem. Enough typos are
made even when symbols are different. It is common in producing random
uppercase alpha / numeric codes to avoid 0, O, Q, 1, I, 5, S, 8, B, U, V
for that reason alone.
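As a small illustration of that practice, here is a Python sketch that
generates random codes from an alphabet with those look-alike characters
removed. The names CONFUSABLE, ALPHABET and make_code are hypothetical,
chosen only for this example.

```python
import secrets
import string

# Characters commonly avoided because they are easily mistaken for others.
CONFUSABLE = set("0OQ1I5S8BUV")

# Uppercase letters and digits with the confusable ones filtered out.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in CONFUSABLE
)


def make_code(length=8):
    """Return a random code drawn only from the unambiguous alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(ALPHABET)
print(make_code())
```

That leaves 25 usable symbols out of the original 36.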
Now a completely new set of hex digits, as has been suggested, might
make sense. But that is not for Unicode to prescribe, but for
mathematical associations or perhaps some other computer standards
organization. If such a set of digits were proposed by international
organizations with very strong backing (comparable to introduction of
the Euro symbol) then they would certainly have a place in Unicode.
Or if a particular computer language were to introduce them in the PUA
for that language and that usage became popular, then again they would
be encoded by Unicode.
But one wants to avoid as much as possible symbols that look identical
to human beings but have radically different meanings. Unicode has enough
of those by necessity and for backward compatibility.
Jim Allan