From: Jill Ramonsky (Jill.Ramonsky@Aculab.com)
Date: Tue Nov 11 2003 - 10:11:25 EST
Lots of useful and sensible opinions to which to reply, quoted below.
I'll try to reply to all of them at once.
In summary then, suggestions which seem to cause considerably less
objection than the Ricardo Cancho Niemietz proposal are:
(1) Invent a new DIGIT COMBINING LIGATURE character, which allows you to
construct any digit short of infinity
(2) Use ZWJ for the same purpose
(3) Invent two new characters BEGIN NUMERIC and END NUMERIC which force
reinterpretation of intervening letters as digits
I infer some confusion among contributors to this thread, some of whom
are still talking to me as though I'm only interested in a sort
algorithm and nothing else. I thought I'd made it clear that that was
merely an insignificant example of a more general overall concept, so
I'm going to ignore as irrelevant any suggestions as to how to make a
sort work, and focus instead on how to make digits >9 work.
To address Peter's question, "why not just use ZWJ?", the answer is
partly ignorance, and partly concern over how a high-digit-unaware
renderer would handle things. It would of course be COMPLETELY
DISASTEROUS if the hex string "2F" were to be (correctly, in this
scheme) represented as ('2' + '1' + ZWJ + '5') and then rendered as
"215" by an unaware renderer. I would also be concerned about ambiguity.
I'd want the combined character to be unambiguously a single digit with
a computable value. Ignorance came into play also because I just didn't
realise you could do that with ZWJ, and I'm not convinced that ('1' +
ZWJ + '5') would be universally understood as the hex digit we normally
write as F. I guess I see the option of DIGIT COMBINING LIGATURE as
maybe a bit like FRACTION SLASH, in that it makes /clear/ that the thing
you are composing is a number (a digit, in the case of DIGIT COMBINING
LIGATURE, and a fraction in the case of FRACTION SLASH). The existence
of DIGIT COMBINING LIGATURE would also give us a place in the code
charts where its exact usage algorithm could be specified. For all of
these reasons, I don't think that ZWJ fits the bill, though I'd be happy
to be convinced otherwise if my reasoning is flawed.
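
To make the joined-digit idea concrete, here is a minimal Python sketch, not a definitive implementation. It assumes that decimal digits linked by a joiner form one digit whose value is the decimal number they spell, so ('2' + '1' + joiner + '5') is the two-digit hex string we normally write as 2F. DIGIT COMBINING LIGATURE has no code point, so the private-use character U+E000 stands in for it here; the function names are hypothetical, and the "unaware renderer" is assumed simply to drop the joiner, which is how an invisible ZWJ would appear.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER (a real character)
DCL = "\ue000"  # assumption: private-use stand-in for DIGIT COMBINING LIGATURE

DECIMAL = "0123456789"


def parse_digits(text, joiner):
    """Return the digit values in text; decimal digits linked by joiner form one digit."""
    digits = []
    current = ""          # decimal spelling of the digit being built
    expect_more = False   # True right after a joiner
    for ch in text:
        if ch == joiner:
            if not current or expect_more:
                raise ValueError("joiner must sit between two decimal digits")
            expect_more = True
        elif ch in DECIMAL:
            if expect_more:
                current += ch          # extend the current high digit
                expect_more = False
            else:
                if current:
                    digits.append(int(current))
                current = ch           # start a new digit
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if expect_more:
        raise ValueError("dangling joiner at end of text")
    if current:
        digits.append(int(current))
    return digits


def value(digits, base):
    """Positional value of a list of digit values in the given base."""
    total = 0
    for d in digits:
        if d >= base:
            raise ValueError(f"digit {d} is out of range for base {base}")
        total = total * base + d
    return total


def unaware_rendering(text, joiner):
    """Assumed behaviour of a renderer that ignores the joiner entirely."""
    return text.replace(joiner, "")
```

With these definitions, parse_digits("2" + "1" + DCL + "5", DCL) gives [2, 15], value([2, 15], 16) gives 47, and unaware_rendering("2" + "1" + ZWJ + "5", ZWJ) gives "215", which is the misreading described above.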
The option of BEGIN NUMERIC and END NUMERIC is also a pretty good one,
and has the staggering backward compatibility property that if the hex
string "2F" were to be (correctly, in this scheme) represented as (BEGIN
NUMERIC + '2' + 'F' + END NUMERIC) it would be rendered as "2F" by an
unaware renderer, which is, of course, perfect. It does have the
/dis/advantage, however, that there appears to be no way to specify in
the existing code charts what the numeric value of a given letter ought
to be. For example, how should a hex-aware interpreter interpret (BEGIN
NUMERIC + 'j' + END NUMERIC)? This is still a good option, of course,
but it would need to be supplemented by an additional code chart. This is
because everything between BEGIN NUMERIC and END NUMERIC would have
different properties. However, there is another reason why I don't think
this is the best solution - it's not stateless. From a random point in a
string, you'd have to parse backwards and forwards to figure out how to
interpret everything. It also creates problems for concatenation and
substringing. What's more, it perpetuates the appallingly monstrous meme
that the /case/ of hex "2F" is somehow important, when in fact we should
be clear that all digits are caseless, and that the /apparent/ case of
digits ten to fifteen is merely an artifact.
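
The statefulness objection can be sketched in Python as follows. This is an illustration under stated assumptions only: BEGIN NUMERIC and END NUMERIC do not exist, so the private-use characters U+E001 and U+E002 stand in for them, and is_numeric_at is a hypothetical helper. The point it shows is that deciding whether a letter is a digit requires scanning the text before it, and that a substring loses the answer.

```python
BEGIN_NUMERIC = "\ue001"  # assumption: private-use stand-in for BEGIN NUMERIC
END_NUMERIC = "\ue002"    # assumption: private-use stand-in for END NUMERIC


def is_numeric_at(text, index):
    """Report whether text[index] lies inside a BEGIN NUMERIC ... END NUMERIC span.

    The answer depends on every control character before index, so the
    scan cannot start at index itself.
    """
    if not 0 <= index < len(text):
        raise IndexError("index out of range")
    if text[index] in (BEGIN_NUMERIC, END_NUMERIC):
        return False  # the controls themselves are not digits
    inside = False
    for ch in text[:index]:
        if ch == BEGIN_NUMERIC:
            inside = True
        elif ch == END_NUMERIC:
            inside = False
    return inside


# "x", then the bracketed hex string 2F, then a plain letter F
sample = "x" + BEGIN_NUMERIC + "2F" + END_NUMERIC + "F"
bracketed_f = is_numeric_at(sample, 3)   # the F inside the span
plain_f = is_numeric_at(sample, 5)       # the F after END NUMERIC
sliced_f = is_numeric_at(sample[2:4], 1) # the same inner F after substringing
```

Here bracketed_f is True and plain_f is False, but sliced_f is also False: taking the substring "2F" out of the bracketed span silently turns the digit back into a letter.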
Finally, there's Mark's observation that there may be some legitimate
use for digits >15.
For all of these reasons, my preference is for DIGIT COMBINING LIGATURE.
So it would seem I now have the choice of either contacting Ricardo and
suggesting this alternative to him, or arguing against him and then
submitting a counter-proposal. I don't know which approach is likely to
be more productive.
Jill
> -----Original Message-----
> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
> Another solution could be a formatting control that overrides the
> interpretation of a sequence of characters as digits rather
> than as letters
> Here I just suggested a few things for your problem of natural sort or
> semantic analysis, but I don't need it and I won't defend
> this idea. It's up
> to you to defend your opinion and make an alternate proposal for WG2.
> Clearly you take your distance from the other very
> problematic proposal to
> encode figure-width letters...
> -----Original Message-----
> From: Peter Kirk [mailto:peterkirk@qaya.org]
> So, Jill, could you get much of what you want by encoding your hex
> digits as ligatures between regular digits, e.g. <U+0031, ZWJ,
> U+0030...0035>? They would have the properties of digits, and
> could be
> tailored for collation, as contractions, where you need them. I'm not
> sure why you suggest a special DIGIT COMBINING LIGATURE, why not just
> use ZWJ?
> -----Original Message-----
> From: Mark E. Shoulson [mailto:mark@kli.org]
> If/when Tengwar gets coded, it will have digits for 10 and 11, as it
> uses base-12.
> I would say that to the extent that all this is a
> good idea, we
> shouldn't code lots of different ones (A,B for the computer
> crowd, X,E
> for the Dozenal crowd); let glyph-variants handle it.
> (as an oddball addition: if the maximum base we're really trying to
> support is 16, it might be handy to have a "16" digit as well,
>
> ~mark