From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Nov 11 2003 - 14:55:43 EST
Jill Ramonsky summarized:
> In summary then, suggestions which seem to cause considerably less
> objection than the Ricardo Cancho Niemietz proposal are:
> (1) Invent a new DIGIT COMBINING LIGATURE character, which allows you to
> construct any digit short of infinity
> (2) Use ZWJ for the same purpose
> (3) Invent two new characters BEGIN NUMERIC and END NUMERIC which force
> reinterpretation of intervening letters as digits
Actually, I don't think these cause considerably less objection.
They are simply suggestions which Philippe has made, and which
haven't been potshot sufficiently yet on the list.
I note that Philippe and you *have* reached consensus that you are
talking about extending the list of digits, and are not concerned
about Ricardo Cancho Niemietz's issue of fixed-width digit display.
> I'm going to ignore as irrelevant any suggestions as to how to make a
> sort work, and focus instead on how to make digits >9 work.
O.k. And I won't veer into any of the sorting issues, either.
> For all of
> these reasons, I don't think that ZWJ fits the bill, though I'd be happy
> to be convinced otherwise if my reasoning is flawed.
I don't think your reasoning is flawed here at all. The ZWJ is a
cursive control and ligation control. Its effect, if any, should
be on the *appearance* of neighboring characters. So if someone
decided, for example, that the "00" in 2003 looked better ligated
and wanted to create a font to do so, they could hint text with
a ZWJ to indicate when the sequence "00" should ligate into a
single glyph and when not. You can't expect generic support for that
kind of visual ligation to morph, on all the system platforms, into
a completely orthogonal concept of treating ligated digit sequences
as digits in their own right.
>
> The option of BEGIN NUMERIC and END NUMERIC is also a pretty good one,
> ... However, there is another reason why I don't think
> this is the best solution - it's not stateless. From a random point in a
> string, you'd have to parse backwards and forwards to figure out how to
> interpret everything. ...
I also concur with this argument. Creating new stateful controls
for this is a non-starter. If people want stateful sequence-spanning
attribute designations like this, they should accomplish it in XML
or something similar, which has this kind of apparatus built in.
> For all of these reasons, my preference is for DIGIT COMBINING LIGATURE.
This option fails for some of the same reasons as the use of ZWJ. It
would not be a misapplication of an existing format control
character, so it would at least be semantically clear. But it has
the same rendering issues. To quote your analysis for
ZWJ, mutatis mutandis:
> It would of course be COMPLETELY
> DISASTEROUS if the hex string "2F" were to be (correctly, in this
> scheme) represented as ('2' + '1' + DCL + '5') and then rendered as
> "215" by an unaware renderer.
... which it would be.
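To make the failure concrete, here is a minimal sketch in Python. It assumes an "unaware renderer" can be modeled as one that silently drops invisible format controls (general category Cf). Since DIGIT COMBINING LIGATURE is not an encoded character, ZWJ (U+200D) stands in for it; the names unaware_render and DCL_STAND_IN are made up for illustration.

```python
import unicodedata

# ZWJ standing in for the hypothetical (unencoded) DIGIT COMBINING LIGATURE.
DCL_STAND_IN = "\u200d"

def unaware_render(text):
    """Model a renderer that drops invisible format controls (category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# Intended as the two digits 2 and 15, but the control vanishes on display.
print(unaware_render("2" + "1" + DCL_STAND_IN + "5"))
```

Under that assumption the reader sees a plain three-digit string, with nothing left to show that the last two characters were meant as one digit.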
You could, of course, avoid this problem if the "DIGIT COMBINING LIGATURE"
were actually just a visible symbol, rather than an invisible
format control that would have dubious support in most platform
software. For example, you could simply make use of an
existing symbol and *define* it to be your {digit combining ligature}
symbol. Thus, for "2F", you could have, e.g.:
21¤5
where ¤ is defined as a digit composition operator, defaulting to
decimal digit composition. Thus:
21¤5<radix16> = 0x2F = 47
21¤5<radix36> = 87
21¤5<radix97> = 209
...
And 21¤5<radix8> is an error, digit out of range.
Note that to evaluate any such expression, you still need to know
the radix implicitly (or explicitly), just as 777 = 777 if the
radix is 10, but 0x777 = 1911 if the radix is 16.
The mathematically inclined out there could probably generalize
this scheme to allow any digit (composed or not) to be an operand
of the digit composition operator, for greater generality. And in
fact this seems such an obvious kind of approach to generalizing
the concept of "digit" that I'd be surprised if there wasn't already
a mathematical literature on the topic and some more or less
accepted mathematical symbology to deal with this.
Now the drawback of a mathematically defined approach to the problem
is that you couldn't really expect systems software to automatically
support digit formation and evaluation in such a scheme. But
aren't we really talking about specialized applications here, anyway?
I'm not hearing any groundswell of support here for a wonderful
idea that all the platform and library vendors and language
standardization committees have overlooked all these years in
supporting hex. Instead, any such scheme for extending digits has
to deal with the ground facts that hexadecimal *is* supported
already in those "oceans of data" and "rivers of code" already
mentioned earlier in the thread. That it is done by overloading
the semantics of A-F and a-f may displease the purists out there,
but it is still the case. Those oceans aren't going to dry up,
and those rivers are not going to be suddenly diverted. So that
leaves you once again in the position of advocating a specialized
mathematical application of digit extension. And for that, I don't
see any particular barrier to simply using existing characters to
devise an appropriate symbolic convention for the generalized
case, the way mathematicians have been behaving for centuries now.
>
> So it would seem I now have the choice of either contacting Ricardo and
> suggesting this alternative to him, or arguing against him and then
> submitting a counter-proposal. I don't know which approach is likely to
> be most productive.
As before, I don't see either approach as likely to gain any
traction in the encoding committees, given the scope of the problem
and the likelihood of complications if anything remotely like
'A'..'F' got encoded again explicitly as hex digits.
--Ken