From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 01 2005 - 20:21:58 CDT
> I assumed that "inherent" Arabic bidirectionality was
> invented in the wee hours of computer history, maybe in the early
> sixties, so it never occurred to me that anybody on this list might take
> it personally.
Dear me, unexamined presuppositions can be a problem, can't they? ;)
Visual order Arabic and Hebrew implementations on computers were
probably "invented" in the 70's, and saw fairly widespread use
in that timeframe on mainframes and later in the 80's on PCs. A
lot of that work was done by IBM. An inherent bidirectionality
algorithm was invented at Xerox PARC in the 80's, I think, although
others might have had an earlier hand in it. It was implemented
on the Xerox Star system in that timeframe. You can see it
discussed in Joe Becker's 1984 Scientific American article, for
example. And that was the immediate precursor of Arabic and Hebrew
support on the Macintosh, as well as the inspiration for the
Unicode bidirectional algorithm.
[Some historians on the list can, no doubt, nail this stuff down
more precisely...]
> I really do
> not understand the assertions that e.g. rtl digits would be a big
> problem, for reasons that I've explained on other messages. Which makes
> me think there's something I'm overlooking. That's all.
Yes, you are.
Cloning *any* common characters -- let alone all the digits, all
the common punctuation, and SPACE -- on the basis of directionality
differences, *would* wreak havoc on information processing. Many
of the characters in question are in ASCII, which means they
are baked into hundreds of formal languages, thousands of protocols
and tens of thousands of programs and software systems. They have
been for decades now, and that *includes* Arabic and Hebrew
information processing systems.
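Here is a minimal Python sketch of the point. It assumes a hypothetical cloned "right-to-left space". No such character exists, so the Private Use Area code point U+E020 stands in for it only for illustration. `parse_command` is a made-up tokenizer of the kind found in countless ASCII-based protocols. Text that uses the clone silently stops tokenizing:

```python
# HYPOTHETICAL: no "RTL space" exists in Unicode. U+E020 (Private Use Area)
# stands in for such a clone of U+0020 purely for illustration.
HYPOTHETICAL_RTL_SPACE = "\uE020"

def parse_command(line: str) -> list[str]:
    """Hypothetical tokenizer: split on U+0020 SPACE, the way many
    ASCII-based protocols and shells delimit their fields."""
    return line.split(" ")

ascii_line = "GET /index.html HTTP/1.0"
cloned_line = ascii_line.replace(" ", HYPOTHETICAL_RTL_SPACE)

print(len(parse_command(ascii_line)))   # three fields, as intended
print(len(parse_command(cloned_line)))  # one undivided field
```

Nothing raises an error. The request is simply mis-parsed as a single token, and that kind of failure happens deep in a system where end users never see it.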
Making the SPACE character in Arabic and Hebrew be something *other*
than U+0020 SPACE, simply because it might make bidirectional
editors easier to write if all characters were inherently RTL for
Arabic, would have the effect of breaking nearly all Arabic
and Hebrew information processing, deep down in the guts where
end users can't get at it. The *only* way around it would be to
introduce such things effectively all pre-deprecated with canonical
equivalences to the existing characters, so that at least normalized
data would behave correctly and be interpreted correctly. But then
there would be no supportable reason for introducing them in
the first place.
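The standard already contains characters of exactly this "pre-deprecated" kind, and they show why the exercise would be pointless. The following small Python illustration uses U+212B ANGSTROM SIGN as the example. That character was chosen here purely for illustration and was not raised in this thread. It is canonically equivalent to U+00C5, so normalization erases the distinction entirely:

```python
import unicodedata

# U+212B ANGSTROM SIGN has a singleton canonical decomposition to U+00C5
# LATIN CAPITAL LETTER A WITH RING ABOVE.
angstrom_sign = "\u212B"
a_ring = "\u00C5"

print(angstrom_sign == a_ring)                                # False: distinct code points
print(unicodedata.decomposition(angstrom_sign))               # 00C5
print(unicodedata.normalize("NFC", angstrom_sign) == a_ring)  # True
```

Once the text is normalized, the duplicate cannot be distinguished from the original. It contributes nothing except another spelling that every comparison has to account for.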
And you haven't thought through the consequences of having duplicated
digits with different directionality. You might think an end
user has complete control over what they do, with their keyboard
and their choice of characters -- but text is now *global* data,
and much of what goes on with data is automated, and consists
of programs talking to programs through protocols. Once you unleash
different users using what claims to be the *same* character
encoding, but with opposite conventions about *which* digits they
use and what direction those flow, you will inevitably get
into the situation where one process or another cannot reliably
tell whether "1234" is to be interpreted as 1234 or 4321. That alone
is enough for the whole proposal to be completely dead in the water.
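The next Python sketch illustrates that ambiguity. The Arabic-Indic digits and their bidi classes are real Unicode properties. `value_if_reversed` is a hypothetical helper that models a receiver assuming the opposite storage convention for the same code points:

```python
import unicodedata

ascii_digits = "1234"
arabic_indic = "\u0661\u0662\u0663\u0664"  # ARABIC-INDIC DIGIT ONE..FOUR

# Under Unicode, both runs are stored most significant digit first...
print(int(ascii_digits), int(arabic_indic))  # 1234 1234
# ...even though they carry different bidi classes.
print(unicodedata.bidirectional("1"), unicodedata.bidirectional("\u0661"))  # EN AN

def value_if_reversed(stored: str) -> int:
    """HYPOTHETICAL: the value a receiver computes if it assumes the sender
    stored the digits in the opposite (right-to-left) order."""
    return int(stored[::-1])

print(value_if_reversed(ascii_digits))  # 4321
```

In Unicode, both digit sets are stored most significant digit first, whatever the direction of the surrounding text, so only one reading exists. A second convention for the same characters would make the identical four code points mean 1234 or 4321, depending on an assumption that the data itself does not record.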
All the proposal would accomplish is to create massive ambiguity
about what the representation of a given piece of Hebrew or
Arabic text should be -- and that is a *bad* thing in a character
encoding.
> Then again, I
> really do not understand why anybody would think RTL languages are
> inherently bidi, so maybe there's no point
Well, first of all, nobody has claimed that the Arabic *language*
is inherently bidi. Nor has anybody claimed that the Arabic *script*
is inherently bidi. So try understanding what the people implementing
these systems *are* claiming.
Any functional information processing system concerned with
textual layout that is aimed at the Hebrew or Arabic language
markets *must* support bidirectional layout of text. That is
simply a fact.
Furthermore, to do so interoperably -- that is, with the hope
that Implementation A by Company X will lay out the same underlying
text as Implementation B by Company Y in the same order, so that
a human sees and reads it as the "same" text -- they depend on
a well-defined encoding of the characters and a well-defined
bidirectional layout algorithm. One possible choice is consistent
visual ordering. Another possible choice is consistent logical ordering
and an inherent bidirectional algorithm. The Unicode Standard
chose the latter, for a number of very good reasons. Trying
to mix the two is a quick road to hell.
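The following deliberately tiny Python sketch illustrates the logical-ordering approach. Characters are stored in the order they are typed and read, and display order is computed from each character's bidi class. `toy_visual_order` is a hypothetical function, not the Unicode Bidirectional Algorithm (UAX #9). It only reverses runs of strong right-to-left letters inside a left-to-right paragraph. Everything else, including the spaces between Hebrew words, is treated as left-to-right, which the real algorithm does not do:

```python
import unicodedata

def is_rtl(ch: str) -> bool:
    # Strong right-to-left bidi classes: R (e.g. Hebrew) and AL (Arabic letters).
    return unicodedata.bidirectional(ch) in ("R", "AL")

def toy_visual_order(logical: str) -> str:
    """HYPOTHETICAL toy: reverse each maximal run of strong RTL characters
    in a left-to-right paragraph. Ignores neutrals, numbers, embeddings,
    mirroring, and everything else UAX #9 actually specifies."""
    out: list[str] = []
    run: list[str] = []
    for ch in logical:
        if is_rtl(ch):
            run.append(ch)
        else:
            out.extend(reversed(run))  # flush the pending RTL run, reversed
            run = []
            out.append(ch)
    out.extend(reversed(run))  # flush a run that ends the string
    return "".join(out)

# Latin text, then HEBREW LETTER ALEF, BET, GIMEL in logical (reading) order.
logical = "abc \u05D0\u05D1\u05D2 def"
print(toy_visual_order(logical) == "abc \u05D2\u05D1\u05D0 def")  # True
```

The toy rules themselves are not the point. What matters is that the rules are deterministic, so any two implementations that apply the same well-defined algorithm to the same logical text produce the same display.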
--Ken
This archive was generated by hypermail 2.1.5 : Mon Aug 01 2005 - 20:24:54 CDT