From: Gregg Reynolds (unicode@arabink.com)
Date: Tue Aug 02 2005 - 10:42:30 CDT
John Hudson wrote:
> Gregg Reynolds wrote:
>
>> Maybe it's the size of the problem I'm not understanding. To take your
>> example, let's suppose that RTL digits 0-9 are approved tomorrow.
>> They're no different than their LTR equivalents, except for the
>> typesetting semantics. That is, they share the same "underlying
>> Platonic character", if I've understood you: they mean the number
>> three. They just have different *typographic* semantics.
>
>
> There is no concept of 'typographic semantics' in Unicode. (I'll leave
> it to the philosophers to debate whether the Unicode notion of 'abstract
> character' is the same as your 'underlying Platonic character'.)
>
Maybe "typographic" isn't the right word. But Unicode definitely, if
implicitly, encodes a set of graphical syntax rules. That's what the bidi
classes, shaping classes, combining classes, etc. encode.
> You are proposing encoding of separate Unicode characters for RTL
> digits. Ergo, two possible ways to encode each digit, and a major
Adding to the already existing - what, 5? 6? - different ways of
encoding each digit. Let's count the ways:
0030-0039 DIGIT ZERO etc
0660-0669 ARABIC-INDIC
06F0-06F9 EXTENDED ARABIC-INDIC
0966-096F DEVANAGARI
09E6-09EF BENGALI
0A66-0A6F GURMUKHI
0AE6-0AEF GUJARATI
Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Tibetan, Myanmar,
Ethiopic, Khmer, Mongolian, Limbu, Osmanya, various mathematical digit
characters, Japanese full-width, etc. etc. Twenty-one and counting.
I don't see why adding additional sets of digits is problematic; Unicode
already accommodates it.
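
A quick sketch of the point, assuming Python's standard unicodedata module
(the table name DIGIT_ZEROS and the helper describe() are made up for
illustration). It looks up the "three" of a few of the ranges listed above
and reports its decimal value, general category, and bidi class:

```python
import unicodedata

# First code point (the zero) of a few of the digit ranges listed above.
DIGIT_ZEROS = {
    "DIGIT": 0x0030,
    "ARABIC-INDIC": 0x0660,
    "EXTENDED ARABIC-INDIC": 0x06F0,
    "DEVANAGARI": 0x0966,
    "BENGALI": 0x09E6,
}

def describe(zero):
    """Return (decimal value, general category, bidi class) for the '3' of a range."""
    ch = chr(zero + 3)
    return (
        unicodedata.decimal(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

for name, zero in DIGIT_ZEROS.items():
    print(name, describe(zero))
```

Every set reports the same decimal value and the same category (Nd); only
the bidi class differs from one set to another.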
> rewrite of existing software (including updates to the cmap tables of
> all Arabic and Hebrew fonts) to ensure that these two sets of characters
> are treated as if they were the same characters for numeric searching
> and sorting. I don't see any way to do this that doesn't involve reimplementing
> a major aspect of RTL text processing from scratch, with attendant
> expense and wastage of previous work.
Depends on the architecture of the previous work. We already have the
necessary properties: Number and RTL. All you need to do is add
codepoints to your internal tables. Update a few cmaps. *If* you want
to support the new characters. That's not required, any more than
support for Thai line breaking is required for English language software.
More importantly, it makes it *much* easier to adapt LTR-only software
to support RTL languages. Not to support bidi processing, mind you.
That's the main benefit.
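
To make "add codepoints to your internal tables" concrete, here is a
minimal sketch of digit folding for numeric search and sort, again assuming
Python's unicodedata module. The RTL digits do not exist in Unicode; the
code points U+E000..U+E009 are an arbitrary Private Use Area assignment
chosen purely for illustration, and the names EXTRA_DIGITS, digit_value and
fold_digits are likewise hypothetical. It says nothing about rendering or
cmaps, only about the lookup side:

```python
import unicodedata

# Hypothetical: RTL digits 0-9 parked in the Private Use Area at U+E000..U+E009.
# This assignment is an assumption for illustration; no standard defines it.
PUA_RTL_ZERO = 0xE000
EXTRA_DIGITS = {chr(PUA_RTL_ZERO + n): n for n in range(10)}

def digit_value(ch):
    """Decimal value of ch, or None if ch is not a decimal digit."""
    if ch in EXTRA_DIGITS:
        return EXTRA_DIGITS[ch]
    # For existing characters the Unicode database already has the answer.
    return unicodedata.decimal(ch, None)

def fold_digits(text):
    """Replace every decimal digit with its ASCII equivalent, leaving other text alone."""
    out = []
    for ch in text:
        value = digit_value(ch)
        out.append(ch if value is None else str(value))
    return "".join(out)
```

Software that already folds digits through a property lookup like this only
gains one extra table; software that hard-codes 0030-0039 has the same
problem with the existing digit sets.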
> Maybe it would have been a good
> idea about fifteen years ago, but now it is an economic non-starter no
> matter what one thinks of the virtue of the idea itself.
Possibly; but nobody is required to implement new characters. Change is
never free; but things that never change never improve.
The idea is not to force vendors to support something they don't want to
do, it is to remove constraints preventing developers from doing
something they might like to do.
>
>> It is very clear to me that the only reason anybody uses such software
>> is because they have no other choice, not because they are satisfied
>> with it.
>
>
> So improve the software. Determine correct behaviour for specific
> characters and desired input methods and demand that applications get it
> right. Ripping out the foundations because you don't like the wallpaper
> doesn't make a lot of sense.
Oh believe me it's on my todo list. The `Patacode paper I posted a
while back is a start; a clear accounting of user interaction
expectations is part of the project, as is a formal discussion of digit
polarity in encoding design. Not to mention running code, which always
wins. To be honest I remain unconvinced that RTL digits would cause the
end of the world, or even much of a headache. Obviously I shall have to
hack a piece of free software to support RTL digits in the PUA, to
discover the actual costs, but it'll be a while before I get to that.
But that's a lot of work; the reason I bring up this stuff on this
thread is twofold: one, to get some idea of whether or not writing up a
formal proposal to submit to Unicode would be a waste of time (looks
like it); and two, to at least try to counteract the myths of inherent
RTL bidirectionality and the "necessity" of non-Latin software to
support Latin characters.
(BTW, the bidi requirement is hardly wallpaper; it *is* the foundation,
which is why it is harmful. But how does adding RTL digits amount to
ripping out the foundation? No changes would be made to existing
character semantics.)
-gregg