From: Gregg Reynolds (unicode@arabink.com)
Date: Thu Jun 30 2005 - 10:13:36 CDT
N. Ganesan wrote:
> Gregg Reynolds (unicode@arabink.com) wrote
> 
>>You are not alone in thinking Unicode does not 
>>serve your language community, but don't forget 
>>it was never Unicode's intention to serve 
>>language communities. It's just a character 
>>encoding, not a language encoding. Unicode 
>>happens to also do serious damage to the entire
>>world of right-to-left languages such as Arabic (IMO), 
>>but it had no choice, given that it was constrained 
>>to adopt legacy encodings. No point in whining 
>>about that. And it is probably better than what
>>we had before. Still, it is up to that language 
>>community to decide to do something better. 
> 
> 
> For resource and other practical reasons,
> I think Unicode will be the only 16-bit
> encoding for Tamil for a long time. I haven't
> even heard of anyone coming up with competition.
> But since Tamil is a script with only
> non-conjuncts (unlike, e.g., Devanagari or
> Tamil Grantha), many 8-bit glyph-based
> encodings still exist on the web. But
> they are not searchable via Google and so on.
> So, some 500+ blogs operate exclusively
> in Unicode.
> 
> What about the Arabic script? The Middle East is
> awash with funds and resources, and the script is
> used over a wide area by lots of people. If
> "Unicode happens to also do serious damage 
> to the entire world of right-to-left languages",
> is there any competition? Any 16-bit encodings
> for the Arabic script other than Unicode? 
Hi,
I'm not aware of any 16-bit encodings for Arabic other than Unicode. 
There are plenty of 7- or 8-bit encoding and transliteration schemes, but 
most of them use more or less the same character repertoire as Unicode. 
(Note that ASCII-based transliteration schemes don't bother with the 
bidirectionality of number strings, but they have been quite useful, at 
least to the scholarly community, for a long time.)  256 characters is 
adequate to cover Arabic completely, so 8 bits is enough.
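Just to make that concrete, here's a toy one-byte-per-letter sketch in
Python.  The letter-to-ASCII assignments below are purely illustrative,
not any particular published scheme:

ASCII_TO_ARABIC = {
    "A": "\u0627",  # alef
    "b": "\u0628",  # beh
    "t": "\u062A",  # teh
    "l": "\u0644",  # lam
    "m": "\u0645",  # meem
    "n": "\u0646",  # noon
    "w": "\u0648",  # waw
    "y": "\u064A",  # yeh
}

def to_arabic(translit: str) -> str:
    """Map a transliterated string to Unicode Arabic, one byte per letter."""
    return "".join(ASCII_TO_ARABIC.get(ch, ch) for ch in translit)

print(to_arabic("Alm"))  # alef, lam, meem

The whole repertoire fits comfortably in 256 values, which is the point.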
The reason (or my reason, anyway) for experimenting with alternative 
encoding designs is not that Unicode is incapable of encoding the 
graphic forms of text, but that it rules out some kinds of 
"grammatical" semantics (for lack of a better term) that can easily be 
associated with characters, and that allow for much more powerful text 
processing.  For example, traditional Arabic grammar distinguishes many 
different "kinds" of alef.  They all use the alef letterform encoded by 
Unicode, but they have different functions, some graphotactic, some 
phonological, maybe some others.  Obviously they could all be encoded 
with different codepoints that use the same glyph; just as obviously 
this would be outside the scope of Unicode.  However there are other 
cases where the dividing line is not so clear.  The fun thing about 
Arabic is that various kinds of grammatical semantics can be attached to 
single characters; you can't really do that in English.
(By the way, that is the real contrary of plaintext: character codes 
that denote grammatical semantics rather than just graphemic semantics.)
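To give a flavor of what I mean, here's a hypothetical sketch in Python.
The code point assignments and the particular "kinds" of alef named below
are invented for illustration, not a proposal:

ALEF_GLYPH = "\u0627"  # ARABIC LETTER ALEF, the one letterform Unicode encodes

# Invented code points in a private 8-bit range, one per grammatical "kind".
EXPERIMENTAL_ALEFS = {
    0xA0: "alef as seat of hamza",
    0xA1: "alef of lengthening",
    0xA2: "purely orthographic alef",
}

def render(code: int) -> str:
    """All the alef kinds display with the same glyph."""
    if code in EXPERIMENTAL_ALEFS:
        return ALEF_GLYPH
    raise ValueError(f"unassigned code point {code:#04x}")

def describe(code: int) -> str:
    """Recover the grammatical kind that plain Unicode text throws away."""
    return EXPERIMENTAL_ALEFS[code]

for cp in EXPERIMENTAL_ALEFS:
    print(f"{cp:#04x} -> {render(cp)}  ({describe(cp)})")

Searching or parsing such text can then key off the grammatical kind
directly, rather than trying to reconstruct it from context.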
In any case, by piggy-backing on a widely implemented encoding like 
latin-1, you can encode text using an experimental design and use 
existing tools to work with it in various ways, make it available to 
others, etc., so you can find out what really works and is useful, 
rather than speculating.  That way unproductive polemics on the 
Unicode list can be avoided.  ;)
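As a minimal sketch of the piggy-backing idea (the byte assignments here
are invented, and the table only covers three letters):

# Experimental byte values ride inside ordinary latin-1 text; a small
# table converts to standard Unicode Arabic only when display is needed.
TO_UNICODE = {
    0xA0: "\u0627",  # hypothetical byte for alef (any grammatical kind)
    0xC0: "\u0628",  # hypothetical byte for beh
    0xC1: "\u062A",  # hypothetical byte for teh
}

def display(raw: bytes) -> str:
    """Convert experimentally encoded bytes to displayable Unicode text."""
    return "".join(TO_UNICODE.get(b, chr(b)) for b in raw)

sample = bytes([0xA0, 0xC0, 0xC1])  # one experimental byte per character
print(display(sample))              # shows up in ordinary Unicode-aware tools

Any editor, versioning system, or search tool that handles 8-bit text can
store and move such files around unchanged.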
Thanks,
gregg