-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello Arjun,
AA> This time I will raise the old issue with a new perspective and
AA> with more practicality.
Unfortunately, I'm sorry to say, not with added insight.
AA> This is the reason why you will not have seen any Hindi web pages
AA> and other applications written in Unicode or using Unicode code
AA> mappings(except a few test pages by some enterprising individuals
AA> ).
Oh, the difference is probably that from this category of pages you
can cut and paste into Word without garbling your data, because they
use a *standard* encoding, as opposed to the complete chaos of Hindi
web pages that each rely on their own font. Does that count as
justification for Unicode?
AA> On the contrary I have seen thousands of applications and web
AA> pages using Unicode for Chinese and Japanese in spite of these
AA> language scripts requiring large use of the Unicode encoding
AA> space.
The connection is the other way round, actually.
The Unicode and ISCII representation of the Devanagari script and
other Brahmi-derived scripts is probably the most elegant way of
handling such a script, so that operations which work on the
*language* level (as opposed to the *glyph* level), such as
intelligent searching, spelling or grammar correction, or all sorts of
linguistic applications, can be performed _without_ having to heed the
idiosyncrasies of the encoding on the algorithm level.
In theory, it would have been possible to do this for Chinese or
especially Korean as well, but there it would be even more complicated
to implement a font, and the data would have become a lot larger than
in Unicode if one had to encode each character by its radicals (to say
nothing of the incompatibility with existing standards, which is not
an issue with Indian scripts).
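
To give a rough feel for the size argument, here is a minimal sketch,
assuming Python 3 and its standard unicodedata module. It uses Korean,
because Unicode happens to define both precomposed Hangul syllables and
their component letters (jamo), so normalization form NFD can split one
into the other. Jamo are not radicals in the Chinese sense, so this is
only an analogy, and the sample word is my own choice.

```python
import unicodedata

# "Hangul" written with two precomposed syllables, as text is usually stored.
precomposed = "\ud55c\uae00"

# NFD splits each precomposed syllable into its conjoining jamo.
decomposed = unicodedata.normalize("NFD", precomposed)


def utf8_size(text):
    """Return the number of bytes the text occupies in UTF-8."""
    return len(text.encode("utf-8"))


print(len(precomposed), utf8_size(precomposed))   # 2 code points, 6 bytes
print(len(decomposed), utf8_size(decomposed))     # 6 code points, 18 bytes
print([f"U+{ord(ch):04X}" for ch in decomposed])
```

Stored by components, the same two syllables take three times the
space, and the font has to assemble every syllable block itself.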
AA> Is this some kind of conspiracy to keep the use of Indic scripts
AA> from the Unicode system to the minimal.
Do you really think that the sheer number of codepoints allocated to a
script in Unicode has anything to do with valuing one script more
highly than another? It's surprising that anyone would think so. If
your conspiracy theory were correct, then IPA, Dingbats and Box
Drawing would probably be a lot more "important" than the Latin
script.
AA> If anybody wants to see how the Devnagari encoding of Unicode
AA> should actually look like , they can visit
AA> http://www.bharatbhasha.com
You mean http://www.bharatbhasha.net, I think.
AA> and download a font named Shusha.
Shusha has the same disadvantage as most other idiosyncratically
encoded Hindi fonts: it allocates glyphs to codepoints arbitrarily, in
an order more or less derived from English. When you look at the page,
the underlying encoding for the word "Hindi" in Hindi is "ihndI" for
use with Shusha. The placement of the first "i" should make it pretty
clear why such an encoding is a Bad Thing (tm) if you want to do any
sort of language-level data processing: the vowel sign is drawn to the
left of the "h" but pronounced after it, so the stored order follows
the glyphs and has *nothing at all* to do with the underlying
language. Processing such data only gets complicated because the
implementation is insufficient.
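
The ordering problem can be sketched in a few lines, again assuming
Python 3 and the standard unicodedata module. The Unicode string is my
own spelling of the word in logical order. The Shusha string is just
the "ihndI" quoted above, and I make no claim about the rest of the
font's table.

```python
import unicodedata

# The word "Hindi" in Devanagari, stored in Unicode's logical order.
unicode_word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# The same word as stored for the Shusha font, as quoted in this message.
font_word = "ihndI"


def code_point_names(text):
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


for ch, name in zip(unicode_word, code_point_names(unicode_word)):
    print(f"U+{ord(ch):04X} {name}")

# A language-level question: does the word begin with the consonant HA?
print(unicode_word.startswith("\u0939"))  # True: HA comes first in the data
print(font_word.startswith("h"))          # False: the vowel glyph comes first
```

In the Unicode string the consonant comes first and its vowel sign
second, as in speech, so a prefix search or a sort works on the data
as it stands. With the font encoding, every algorithm first has to
undo the glyph order.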
Greetings
Philipp mailto:uzsv2k@uni-bonn.de
__________________________
Seeing my great fault / Through darkening blue windows / I begin again
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (MingW32)
Comment: Freedom of the press is limited to those who own one.
iD8DBQE8AWCQAFQhKhQ6O0kRAhOyAKDef0nnMu1jhPU3XHVQQGkMW8jbxwCfeI2w
mlUmRY19AYQM7DwuKPP3QI8=
=Z83W
-----END PGP SIGNATURE-----
This archive was generated by hypermail 2.1.2 : Sun Nov 25 2001 - 17:03:49 EST