Re: What constitutes "character"? New Problem

From: James Kass (jameskass@worldnet.att.net)
Date: Wed Nov 21 2001 - 03:39:11 EST

Previous message: James Kass: "Re: Unicode surrogates in browsers for the compelling demo"
Next in thread: James Kass: "Re: What constitutes "character"? New Problem"
Reply: James Kass: "Re: What constitutes "character"? New Problem"

Hello,

Tex Texin has a demo page up to display names of celebrities
from around the world in their native scripts.

(Please see:
http://www.geocities.com/i18nguy/unicode-example.html )

I wonder if you're familiar with ISCII, the Indian national
computer encoding standard upon which the Indic script
encoding in Unicode is based.

One of the advantages of following this scheme is supposed
to be the ability to easily transliterate between various
Indic scripts.

Just to see how easy this is and whether it works, I took the name
"Madhuri Dixit" from Tex Texin's page, as submitted by Yaap
Raaf.

I got the decimal code points for each of the Devanagari characters
and put them in a database. I then made nine copies of that database,
adding a further 128 to the code point values in each successive copy
(each of these script blocks is 128 code points wide).
Then I merged the ten databases into one and generated an
HTML text file.
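
For anyone who wants to try the same offset trick without a database, here
is a minimal sketch in Python. It is only an illustration of the procedure
described above, not the method actually used: the names BLOCK_SIZE,
BLOCK_COUNT, shift_text and transliterate_all are made up for this example,
and the space is assumed to pass through unchanged. It does no checking, so
it will happily produce code points that are unassigned in the target block.

```python
# Sketch of the "add 128 per script block" procedure described above.
# All names here are hypothetical; the original message used a database.

BLOCK_SIZE = 128   # width of each script block, Devanagari through Sinhala
BLOCK_COUNT = 10   # Devanagari plus the nine blocks that follow it


def shift_text(text, offset):
    """Add offset to every code point, leaving spaces untouched."""
    return "".join(ch if ch == " " else chr(ord(ch) + offset) for ch in text)


def transliterate_all(devanagari):
    """Return the input shifted into each of the ten consecutive blocks."""
    return [shift_text(devanagari, BLOCK_SIZE * i) for i in range(BLOCK_COUNT)]


# The Devanagari string from the demo page, written as escapes so the
# code points are visible.
name = "\u092e\u093e\u0927\u0941\u0930\u0940 \u0926\u093f\u091b\u093f\u0924"

lines = transliterate_all(name)
for line in lines:
    print(line)
```

The order of the output lines is the order of the blocks in Unicode:
Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada,
Malayalam, Sinhala.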

The results follow in UTF-8:

माधुरी दिछित
মাধুরী দিছিত
ਮਾਧੁਰੀ ਦਿਛਿਤ
માધુરી દિછિત
ମାଧୁରୀ ଦିଛିତ
மா஧ுரீ ஦ி஛ித
మాధురీ దిఛిత
ಮಾಧುರೀ ದಿಛಿತ
മാധുരീ ദിഛിത
ථ඾ටශධව ඦ඿ඛ඿ඤ

Well, I don't have all the fonts needed here, but, except for
the Tamil (which lacks some consonants) and the Sinhala (which
I can't see at all), it appears to work and it's pretty easy to do.

The Indian committees responsible for the ISCII standard
obviously put a great deal of thought and effort into the job.

People have noted on this list that if half letters were encoded
separately for Devanagari, existing applications would break.
This ability to transliterate easily would be the first thing to go.
Searching and indexing would probably be next.

Hoping this is helpful.

Best regards,

James Kass.

----- Original Message -----
From: "Arjun Aggarwal" <mrasool@sancharnet.in>
To: <jameskass@worldnet.att.net>
Cc: <unicode@unicode.org>
Sent: Sunday, November 18, 2001 7:54 AM
Subject: Re: What constitutes "character"? New Problem

This archive was generated by hypermail 2.1.2 : Wed Nov 21 2001 - 03:48:49 EST