Re: The real solution

From: James Kass (jameskass@worldnet.att.net)
Date: Sun Nov 25 2001 - 20:14:53 EST


Hello Arjun,

There are (at least) two possible solutions.

One is to upgrade the operating system and work with the existing
international encoding standard. The new OS support for Indic
scripts works well.

The other option, for those unable to upgrade and who must continue
to use older equipment and software, is to make a tool to convert
existing encodings, like the Shusha font, to Unicode and back. In
this way, you could make a page at your office using your existing
font encoding and input methods, and simply convert it to Unicode
before posting it to the web or sending it to someone else. What
the other person has to do in order to display it is the other person's
responsibility. The other person might have a new OS and be able to
display it at once, or the other person might have to run their own
conversion to display your material in a non-standard encoding.

Mark Leisher has worked on a Perl script for doing just that with
the Naidunia Devanagari font. More info can be found at:
http://crl.nmsu.edu/~mleisher/devnag.html

One of the big problems with using a font like Shusha is that the
encodings aren't standard and are subject to change. A few years
ago I made a chart using the Shusha font, sent it to someone else
who also had the Shusha font, and they couldn't read the page!
This is because there are several versions of the Shusha font on
the internet, and they don't all match.

This problem is even more extreme if you have the Shusha font and
I have the Naidunia font. The glyph for the half letter BA is #231
in Naidunia, but it is #98 in the Shusha and Xdvng fonts. Even the
Shusha and Xdvng fonts don't always match, though.

For Devanagari letter SHA, Shusha is #207+#97, Xdvng is #83+#97,
and Naidunia is #78.
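
To make the mismatch concrete, here is a small Python sketch that records
only the glyph numbers quoted above and checks whether two fonts agree.
The dictionary layout and the function name are illustrative assumptions,
not part of any existing tool.

```python
# Glyph codes quoted in this message, keyed by glyph and then by font name.
# The structure and names here are hypothetical, chosen for illustration.
GLYPH_CODES = {
    "half BA": {"Naidunia": (231,), "Shusha": (98,), "Xdvng": (98,)},
    "SHA": {"Naidunia": (78,), "Shusha": (207, 97), "Xdvng": (83, 97)},
}


def fonts_agree(glyph, font_a, font_b):
    """Return True when both fonts use the same code sequence for a glyph."""
    codes = GLYPH_CODES[glyph]
    return codes[font_a] == codes[font_b]


if __name__ == "__main__":
    for glyph in GLYPH_CODES:
        for a, b in [("Shusha", "Xdvng"), ("Shusha", "Naidunia")]:
            print(glyph, a, b, fonts_agree(glyph, a, b))
```

With these numbers, Shusha and Xdvng agree on half BA but not on SHA, and
Naidunia agrees with neither.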

If you are interested in making a conversion tool for Shusha, the
following might be helpful. This is based on Shusha font for Hindi,
version 1.5, dated 25 May 97. (*Important*: check the version of
your copy of the Shusha font before using this data!)

Here, in decimal values, are the Unicode(s), Shusha number(s), and
a name for the glyph. The glyph names aren't standard; they are
just used internally here. The decimal Unicodes come first, followed
by a semicolon and an "S", then the Shusha numbers and glyph name.

 2305 ; S 208 (CBINDU)
 2306 ; S 77 (ANUSVARA)
 2307 ; S 193 (VISARGA)
 2309 ; S 65 (A)
 2310 ; S 65 97 (AA)
 2311 ; S 91 (I)
 2312 ; S 154 (II)
 2313 ; S 93 (U)
 2314 ; S 125 (UU)
 2315 ; S 63 (VR IND)
 2316 ; S 59 (VL IND)
 2317 ; S 101 94 (CANDRA E)
 2318 ; S 246 (SHORT E)
 2319 ; S 101 (E)
 2320 ; S 101 111 (AI)
 2321 ; S 65 97 94 (CANDRA O)
 2322 ; S 32 (SHORT O)
 2323 ; S 65 97 111 (O)
 2324 ; S 65 97 79 (AU)
 2325 ; S 107 (KA)
 2325 2381 2325 ; S 149 (K-KA)
 2325 2381 2340 ; S 62 (K-TA)
 2325 2381 2352 ; S 203 (K-RA)
 2325 2381 2357 ; S 149 (K-VA)
 2325 2381 2359 ; S 120 97 (K-SSA)
 2325 2381 2359 2381 8205 ; S 120 (K-SS-)
 2325 2381 8205 ; S 64 (K-)
 2326 ; S 75 (KHA)
 2326 2381 2352 ; S 75 96 (KH-RA)
 2326 2381 8205 ; S 35 (KH-)
 2327 ; S 103 97 (GA)
 2327 2381 8205 ; S 103 (G-)
 2328 ; S 71 97 (GHA)
 2328 2364 ; S 32 (GHA*)
 2328 2381 8205 ; S 71 (GH-)
 2329 ; S 61 (NGA)
 2329 2381 2325 ; S 32 (NG-KA)
 2329 2381 2325 2381 2359 ; S 32 (NG-K-SSA)
 2329 2381 2326 ; S 32 (NG-KHA)
 2329 2381 2327 ; S 246 (NG-GA)
 2329 2381 2328 ; S 32 (NG-GHA)
 2329 2381 2329 ; S 32 (NG-NGA)
 2329 2381 2344 ; S 32 (NG-NA)
 2329 2381 2350 ; S 32 (NG-MA)
 2329 2381 2351 ; S 32 (NG-YA)
 2329 2364 ; S 32 (NGA*)
 2329 2364 2381 ; S 32 (NGA*X)
 2329 2381 ; S 32 (NGX)
 2330 ; S 99 97 (CA)
 2330 2381 2330 ; S 32 (C-CA)
 2330 2381 2330 2381 8205 ; S 32 (C-C-)
 2330 2364 ; S 32 (CA*)
 2330 2381 8205 ; S 99 (C-)
 2331 ; S 67 (CHA)
 2331 2381 2351 ; S 32 (CH-YA)
 2331 2364 ; S 32 (CHA*)
 2331 2364 2381 ; S 32 (CHA*X)
 2331 2381 ; S 32 (CHX)
 2332 ; S 106 97 (JA)
 2332 2381 2332 ; S 32 (J-JA)
 2332 2381 2332 2381 8205 ; S 32 (J-J-)
 2332 2381 2334 ; S 38 (J-NYA)
 2332 2381 2334 2381 8205 ; S 32 (J-NY-)
 2332 2381 2352 ; S 32 (J-RA)
 2332 2381 8205 ; S 106 (J-)
 2333 ; S 74 97 (JHA)
 2333 2381 2352 ; S 32 (JH-RA)
 2333 2364 ; S 32 (JHA*)
 2333 2381 8205 ; S 74 (JH-)
 2334 ; S 72 97 (NYA)
 2334 2381 2330 ; S 32 (NY-CA)
 2334 2381 2330 2381 8205 ; S 32 (NY-C-)
 2334 2381 2332 2381 8205 ; S 32 (NY-J-)
 2334 2381 2332 ; S 32 (NY-JA)
 2334 2364 ; S 32 (NYA*)
 2334 2381 8205 ; S 72 (NY-)
 2335 ; S 84 (TTA)
 2335 2381 2325 ; S 32 (TT-KA)
 2335 2381 2335 ; S 43 (TT-TTA)
 2335 2381 2336 ; S 149 (TT-TTHA)
 2335 2381 2351 ; S 32 (TT-YA)
 2335 2364 ; S 32 (TTA*)
 2335 2364 2381 ; S 32 (TTA*X)
 2335 2381 ; S 32 (TTX)
 2336 ; S 122 (TTHA)
 2336 2381 2336 ; S 123 (TTH-TTHA)
 2336 2381 2351 ; S 32 (TTH-YA)
 2336 2364 ; S 32 (TTHA*)
 2336 2364 2381 ; S 32 (TTHA*X)
 2336 2381 ; S 32 (TTHX)
 2337 ; S 68 (DDA)
 2337 2381 2327 ; S 246 (DD-GA)
 2337 2381 2328 ; S 32 (DD-GHA)
 2337 2381 2337 ; S 32 (DD-DDA)
 2337 2381 2338 ; S 32 (DD-DDHA)
 2337 2381 2350 ; S 32 (DD-MA)
 2337 2381 2351 ; S 32 (DD-YA)
 2337 2364 2381 ; S 32 (DDDHAX)
 2337 2381 ; S 32 (DDX)
 2338 ; S 90 (DDHA)
 2338 2381 2338 ; S 32 (DDH-DDHA)
 2338 2381 2351 ; S 32 (DDH-YA)
 2338 2364 2381 ; S 32 (RHAX)
 2338 2381 ; S 32 (DDHX)
 2339 ; S 78 97 (NNA)
 2339 2364 ; S 32 (NNA*)
 2339 2381 8205 ; S 78 (NN-)
 2340 ; S 116 (TA)
 2340 2381 2340 ; S 60 97 (T-TA)
 2340 2381 2340 2381 8205 ; S 60 (T-T-)
 2340 2381 2352 ; S 126 (T-RA)
 2340 2381 2352 2381 8205 ; S 32 (T-R-)
 2340 2364 ; S 32 (TA*)
 2340 2381 8205 ; S 37 (T-)
 2341 ; S 113 97 (THA)
 2341 2364 ; S 32 (THA*)
 2341 2381 8205 ; S 113 (TH-)
 2342 ; S 100 (DA)
 2342 2381 2327 ; S 246 (D-GA)
 2342 2381 2328 ; S 32 (D-GHA)
 2342 2381 2342 ; S 95 (D-DA)
 2342 2381 2342 2381 2352 ; S 32 (D-D-RA)
 2342 2381 2342 2381 2357 ; S 32 (D-D-VA)
 2342 2381 2343 ; S 119 (D-DHA)
 2342 2381 2348 ; S 32 (D-BA)
 2342 2381 2349 ; S 32 (D-BHA)
 2342 2381 2350 ; S 32 (D-MA)
 2342 2381 2351 ; S 86 (D-YA)
 2342 2381 2352 ; S 32 (D-RA)
 2342 2381 2357 ; S 87 (D-VA)
 2342 2364 ; S 32 (DA*)
 2342 2364 2381 ; S 32 (DA*X)
 2342 2381 ; S 32 (DX)
 2343 ; S 81 97 (DHA)
 2343 2381 2344 ; S 32 (DH-NA)
 2343 2381 2344 2381 8205 ; S 32 (DH-N-)
 2343 2364 ; S 32 (DHA*)
 2343 2381 8205 ; S 81 (DH-)
 2344 ; S 110 97 (NA)
 2344 2381 2344 ; S 217 (N-NA)
 2344 2381 8205 ; S 110 (N-)
 2345 ; S 32 (NNNA)
 2346 ; S 112 (PA)
 2346 2381 2352 ; S 112 96 (P-RA)
 2346 2364 ; S 32 (PA*)
 2346 2381 8205 ; S 80 (P-)
 2347 ; S 102 (PHA)
 2347 2381 2352 ; S 205 (PH-RA)
 2347 2381 8205 ; S 70 (PH-)
 2348 ; S 98 97 (BA)
 2348 2364 ; S 32 (BA*)
 2348 2381 8205 ; S 98 (B-)
 2349 ; S 66 97 (BHA)
 2349 2364 ; S 32 (BHA*)
 2349 2381 8205 ; S 66 (BH-)
 2350 ; S 109 97 (MA)
 2350 2364 ; S 32 (MA*)
 2350 2381 8205 ; S 109 (M-)
 2351 ; S 121 97 (YA)
 2351 2381 8205 ; S 121 (Y-)
 2352 ; S 114 (RA)
 2352 2381 2367 ; S 32 (RI (*1)
 2352 2369 ; S 201 (RU)
 2352 2370 ; S 36 (RUU)
 2352 2381 8205 ; S 32 ('lash RA)
 2353 ; S 32 (RRA)
 2354 ; S 32 (LA)
 2354 ; S 108 97 (LA)
 2354 2381 2354 ; S 32 (L-LA)
 2354 2364 ; S 32 (LA*)
 2354 2381 8205 ; S 108 (L-)
 2355 ; S 76 (LLA)
 2355 2381 8205 ; S 149 (LL-)
 2356 ; S 32 (LLLA)
 2357 ; S 118 97 (VA)
 2357 2364 ; S 32 (VA*)
 2357 2381 8205 ; S 118 (V-)
 2358 ; S 207 97 (SHA)
 2358 2381 2330 2381 ; S 32 (SH-C-)
 2358 2381 2330 ; S 32 (SH-CA)
 2358 2381 2352 ; S 69 97 (SH-RA)
 2358 2381 2352 2381 8205 ; S 69 (SH-R-)
 2358 2381 2357 ; S 32 (SH-VA)
 2358 2381 2357 2381 8205 ; S 32 (SH-V-)
 2358 2364 ; S 32 (SHA*)
 2358 2381 8205 ; S 207 (SH-)
 2359 ; S 89 97 (SSA)
 2359 2381 2335 ; S 149 (SS-TTA)
 2359 2381 2336 ; S 149 (SS-TTHA)
 2359 2364 ; S 32 (SSA*)
 2359 2381 8205 ; S 89 (SS-)
 2360 ; S 115 97 (SA)
 2360 2381 2344 ; S 32 (S-NA)
 2360 2381 2344 2381 ; S 32 (S-N-)
 2360 2381 2352 ; S 32 (S-RA)
 2360 2364 ; S 32 (SA*)
 2360 2381 8205 ; S 115 (S-)
 2361 ; S 104 (HA)
 2361 2381 2339 ; S 149 (H-NNA)
 2361 2381 2344 ; S 149 (H-NA)
 2361 2381 2350 ; S 149 (H-MA)
 2361 2381 2351 ; S 40 (H-YA)
 2361 2381 2352 ; S 149 (H-RA)
 2361 2381 2354 ; S 149 (H-LA)
 2361 2381 2357 ; S 149 (H-VA)
 2361 2364 ; S 32 (HA*)
 2361 2364 2381 ; S 32 (HA*X)
 2361 2369 ; S 32 (HU)
 2361 2370 ; S 32 (HUU)
 2361 2371 ; S 41 (HR)
 2361 2381 ; S 42 (H-)
 2361 2381 ; S 32 (HX)
 2364 ; S 44 (NUKTA)
 2365 ; S 124 (AVAGRAHA)
 2366 ; S 97 (AA)
 2367 ; S 105 (I)
 2368 ; S 73 (II)
 2369 ; S 117 (U)
 2370 ; S 85 (UU)
 2371 ; S 82 (VR)
 2372 ; S 32 (VRR)
 2373 ; S 94 (CANDRA E)
 2374 ; S 32 (SHORT E)
 2375 ; S 111 (E)
 2376 ; S 79 (AI)
 2377 ; S 32 (CANDRA O)
 2378 ; S 32 (SHORT O)
 2379 ; S 32 (O)
 2380 ; S 32 (AU)
 2381 ; S 92 (VIRAMA)
 2384 ; S 33 (OM)
 2385 ; S 32 (UDATTA)
 2386 ; S 32 (ANUDATTA)
 2387 ; S 32 (GRAVE)
 2388 ; S 32 (ACUTE)
 2392 ; S 32 (QA)
 2392 2381 8205 ; S 32 (Q-)
 2393 ; S 211 (KHHA)
 2393 2381 8205 ; S 32 (KHH-)
 2394 ; S 103 97 44 (GHHA)
 2394 2381 8205 ; S 32 (GHH-)
 2395 ; S 106 97 44 (ZA)
 2395 2381 2352 ; S 32 (Z-RA)
 2395 2381 8205 ; S 32 (Z-)
 2396 ; S 32 (DDDHA)
 2397 ; S 32 (RHA)
 2398 ; S 212 (FA)
 2398 2381 2352 ; S 32 (F-RA)
 2398 2381 8205 ; S 32 (F-)
 2399 ; S 32 (YYA)
 2400 ; S 32 (VRR IND)
 2401 ; S 32 (VLL IND)
 2402 ; S 32 (VL)
 2403 ; S 32 (VLL)
 2404 ; S 46 (DANDA)
 2405 ; S 46 46 (DBLDANDA)
 2406 ; S 48 (ZERO)
 2407 ; S 49 (ONE)
 2408 ; S 50 (TWO)
 2409 ; S 51 (THREE)
 2410 ; S 52 (FOUR)
 2411 ; S 53 (FIVE)
 2412 ; S 54 (SIX)
 2413 ; S 55 (SEVEN)
 2414 ; S 56 (EIGHT)
 2415 ; S 57 (NINE)
 2416 ; S 32 (ABBR.)
 9676 ; S 32 (DOTTEDCIRC)
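
For anyone who wants to load the table programmatically, the following
Python sketch parses lines in the format shown above and builds lookup
tables in both directions. The function names are hypothetical. It also
assumes that a Shusha value of 32 on its own means "no glyph available in
the font"; that reading is a guess from the data, so adjust it if your
copy of the font differs.

```python
import re
from collections import defaultdict

# One table line looks like: " 2325 2381 2352 ; S 203 (K-RA)"
LINE_RE = re.compile(r"^\s*([\d ]+?)\s*;\s*S\s+([\d ]+?)\s*\((.*)\)\s*$")


def parse_line(line):
    """Split a table line into (unicode_codes, shusha_codes, glyph_name)."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # not a table line
    unicode_codes = tuple(int(n) for n in match.group(1).split())
    shusha_codes = tuple(int(n) for n in match.group(2).split())
    return unicode_codes, shusha_codes, match.group(3)


def build_tables(lines):
    """Build Unicode->Shusha and Shusha->Unicode lookup tables."""
    to_shusha = {}
    to_unicode = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        unicode_codes, shusha_codes, _name = parsed
        # Assumption: a lone 32 (space) is a placeholder for a missing glyph.
        if shusha_codes == (32,):
            continue
        to_shusha.setdefault(unicode_codes, shusha_codes)
        # Several rows can share one Shusha number, so keep a list.
        to_unicode[shusha_codes].append(unicode_codes)
    return to_shusha, dict(to_unicode)


SAMPLE = """\
 2325 ; S 107 (KA)
 2325 2381 2325 ; S 149 (K-KA)
 2325 2381 2357 ; S 149 (K-VA)
 2322 ; S 32 (SHORT O)
"""

if __name__ == "__main__":
    to_shusha, to_unicode = build_tables(SAMPLE.splitlines())
    print(to_shusha[(2325,)])
    print(to_unicode[(149,)])
```

Note that several rows in the table share the same Shusha number (149,
for instance), so the reverse table is not one-to-one and a converter
from Shusha back to Unicode needs some other way to choose among them.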

This data is just a framework, of course. When you look at Mark Leisher's
work and notes, you'll see that there is additional processing required
besides just knowing the equivalent character numbers.
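
As a rough illustration of the table-lookup part only, here is a minimal
Python sketch that converts Unicode text to Shusha numbers by always
trying the longest matching sequence first, so that a conjunct such as
K-RA is found before the plain KA. The handful of rows is copied from the
table above; the names TABLE and unicode_to_shusha are hypothetical, and
the sketch does not attempt any of the additional processing mentioned
here.

```python
# A few rows copied from the table in this message (decimal values).
TABLE = {
    (2325,): (107,),            # KA
    (2325, 2381, 2352): (203,), # K-RA
    (2350,): (109, 97),         # MA
    (2352,): (114,),            # RA
    (2366,): (97,),             # AA (vowel sign)
    (2381,): (92,),             # VIRAMA
}
MAX_KEY = max(len(key) for key in TABLE)


def unicode_to_shusha(text):
    """Convert text to a list of Shusha numbers using longest-match lookup."""
    codes = [ord(ch) for ch in text]
    out = []
    i = 0
    while i < len(codes):
        # Try the longest candidate sequence first, then shorter ones.
        for size in range(min(MAX_KEY, len(codes) - i), 0, -1):
            key = tuple(codes[i:i + size])
            if key in TABLE:
                out.extend(TABLE[key])
                i += size
                break
        else:
            raise ValueError("no table entry for code %d" % codes[i])
    return out


if __name__ == "__main__":
    k_ra = chr(2325) + chr(2381) + chr(2352)
    print(unicode_to_shusha(k_ra))  # [203]
```

A real converter would load the complete table and decide what to do with
characters that have no entry, instead of raising an error.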

Hope this is helpful.

Best regards,

James Kass.

----- Original Message -----
From: "Arjun Aggarwal" <mrasool@sancharnet.in>
To: <unicode@unicode.org>
Sent: Friday, February 25, 2005 9:33 AM
Subject: The real solution

Hi Everybody

I am writing this letter as an extension to my earlier ones.

This time I will raise the old issue from a new perspective and with more practicality. The Unicode encoding is meant to help people
around the world use characters in their own languages besides English. Unfortunately, this is not the case for Hindi, the
third most widely spoken language in the world (spoken in around 10 countries), because the Unicode encoding for the
Devnagari script, which is used to write Hindi and around 8 other languages, has failed to do just this.

Here I am not concerned about the font developers (no offence intended) but about the people who want to use it for everyday purposes.
This is the reason why you will not have seen any Hindi web pages or other applications written in Unicode or using Unicode code
mappings (except a few test pages by some enterprising individuals). On the contrary, I have seen thousands of applications and web
pages using Unicode for Chinese and Japanese, in spite of these scripts requiring a large share of the Unicode encoding space. Is
this some kind of conspiracy to keep the use of Indic scripts in the Unicode system to a minimum? The reason is that the Unicode
system does not provide ways to display and, even more importantly, store characters for Devnagari in the way they should
be (and the way in which they are used).

As the issue of transliteration was raised by many members of this list, I would like to say that it is not difficult to build
an engine for it using half characters. The misconception arises because the ISCII system was used as the basis for
the Unicode encoding of Devnagari and other Indic scripts. ISCII is not at all a perfect system. The reservation that a non-ISCII system
will not be able to transliterate is ill-founded. Nothing could be further from the truth.

People want a script system that they can use for every purpose (displaying, encoding new fonts, databases and every other
purpose under the sun) and not just for displaying characters (which unfortunately is left to the mercy of the OS manufacturers),
and even that function is not being served properly.

I am not trying to degrade the people who built the Unicode system. What I am trying to say here is that the system used
at present is badly out of sync with what it should be.

Anybody wanting to see extensive usage of the Unicode standard for Devnagari will have to help me put it in the form that is
actually needed, which is very different from the ISCII-based form in use today.

If anybody wants to see what the Devnagari encoding of Unicode should actually look like, they can visit http://www.bharatbhasha.com
and download a font named Shusha. If they are not able to do this, they can send me a private e-mail at mrasool@sancharnet.in and I
will send them the font file for Windows as an attachment.
The above-mentioned font was not developed by me, so this should not be taken as a promotion through this
forum.

With Regards
Arjun Aggarwal
mrasool@sancharnet.in



This archive was generated by hypermail 2.1.2 : Sun Nov 25 2001 - 20:01:14 EST