Re: Re: Regd- ISCII to Unicode Converter!

From: Ram Viswanadha (ram@jtcsv.com)
Date: Tue Apr 02 2002 - 19:04:22 EST

Previous message: Markus Scherer: "Re: xml 1.0 and unicode ideograph ext a and ext b"

The INV code (0xd9) is used in ISCII for special display purposes, in situations
where forming a composite character requires a consonantal base but the
consonant itself is invisible. INV cannot be accurately represented in Unicode,
so we fall back to ZWJ, and fallbacks by definition cannot be round-tripped.

ISCII
======
1) KA+HALANT+INV => HALF KA (all half characters can
be represented by CONSONANT+HALANT+ZWJ in Unicode)
2) RA+HALANT+INV => RAsup (RA+HALANT+ZWJ is treated
as Eyelash RA, which may not be the desired effect)
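
To make the round-trip problem concrete, here is a minimal sketch in Python of a toy converter covering only KA, HALANT, NUKTA and INV. It is not the ICU code. The function names and the tiny tables are hypothetical, the byte value 0xB3 for KA is my assumption from ISCII-91 Devanagari, and the only mappings modelled are the ones mentioned in this thread (INV falls back to ZWJ, and U+094D ZWJ is read back as a soft halant, HALANT NUKTA).

```python
# Toy subset of Devanagari ISCII. 0xD9, 0xE8 and 0xE9 come from this thread;
# 0xB3 for KA is an assumption based on ISCII-91.
KA, INV, HALANT, NUKTA = 0xB3, 0xD9, 0xE8, 0xE9
U_KA, U_VIRAMA, ZWJ = "\u0915", "\u094D", "\u200D"


def to_unicode(data: bytes) -> str:
    """Convert the toy ISCII subset to Unicode (hypothetical helper)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == HALANT and nxt == NUKTA:
            out.append(U_VIRAMA + ZWJ)  # soft halant
            i += 2
        elif b == HALANT:
            out.append(U_VIRAMA)
            i += 1
        elif b == INV:
            out.append(ZWJ)  # fallback: INV has no exact Unicode equivalent
            i += 1
        elif b == KA:
            out.append(U_KA)
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is outside the toy subset")
    return "".join(out)


def from_unicode(text: str) -> bytes:
    """Convert back to the toy ISCII subset (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else None
        if ch == U_VIRAMA and nxt == ZWJ:
            out += bytes([HALANT, NUKTA])  # read back as soft halant
            i += 2
        elif ch == U_VIRAMA:
            out.append(HALANT)
            i += 1
        elif ch == U_KA:
            out.append(KA)
            i += 1
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside the toy subset")
    return bytes(out)


original = bytes([KA, HALANT, INV])
as_unicode = to_unicode(original)
round_tripped = from_unicode(as_unicode)
```

With these tables, KA+HALANT+INV and KA+HALANT+NUKTA both produce the same Unicode string, so the converter has no way to tell on the way back which one it started from.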

The stand-alone forms of vowel signs listed below cannot be accurately
represented in Unicode.

3) INV+VOWEL SIGN I+NUKTA => VOWEL SIGN VOCALIC L
4) INV+HALANT+RA => RAsub
5) INV+VOWEL SIGN VOCALIC R+NUKTA => VOWEL SIGN VOCALIC RR
6) INV+VOWEL SIGN II+NUKTA => VOWEL SIGN VOCALIC LL
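
The vowel-sign rules above can be sketched as a small lookup, again in Python and again only as an illustration, not as what ICU does. The function names are hypothetical, and the ISCII byte values 0xDB, 0xDC and 0xDF for the vowel signs I, II and VOCALIC R are my assumption from ISCII-91 Devanagari. The sketch returns the composed Unicode vowel sign together with a flag saying whether the ISCII sequence asked for the stand-alone (INV-based) form, which is the part that Unicode cannot express exactly.

```python
import unicodedata

INV, NUKTA = 0xD9, 0xE9
# Vowel-sign bytes: assumed ISCII-91 Devanagari values.
SIGN_I, SIGN_II, SIGN_VOCALIC_R = 0xDB, 0xDC, 0xDF

# VOWEL SIGN + NUKTA compositions from rules 3, 5 and 6.
NUKTA_COMPOSITES = {
    SIGN_I: "\u0962",          # VOWEL SIGN VOCALIC L
    SIGN_VOCALIC_R: "\u0944",  # VOWEL SIGN VOCALIC RR
    SIGN_II: "\u0963",         # VOWEL SIGN VOCALIC LL
}


def compose_vowel_sign(seq: bytes):
    """Return (unicode_sign, standalone) for [INV] SIGN NUKTA, else None.

    standalone is True when the sequence starts with INV, i.e. the sign
    is meant to be shown without a visible consonant.
    """
    standalone = len(seq) > 0 and seq[0] == INV
    body = seq[1:] if standalone else seq
    if len(body) == 2 and body[1] == NUKTA and body[0] in NUKTA_COMPOSITES:
        return NUKTA_COMPOSITES[body[0]], standalone
    return None


def sign_name(seq: bytes) -> str:
    """Unicode character name of the composed sign (hypothetical helper)."""
    result = compose_vowel_sign(seq)
    if result is None:
        raise ValueError("not a VOWEL SIGN + NUKTA sequence from the rules")
    return unicodedata.name(result[0])
```

For example, `sign_name(bytes([INV, SIGN_I, NUKTA]))` looks up rule 3. A real converter would still have to decide what to emit for the INV itself; the flag only records that something was lost.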

Apple's mapping tables map INV to LRM. I think their renderer uses it as a
cue: if an LRM appears in the middle of an Indic code point stream and the
sequence follows the rules above, it gets special treatment. I am not sure,
though; maybe someone from Apple can correct me.

Regards,

Ram
---------------------------------------------------
Ram Viswanadha
International Components For Unicode
GCoC San Jose
IBM

----- Original Message -----
From: "Markus Scherer" <markus.scherer@jtcsv.com>
To: "Ram Viswanadha" <ram@jtcsv.com>; <ramv@us.ibm.com>
Cc: <markus.scherer@us.ibm.com>
Sent: Tuesday, April 02, 2002 6:17 PM
Subject: Fwd: Re: Regd- ISCII to Unicode Converter!

> Ram, could you please respond to this on unicode@unicode.org?
> Thanks,
> markus
> -------- Original Message --------
> Date: Wed, 27 Mar 2002 17:17:13 +0800
> From: Federic Zhang <Federic.Zhang@Sun.COM>
> To: Markus Scherer <markus.scherer@jtcsv.com>
> CC: unicode <unicode@unicode.org>
> Subject: Re: Regd- ISCII to Unicode Converter!
>
>
> Hi Markus,
>
> From the ucnvisci.c code, it seems that the sequence "Halant INV" (0xe8 0xd9)
> in Devanagari script would be converted to "U+094D ZWJ" in Unicode, and would
> become "Halant Nukta" (0xe8 0xe9) when converted back from Unicode to ISCII,
> since "U+094D ZWJ" would be treated as a soft halant. Is this the correct
> behavior?
>
> Regards,
> Federic
>
> Markus Scherer wrote:
>
> > ICU supports ISCII, except for the font-style attributes (like "bold")
> > which are not expressible in plain text.
> >
> > http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/data/mappings/convrtrs.txt
> > http://oss.software.ibm.com/icu/
> >
> > ISCII is algorithmic. The mapping part to/from Unicode is fairly
> > straightforward because Unicode's encoding of Indic scripts is based on an
> > earlier version of ISCII.
> >
> > For details take a look at the source code:
> > http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/ucnvisci.c
> >
> > markus
> >
> > Rajesh Chandrakar wrote:
> >
> > > Sorry to say that I lost the web address someone posted on the forum
> > > concerning ISCII to Unicode conversion. It would be highly appreciated if
> > > someone could provide it to me. I wanted to check how well the conversion
> > > works.


This archive was generated by hypermail 2.1.2 : Wed Apr 03 2002 - 20:01:40 EST