Re: What constitutes "character"?

From: Dhrubajyoti Banerjee (dhrub@hotmail.com)
Date: Fri Nov 09 2001 - 07:48:59 EST


Hi,
   I joined in a bit late on this.

On Thu, 08 Nov 2001 Gaspar Sinai wrote :
>I think that the Indian scripts deserve better character
>assignment -

The character assignment of Indian scripts is already quite well done. Since
it follows the ISCII88 standard, though, some ambiguities may remain
(unlike in ISCII91).

On Thu, 08 Nov 2001 Arjun Aggarwal wrote :
>If anybody on the list really thinks that they can submit any characters
>into Unicode then they should positively respond to my query. Help in this
>regard is much needed.

A lot of work from a lot of people has already gone into Unicode.
Some minute loopholes may remain which will be sorted out in time.
However, the idea you present, of pushing half characters into Unicode, does
not sound correct.
As some of the learned people on this list have already tried to show you,
there is a distinct difference between a 'character' and a 'glyph' (the
displayed form of the character).
What Unicode or any native code (like ISCII) consists of is characters, not
glyphs. If you have studied Hindi (as I assume you have), you would have
studied the 'Barahkhadi' (the basic Devanagari alphabet primer), which, as far
as I know, does not have any half characters.
Native writers of a script may have their own different ways of visually
representing character conjuncts. One of the most common examples is that of
writing 'Rakt' ('blood' in Hindi).
'Rakt' consists of the characters
Consonant Ra + Consonant Ka + Halant + Consonant Ta.
This can be written in both of the ways shown in the JPEG file I have attached.
One of them uses the 'Kta' conjunct and the other uses the 'Half-Ka and Ta'
form.
This is why it is up to the application to decide how exactly to display the
Consonant Ka + Halant + Consonant Ta sequence.
The underlying characters always remain
Consonant Ka + Halant + Consonant Ta
so that data interchange and data storage are easier and unambiguous, which
is what Unicode (or, for that matter, any character set) is all about.
Visually, they may take any number of forms.
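
As a rough illustration of the point above (a sketch in Python, not something
taken from the ISCII or Unicode documents), the stored sequence for 'Rakt' can
be listed character by character. The code points are the standard Unicode
Devanagari values; the names RAKT and describe are made up for this example.

```python
import unicodedata

# 'Rakt' as spelled out above: Ra + Ka + Halant + Ta.
# Unicode calls the Halant "VIRAMA".
RAKT = "\u0930\u0915\u094D\u0924"


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Print the stored characters one per line.
for code_point, name in describe(RAKT):
    print(code_point, name)

# The encoded form used for interchange and storage.
print(RAKT.encode("utf-8").hex())
```

Whether a font draws the 'Kta' conjunct or the 'Half-Ka and Ta' form, this
list of four characters and the encoded bytes stay the same; only the
rendering differs.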
Kindly read the IS 13194:1991 ISCII document by the Bureau of Indian Standards
and the Unicode 3.0 document for a complete reference on character codes and
a detailed understanding of the factors that pertain specifically to Indian
scripts.

>If the Chinese characters and CJK extensions can have so many spaces
>devoted to them in the Unicode list, which runs into some 10 times as
>compared to the characters for many other languages, why can't the
>characters of other languages be taken up in Unicode, even if they require
>one block (of usually 64 characters) of characters more than other
>languages like English.

Well, that is if those blocks are really needed.

regards,
Dhrubajyoti Banerjee
