Re: Telugu Unicode Encoding Review

From: Kiran Kumar Chava (chavakiran@gmail.com)
Date: Sat Oct 23 2010 - 07:50:24 CDT


    After going through the feedback I received here and by personal mail, I
    spent some more time on this and tried to divide the discussion into
    bullet points. Wherever I have an answer, I put it in the questions and
    answers section; wherever I don't, I put it in the open issues section.
    The result is another post on my blog, which I am pasting here for
    reference. I also went through the related mails on Racchabanda and
    Unicode.org. References are given at the end.

    This is a follow-up to my previous post, titled Telugu Unicode Encoding
    Review (http://geek.chavakiran.com/archives/55 ).

    *1. Is anybody using reserved code points for private use?*

    a. Not AFAIK. I haven't seen any such instances. But the reason may be the
    lack of serious computing work in the Telugu language, apart from whatever
    is happening on the web and on PCs. Once people start using Unicode on
    mobile devices and for book publishing (an area now dominated by the
    dynamic encoding from Anu), we may get into this hell.

    *2. ఌ and ౡ (and other related characters) are not assigned consecutive
    code points. Is this a problem for sorting?*

    *a.* No. Sorting is supposed to follow the Unicode collation charts, not
    raw code point order, and in those charts, as shown in my previous post
    <http://geek.chavakiran.com/archives/55>, the order looks OK. But if some
    encoding is going to replace Unicode in the future, I guess we had better
    have these characters in order, as that might save sorting time. Moreover,
    the characters that are not placed at consecutive code points are rarely
    used in Telugu (see my post on Telugu character usage,
    http://archives.chavakiran.com/?p=254 ), so I guess we are spending CPU
    time unnecessarily just for the sake of these rarely used characters.
    (FYI, the code points not in order are ఋ && ౠ, ఌ && ౡ, ౘ, ౙ, ళ, ఱ.) So
    for all practical purposes we could probably just do a binary sort and
    move on! Of these characters, only ~La (ళ) seems to be used with good
    frequency.
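
    To make the difference concrete, here is a minimal Python sketch that
    compares a plain code point sort with a sort driven by a small weight
    table. The vowel order in the table follows the traditional alphabet and
    is an assumption made for illustration; it is not the Unicode collation
    data, and the names VOWELS, WEIGHT and alphabet_key are hypothetical.

```python
# Independent vowels in traditional alphabet order (assumed for illustration).
# Note that U+0C60 and U+0C61 are encoded far from U+0C0B and U+0C0C.
VOWELS = [
    0x0C05, 0x0C06, 0x0C07, 0x0C08, 0x0C09, 0x0C0A,
    0x0C0B, 0x0C60,  # vocalic R, vocalic RR
    0x0C0C, 0x0C61,  # vocalic L, vocalic LL
    0x0C0E, 0x0C0F, 0x0C10, 0x0C12, 0x0C13, 0x0C14,
]
WEIGHT = {chr(cp): i for i, cp in enumerate(VOWELS)}


def alphabet_key(text):
    """Sort key: table weight for known vowels, code point for anything else."""
    return [(0, WEIGHT[ch]) if ch in WEIGHT else (1, ord(ch)) for ch in text]


words = ["\u0C0E", "\u0C60", "\u0C0B"]  # E, vocalic RR, vocalic R

by_code_point = sorted(words)                   # binary / code point order
by_alphabet = sorted(words, key=alphabet_key)   # table-driven order

print([f"U+{ord(w):04X}" for w in by_code_point])
print([f"U+{ord(w):04X}" for w in by_alphabet])
```

    With a code point sort, U+0C60 lands after U+0C0E; with the weight table
    it stays next to U+0C0B. A real implementation would use the Unicode
    collation data rather than a hand-written table.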

    *3. Mr. Chava, you said "Telugu digits are not taught in school". Does that
    mean they are unnecessarily present in the Unicode encoding?*

    Hmmm... Not exactly. Even though Telugu digits were not taught in school
    in my days, I guess they have been taught in recent years. Moreover, there
    are attempts to make people aware of them; for example, Hyderabad city
    buses now carry numbers in both Telugu digits and Indo-Arabic numerals.
    And most importantly, religious and classical Telugu books printed very
    recently also use these digits. For images, see my previous blog post
    <http://geek.chavakiran.com/archives/55>. My only point is that font
    developers should feel free to use Indo-Arabic numerals for the Telugu
    digits as well.
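
    For reference, the Telugu digits occupy U+0C66 to U+0C6F and carry decimal
    values in the Unicode character database, so software can convert between
    them and Indo-Arabic numerals regardless of what a font draws. A minimal
    Python sketch; the helper name to_ascii_digits is hypothetical:

```python
import unicodedata

# Telugu digits zero through nine, U+0C66 .. U+0C6F.
TELUGU_DIGITS = "".join(chr(cp) for cp in range(0x0C66, 0x0C70))
TO_ASCII = str.maketrans(TELUGU_DIGITS, "0123456789")


def to_ascii_digits(text):
    """Replace Telugu digits with ASCII digits, leaving other text untouched."""
    return text.translate(TO_ASCII)


sample = "\u0C67\u0C68\u0C69"  # Telugu digits one, two, three

print(unicodedata.name(sample[0]))  # character name from the Unicode database
print(int(sample))                  # int() accepts any Unicode decimal digit
print(to_ascii_digits(sample))
```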

    *4. Mr. Chava, you said the current Telugu Unicode encoding is flawed. Do
    you detest the Unicode encoding?*

    No. I love it for all the scenarios it has enabled for Telugu people in
    digital life. I love it; that is why I am spending time on it.

    *5. Is the avagraha symbol encoded?*

    Yes, at U+0C3D.
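
    This can be checked against the character database that ships with
    Python (a quick sketch, assuming a Python version whose Unicode data
    includes the 5.1 additions):

```python
import unicodedata

avagraha = "\u0C3D"

# Look up the formal character name for the code point.
name = unicodedata.name(avagraha)
print(f"U+{ord(avagraha):04X} {name}")
```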

    *6. Does the OM (AUM) symbol need a code point in Telugu?*

    My personal opinion: No. Telugu Om is always a combination of 'O' and
    '~M', unless I am missing something. Even on temples and calendars in
    Telugu land, the Devanagari OM is used, and wherever Telugu Om is used it
    is a simple combination of 'O' and '~M'. There may be one or two special
    cases, but those must be artistic freedom and may not require a code
    point.

    *Open issues: *

    *1. Telugu danda and double danda are to be encoded. *

    (I saw some discussions of this here and there, but none were conclusive.
    Has a decision been made?)

    *2. How to encode the notation in Telugu music books (for example, a dot
    above a character, a dot below a character, a horizontal line above a
    character, a dot just before a character)?*

    *3. How to encode a Vedic book in Telugu script (for example, a vertical
    line over a character, a horizontal line below a character)?*

    Answer? Do we need to use the code points from the Vedic block?
    http://www.unicode.org/charts/PDF/U1CD0.pdf

    *4. Are Guruvu and Laguvu to be encoded with new code points?*

    (Suggested by Suresh Kolichala in Racchabanda mailing list)

    *6. Is the Yati symbol to be encoded with a new code point?*

    (Suggested by Suresh Kolichala in Racchabanda mailing list)

    *7. Is there any way to encode Tala kaTTu? *

    *8. Is there any way to encode ka ottu (the second half of the glyph
    క్క) on its own? This is required to encode a Telugu alphabet textbook,
    in which children are taught ka ottu and then, a few lessons later, are
    taught to combine it with other vowels. The same question applies to all
    the other ottulu.*
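
    For context, the ottu form has no code point of its own in the current
    encoding: text stores the consonant, the virama (U+0C4D) and the second
    consonant, and the font shapes the result. A minimal Python sketch of the
    stored sequence (the variable names are only for illustration):

```python
import unicodedata

ka = "\u0C15"      # TELUGU LETTER KA
virama = "\u0C4D"  # TELUGU SIGN VIRAMA

# The conjunct is stored as three code points; the subjoined shape of the
# second KA is produced by the font at rendering time.
kka = ka + virama + ka

for ch in kka:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```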

    *9. What are the pros and cons of the new encoding scheme I proposed for
    the Telugu script (section 9 of my blog post
    <http://geek.chavakiran.com/archives/55>)? Has this been discussed
    somewhere?*

    *References*

    1. http://groups.yahoo.com/group/racchabanda/message/15576 Discussion on
    the tzh character in Telugu.

    2. http://groups.yahoo.com/group/racchabanda/message/16367 RB mail after
    the previous changes to Telugu Unicode.

    3. http://groups.yahoo.com/group/racchabanda/message/16378 A discussion on
    musical symbols in Telugu.

    4. http://unicode.org/alloc/nonapprovals.html Non-approval of arda visarga.

    5. http://unicode.org/~emuller/southasia/vedic/ Encoding of Vedic.

    ----
    నెనర్లు,
    కిరణ్ కుమార్ చావా
    http://te.chavakiran.com/blog
    http://en.chavakiran.com/blog
    2010/10/17 Frédéric Grosshans <frederic.grosshans@m4x.org>
    > On Saturday 16 October 2010 at 22:36 +0530, Kiran Kumar Chava wrote:
    > > At the link, http://geek.chavakiran.com/archives/55 , I tried to
    > > understand Telugu Unicode encoding and then I tried to do an out of
    > > box review of this encoding. Kindly let me know if I am missing
    > > something, mentioned as missing in above article are really missing or
    > > not. Any other views...
    >
    > The 13 Telugu characters added in Unicode 5.1, including the fractions,
    > are enumerated here :
    > http://www.unicode.org/charts/PDF/Unicode-5.1/U51-0C00.pdf .
    >
    > The rationale for their inclusion are documented in
    > http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3116.pdf (which proposed 18
    > characters) . I have not looked close enough to check whether the 5
    > "missing" characters are linked to the one you consider as missing.
    >
    >        Frédéric
    >
    > --
    > Frédéric Grosshans
    > Chargé de Recherche
    > Laboratoire de Photonique Quantique et Moléculaire
    > ENS Cachan / CNRS UMR 8437
    > tel: (+33)1 47 40 77 15
    > GSM: (+33)6 09 24 29 64
    > e-mail: frederic.grosshans@ens-cachan.fr
    >
    >
    

