Re: Transcoding Tamil in the presence of markup

From: Peter Jacobi (peter_jacobi@gmx.net)
Date: Sun Dec 07 2003 - 15:23:37 EST


    Hi John,

    Thank you for doing the tests:

    > I have uploaded a valid page to
    >
    > <http://bd8.com/temp/tamil_unicode_tscii.html>

    But I think you are starting from a
    wrong assumption. You wrote:

    > In your TSCII version you write
    > &#xa7;<span>&#xc4;</span>&#xa1;
    >
    > is that not equivalent to Unicode
    >
    > &#xbc6;<span>&#xbb2;</span>&#xbbe;

    But TSCII
    &#xa7;&#xc4;&#xa1;
    is Unicode
    &#xbb2;&#xbca;
    or, alternatively but deprecated:
    &#xbb2;&#xbc6;&#xbbe;

    This change in codepoint order (from 'visual'
    to 'logical') is the root of transcoding difficulties.
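
    The reordering can be sketched in a few lines of Python. This is
    only an illustration: the three-entry byte table simply mirrors the
    example in this message and is not a real TSCII table, and the
    function name tscii_to_unicode is made up for the sketch. It also
    assumes that a left-side vowel sign is always followed directly by
    its consonant.

```python
import unicodedata

# Hypothetical mini-table: only the three bytes quoted in this message.
# A real converter needs the complete TSCII table.
TSCII_TO_UNICODE = {
    0xA7: "\u0BC6",  # vowel sign written to the left of the consonant
    0xC4: "\u0BB2",  # consonant LA
    0xA1: "\u0BBE",  # vowel sign AA, written to the right
}

# Tamil vowel signs that are displayed before (left of) their consonant.
PREBASE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def tscii_to_unicode(data: bytes) -> str:
    """Convert visual-order TSCII bytes to logical-order Unicode."""
    # Unknown bytes raise KeyError; the sketch does not try to handle them.
    visual = [TSCII_TO_UNICODE[b] for b in data]
    logical = []
    i = 0
    while i < len(visual):
        ch = visual[i]
        if ch in PREBASE_SIGNS and i + 1 < len(visual):
            # Visual order is sign, consonant; logical order is
            # consonant, sign. Assumes the next item is the consonant.
            logical.append(visual[i + 1])
            logical.append(ch)
            i += 2
        else:
            logical.append(ch)
            i += 1
    # NFC fuses U+0BC6 followed by U+0BBE into the single sign U+0BCA.
    return unicodedata.normalize("NFC", "".join(logical))


result = tscii_to_unicode(bytes([0xA7, 0xC4, 0xA1]))
print(" ".join(f"U+{ord(c):04X}" for c in result))
```

    Note that the swap needs the left-side sign and its consonant to be
    adjacent in the input. Once a tag sits between them, a converter
    working on text runs alone can no longer do this.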

    Depending on how your font and renderer handle
    isolated vowel signs, you may be able to fake
    something along the lines of:
    &nbsp;&#xbc6;&#xbb2;&#xbbe;

    But this is an abuse of the Unicode encoding model for
    Indic.

    > For Windows browsers I find I have to specify a Unicode font (in this
    > case Arial Unicode MS) in order for pages to display properly without
    > the user fiddling with his browser preferences.

    That may give you a better display. As I do not own MS Office,
    I cannot legally test Arial Unicode MS. None of the Unicode fonts
    I tested (Latha, Code 2000 and Avarangal) displayed your solution
    well: a vowel sign outside its proper position is rendered with
    an additional mark (a dotted circle).
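
    A rough way to spot such cases before looking at any font is to
    check whether a text run begins with a combining mark. The sketch
    below is only a lint under stated assumptions: the helper name
    stray_marks is invented here, tags are stripped with a naive
    regular expression, and it assumes each run between tags is shaped
    on its own, which depends on the renderer.

```python
import html
import re
import unicodedata

# Naive tag matcher; good enough for the short snippets in this thread.
TAG = re.compile(r"<[^>]+>")


def stray_marks(markup: str) -> list:
    """List marks that start a text run and so have no base before them."""
    stray = []
    for run in TAG.split(markup):
        text = html.unescape(run)  # decode &#x...; character references
        # General categories Mc/Mn cover the Tamil dependent vowel signs.
        if text and unicodedata.category(text[0]).startswith("M"):
            stray.append(f"U+{ord(text[0]):04X}")
    return stray


print(stray_marks("&#xbc6;<span>&#xbb2;</span>&#xbbe;"))
print(stray_marks("<span>&#xbb2;&#xbca;</span>"))
```

    The first call reports both vowel signs, since each one opens its
    own run. The second, in logical order inside a single element,
    reports nothing.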

    Regards,
    Peter Jacobi

    

