Devanagari Display Issues (was Re: Arial Unicode MS and Code2000)

From: James Kass (jameskass@worldnet.att.net)
Date: Mon Jul 09 2001 - 11:51:10 EDT


Rajesh Chandrakar wrote:

>> >
>> > Here is the line in question reproduced in Unicode (UTF-8):
>> >
>> > वेंकटाचलपते (निन्‍नु नम्‍मिति वेगमे नन्‍नु)
>>
>> ...and here it is without the special character which is supposed
>> to force selection of the half-letter form instead of the
>> conjunct. On this system, the matras appear as expected:
>>
>> वेंकटाचलपते (निन्नु नम्मिति वेगमे नन्नु)
>
> Let me first explain what I am using to display multilingual
> text in my browser.
>
> My operating system is Windows 95, my browser is Netscape
> Communicator 4.7, and I use the Arial Unicode MS font with UTF-8.
> I am not very confident about the sequences being used
> by Arial Unicode MS, Code2000, TITUS and Aksharmala
> to represent the glyphs and surrogates, because when
> I tried to display characters written for Aksharmala,
> TITUS and Code2000, some of them came out garbled in
> my browser. Is that due to the Private Use Area, which
> each font vendor uses differently?

Microsoft offers a free downloadable utility at
http://www.microsoft.com/typography/property/property.htm
that displays extended TrueType/OpenType font properties. It
works on Windows 9x and later.

With this utility installed, it's possible to right-click on a font
icon and view important information specific to the font. One
of the "tabs" shows whether a font has OpenType tables, and
if so lists the tables by script and supported feature.
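For anyone without Windows at hand, roughly the same information can be read straight out of the font file. The following is a minimal Python sketch (3.9+, standard library only), written from the OpenType table directory and GSUB ScriptList layouts; the function names are hypothetical, and it handles single fonts only, not .ttc collections. It lists a font's tables and the script tags (for example `deva` for Devanagari) that have substitution rules in GSUB.

```python
import struct


def sfnt_tables(data: bytes) -> dict[str, tuple[int, int]]:
    """Map each table tag in a single TrueType/OpenType font to (offset, length).

    Font collections (.ttc) use a different header and are not handled here.
    """
    # Offset table: sfntVersion (uint32), numTables (uint16), then three
    # uint16 binary-search fields, 12 bytes in all.
    _version, num_tables = struct.unpack_from(">IH", data, 0)
    tables = {}
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        tables[tag.decode("latin-1")] = (offset, length)
    return tables


def gsub_script_tags(data: bytes) -> list[str]:
    """Return the script tags (e.g. 'deva') listed in the font's GSUB ScriptList."""
    tables = sfnt_tables(data)
    if "GSUB" not in tables:
        return []  # no OpenType glyph-substitution table at all
    gsub, _length = tables["GSUB"]
    # GSUB header starts with majorVersion, minorVersion, scriptListOffset (uint16 each).
    _major, _minor, script_list_offset = struct.unpack_from(">HHH", data, gsub)
    script_list = gsub + script_list_offset
    (count,) = struct.unpack_from(">H", data, script_list)
    tags = []
    for i in range(count):
        # ScriptRecord: 4-byte tag plus a 16-bit offset to the Script table (6 bytes).
        (tag,) = struct.unpack_from(">4s", data, script_list + 2 + 6 * i)
        tags.append(tag.decode("latin-1"))
    return tags
```

Typical use would be `gsub_script_tags(open("font.ttf", "rb").read())`, where the path is only illustrative; an empty list means the font has no GSUB table, so no OpenType shaping can happen with it.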

To properly display correctly encoded Unicode Indic text on
Windows platforms, the latest version of Microsoft's Uniscribe
("USP10.DLL") must be installed on the system. The font(s) used
must have OpenType tables and must cover the target script(s),
and the applications generally need to be the newest available
versions.

To display Indic text that isn't properly encoded, such as text
relying on Private Use Area enhancements or custom font encodings,
the choice of fonts is limited, and the author might have to work
around certain automatic processing that occurs on some platforms.

Both lines of Devanagari text above are properly encoded to the
best of my ability and don't use any PUA positions.
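That claim is easy to check mechanically. Below is a small Python sketch (standard library only) that re-types the two quoted lines, writing the Zero Width Joiners as `\u200d` escapes so they are visible, on the assumption that, as posted, each virama in the first line is followed by one. It confirms that neither line contains Private Use characters (General Category `Co`) and shows what surrounds each ZWJ. The helper names are hypothetical.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# The two quoted lines; in the first, the ZWJs are written as escapes.
LINE_WITH_ZWJ = "वेंकटाचलपते (निन्\u200dनु नम्\u200dमिति वेगमे नन्\u200dनु)"
LINE_PLAIN = "वेंकटाचलपते (निन्नु नम्मिति वेगमे नन्नु)"


def private_use_chars(text: str) -> list[str]:
    """Return every character whose General Category is Co (Private Use)."""
    return [ch for ch in text if unicodedata.category(ch) == "Co"]


def zwj_contexts(text: str) -> list[tuple[str, str]]:
    """For each ZWJ, return the names of the characters just before and after it."""
    contexts = []
    for i, ch in enumerate(text):
        if ch == ZWJ and 0 < i < len(text) - 1:
            contexts.append((unicodedata.name(text[i - 1]), unicodedata.name(text[i + 1])))
    return contexts
```

Running `zwj_contexts(LINE_WITH_ZWJ)` reports each ZWJ sitting between DEVANAGARI SIGN VIRAMA and a following consonant, and stripping the ZWJs from the first line yields the second line exactly.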

In the Outlook Express e-mail program, the first Devanagari UTF-8
string above looks just like the top half of the picture at
http://home.att.net/~jameskass/dev003.gif
Matra re-ordering is being done by the system except in one
place, the same as in Internet Explorer. Half-form substitutions
aren't happening; the font is the probable problem here. In the
second string, all of the matra re-ordering is done correctly on
this system, and some conjunct ligature substitutions are happening.

The only difference between the two lines of text above is that the
first line uses the Zero Width Joiner in an effort to force
substitution of half-forms instead of conjuncts. The default
appearance of the above lines of text (that is, without any
OpenType substitutions) would be the consonant plus virama forms.
As far as I know, the system does not re-order matras unless the
font in use has OpenType tables.
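What "matra re-ordering" means can be shown with a deliberately simplified Python sketch. The short-i vowel sign (U+093F) is stored after the consonant(s) it belongs to but is drawn to their left. The hypothetical `visual_order` function below moves each U+093F in front of the preceding consonant cluster (consonants joined by virama, optionally with ZWJ/ZWNJ after the virama). This is not Uniscribe's algorithm: a real shaping engine reorders glyphs rather than characters and also deals with reph, pre-base forms and the font's own substitutions.

```python
VIRAMA = "\u094D"
NUKTA = "\u093C"
I_MATRA = "\u093F"               # DEVANAGARI VOWEL SIGN I
JOINERS = ("\u200C", "\u200D")   # ZWNJ, ZWJ


def is_consonant(ch: str) -> bool:
    """Devanagari consonants KA..HA plus the precomposed nukta forms QA..YYA."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def continues_cluster(out: list[str]) -> bool:
    """A consonant joins the current cluster if it follows a virama (+ optional joiner)."""
    if out and out[-1] == VIRAMA:
        return True
    return len(out) >= 2 and out[-1] in JOINERS and out[-2] == VIRAMA


def visual_order(text: str) -> str:
    """Return text with each short-i matra moved before its consonant cluster."""
    out: list[str] = []
    cluster_start = None  # index in `out` where the current consonant cluster begins
    for ch in text:
        if is_consonant(ch):
            # Start a new cluster unless this consonant follows a virama.
            if cluster_start is None or not continues_cluster(out):
                cluster_start = len(out)
            out.append(ch)
        elif ch == I_MATRA and cluster_start is not None:
            out.insert(cluster_start, ch)  # drawn to the left of the whole cluster
            cluster_start = None
        else:
            out.append(ch)
            if ch not in (VIRAMA, NUKTA) and ch not in JOINERS:
                cluster_start = None  # any other character ends the cluster
    return "".join(out)
```

For "नम्मिति" the i-matra of the second syllable lands before the whole म्म cluster, which matches how it should be drawn whether the cluster ends up as a conjunct or as half-form plus full form.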

So, if different font developers use different Private Use Area
schemes, and the font is OpenType, and the text is correctly
encoded, and the operating system supports Unicode/OpenType,
and the application is Unicode-aware... the different PUA scheme
in the font won't make any difference in the display.

As far as browsers go, if the browser doesn't work there is
perhaps a newer version of the browser available. If the newer
version of the browser still doesn't work, there are other browsers.

Best regards,

James Kass.



This archive was generated by hypermail 2.1.2 : Mon Jul 09 2001 - 10:18:12 EDT