Re: help regarding UNICODE

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Thu Sep 14 2000 - 08:09:15 EDT


Paresh Agarwal wrote:
>
> I am enthusiastic to know about Unicode font system in depth,
> specifically with regard to Indian languages.
> Would anyone suggest how to go about all this?

AFAIK, there is no such thing as a "Unicode font system".
There is Unicode, which is essentially monolithic, on one side.
There are many font systems on the other. Some of these comply (partially)
with the Unicode conformance requirements. In the particular field of Indian
languages, none of them complies entirely, at least to my knowledge.
However, we see a clear move toward increasing conformance. A clear sign
was given by a company named Microsoft, with a "product" called Uniscribe,
along with fonts shipped with another, bigger product named Windows 2000,
that aim at this goal. There are other projects, more or less advanced.

> What is the difference between Unicode fonts and other fonts?
> Are there separate Unicode fonts?

This question does not have an easy answer, since Unicode clearly states
that rendering --hence the fonts-- is a completely distinct process from
encoding (which is what Unicode aims at).
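
A minimal sketch of that separation, in Python (the language and the sample
word are assumptions chosen for illustration). It lists the code points of
the Hindi word for "Hindi" in the order Unicode stores them. The vowel
sign I comes after the letter HA in the encoding, although a renderer
draws it to the left of that letter; the encoding says nothing about that.

```python
import unicodedata

# The Hindi word "Hindi", written with escapes so that this listing
# does not itself depend on an Indic font being installed.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

def describe(text):
    """Return (code point, character name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code, name in describe(word):
    print(code, name)
```

Nothing in this output depends on a font: the same six code points are
stored whatever face is later used to draw them.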

Furthermore, fonts are quite varied things. I obviously set aside lead
fonts and all the traditional typographic things, since Unicode is
completely foreign to them. If we stick with the computer archives of
glyphs, one distinction could be whether they are encoded directly with
Unicode or not; here, the answer depends on the format: with TrueType
and its derivatives, the answer is most often yes; with other formats,
particularly the older ones, it is most often no.
Another point is correct rendering (according to the Unicode conformance
requirements) using a given font; since you asked about the specific
field of Indian languages, here the answer is that additional information
is needed in the font to render Unicode in a conforming way. Very few fonts
have this information. Tamil probably has the best available options
here, while all languages written in Nastaliq Arabic, as well as Oriya
and Malayalam, are the least advanced. Furthermore, since the definition
of "Indian" is pretty vague, "surrounding" languages such as Divehi written
in Thaana, or Sinhalese or Burmese when written in their own scripts,
are also in a bad situation.
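
To give an idea of what such additional information can look like, here is
a toy sketch in Python. Everything in it is hypothetical: the glyph names,
the CMAP and LIGATURES tables and the shape() function are invented for
illustration and do not reproduce any real font format or rendering
engine. It shows a font that maps code points to basic glyphs, plus one
extra rule that replaces the sequence KA, VIRAMA, SSA with a single
conjunct glyph.

```python
# Hypothetical toy font: all glyph names and tables are invented.
# Basic mapping from Unicode characters to glyph names.
CMAP = {"\u0915": "ka", "\u094d": "virama", "\u0937": "ssa"}

# The "additional information": glyph sequences that must be replaced
# by a single conjunct glyph.
LIGATURES = {("ka", "virama", "ssa"): "k_ssa"}

def shape(text, ligatures):
    """Map characters to glyphs, applying the first matching ligature rule."""
    glyphs = [CMAP[ch] for ch in text]
    out = []
    i = 0
    while i < len(glyphs):
        for seq, lig in ligatures.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            # No rule matched at this position: keep the basic glyph.
            out.append(glyphs[i])
            i += 1
    return out

text = "\u0915\u094d\u0937"
with_rules = shape(text, LIGATURES)   # font that carries the extra rule
without_rules = shape(text, {})       # font with basic shapes only
```

With the rule, the three code points come out as one glyph; without it,
the reader sees three basic shapes in a row, which is the situation of a
font that has the code points but not the information needed to render
them properly.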

> Is it possible to convert other fonts to Unicode?

To make an existing font conforming? I believe the answer is always yes,
provided that the font does provide an adequate range (that is, not too
limited) of glyphs to fulfill the requirements (for example, this
disqualifies current versions of Arial Unicode MS for Indian scripts, since
it only provides the basic shapes and not all the glyphs needed).
However, the process of conversion is certainly not easy, and can often
be thought of as almost impossible: certain font formats are not adequate,
particularly the bitmapped ones, so sometimes the font format has to be
changed; often, a number of specific operations cannot be carried out
within the font format, so specific tools (pre-processors as with TeX,
or even a rewrite / enhancement of the whole graphic rendering system, as
with X11) are needed.
So while possible, this is perhaps not practical.

OTOH, other cases (I am thinking in particular about Tamil and the
languages like Urdu written in Arabic, when using Naskh style fonts) may
be pretty easy to do. Provided, of course, that the surrounding rendering
system conforms to Unicode, which is, as I said, more the exception
than the rule.

> Is keyboard arrangement in Unicode system different from that of the regular
> ttf fonts??

Keyboard arrangement is completely distinct from Unicode.
And is certainly *not* related to the font.

First, I do not see "regular ttf fonts" (I believe you mean the TrueType
fonts that are widely used to render Indian scripts on standard Windows or
Macintosh systems, don't you?) as having *one* keyboard arrangement.
Rather, I see almost a different keyboard layout for each font face!
(Tamil being a notable exception here, thanks to efforts from the TamilNet
community).

Next, a keyboard driver can be made Unicode compliant, by using
a (some?) Unicode format to encode the data that travels from the keyboard
to the attached box. But this is not a requirement for a system able to
render Unicode (I routinely enter Indian data on my Windows 9X keyboard,
where the keyboard driver does not use any Unicode transformation format).

Then, if the keyboard driver encodes in Unicode (as is the case with
Windows NT/2000), the link between the key labels and the actual codes
is not required to be strong. Of course, for reasons of efficiency and ease,
Microsoft engineers chose to propose a standard keyboard layout that is
very similar to the Inscript (phonetic) layout, hence very different from those
of the Indian typewriters, and also different from the ones used in conjunction
with the TrueType fonts that are widely used to render Indian scripts.
But there are other solutions; for example, a company named Tavelsoft
will shortly ship a replacement driver that allows, among other things, the
use of a visual-based keyboard layout, along with transmission of Unicode
codepoints to the rest of the system.
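
The loose link between key labels and codes can be sketched as follows, in
Python. Both layouts are made up for this example (they are neither
Inscript nor any vendor's actual layout), and type_keys() is a
hypothetical stand-in for a keyboard driver. The point is only that two
different key arrangements can deliver the same Unicode code points to the
rest of the system, with no font involved.

```python
# Two made-up layouts: key label -> Unicode character.
# Neither corresponds to a real, published keyboard layout.
LAYOUT_A = {"k": "\u0915", "d": "\u094d", "x": "\u0937"}
LAYOUT_B = {"q": "\u0915", "/": "\u094d", "w": "\u0937"}

def type_keys(layout, keys):
    """Translate a sequence of key labels into a Unicode string."""
    return "".join(layout[key] for key in keys)

# Different keystrokes, depending on the layout in use...
text_a = type_keys(LAYOUT_A, ["k", "d", "x"])
text_b = type_keys(LAYOUT_B, ["q", "/", "w"])

# ...but the same code points reach the application either way.
same = text_a == text_b
```

Swapping LAYOUT_A for LAYOUT_B changes what the user has to press, not
what gets stored.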

So the answer here is a clear "No, they are not different."

Antoine



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT