From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Wed Jun 03 2009 - 14:19:17 CDT
Damon Anderson wrote:
> I understand that the major barrier to
> display of Unicode lies in fonts, but how does this relate to
> platforms?
Different operating systems and applications have different fonts shipped
with them, and most people (excluding people on this list, of course) don’t
acquire additional fonts.
But fonts aren’t the only problem.
> I have a document composed in OpenOffice on Windows using
> the Verdana font and the Unikey unicode keyboard driver for
> Vietnamese.
I don’t know the Unikey driver, but its documentation suggests that it can
produce characters in either precomposed or decomposed form. This might
explain something.
For example, the Vietnamese letter ệ can be represented in Unicode in
several ways: as a precomposed single character or as a sequence of two or
three characters, building the character up from constituents (the base
letter “e” and two diacritic marks). These representations are distinct as
Unicode data, though they represent the same character (when using a more
abstract concept of character). Their renderings are in theory supposed to
be identical, but in practice there are very often differences.
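To make the distinction concrete, here is a minimal sketch in Python (chosen only for illustration; it has nothing to do with OpenOffice or Unikey internals). It builds three representations of ệ with the standard unicodedata module and shows that they differ as data but collapse to the same precomposed character under Normalization Form C. The variable names are my own.

```python
import unicodedata

# Three ways to encode the same letter (U+1EC7, e with circumflex and dot below).
precomposed = "\u1ec7"          # one code point
partial = "\u00ea\u0323"        # e-circumflex + combining dot below
decomposed = "e\u0323\u0302"    # e + combining dot below + combining circumflex

forms = [precomposed, partial, decomposed]

# Number of code points in each representation.
lengths = [len(s) for s in forms]

# As raw Unicode data the three strings are all different...
all_distinct = len(set(forms)) == 3

# ...but NFC normalization maps every one of them to the precomposed form.
same_after_nfc = all(
    unicodedata.normalize("NFC", s) == precomposed for s in forms
)

for s in forms:
    print([f"U+{ord(c):04X}" for c in s])
```

A program that compares or renders these strings without normalizing first treats them as three different things, which is where the rendering differences can creep in.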
> I open the same document in my OpenOffice on Linux
> (Kubuntu) after installing the Verdana font and many of diacritical
> marks are now on the wrong letters or shifted to the right one
> character space.
One computer’s Verdana need not be identical with another computer’s
Verdana. But the symptoms suggest that the problem might lie in the
variation between representations.
For example, suppose that your writing tool produces the letter ệ internally
as a three-character sequence: the base letter and two combining diacritics.
Some program might then internally convert that into a precomposed single
character and render it using the selected font. Another program might print
the letter “e”, then somehow position the circumflex and the dot below (as
taken from the font); it might do a good job of this, or a lousy one. The
most primitive algorithms just overprint the base letter using a
glyph for the diacritic in a fixed position. (This means, of course, that if
ê produced that way looks tolerable, Ê won’t, and vice versa.)
> If even the font displays aren't consistent across
> platforms where in lies the problem and how can I distribute
> consistently displayed documents?
Perhaps you can use embedded fonts. The possibilities and methods for this
depend on the software and on fonts (not all fonts allow embedding).
But for Vietnamese, I would expect that reasonably good consistency can be
achieved if you use a commonly available (though perhaps typographically
suboptimal) font like Verdana, Arial, or Times New Roman and make sure that
your Unicode data uses precomposed characters.
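If you can get at the text as a plain string (how to do that from within OpenOffice is a separate question that I don’t cover here), converting to precomposed form could be sketched as follows in Python. The function names to_precomposed and leftover_combining_marks are hypothetical helpers of my own, and the sample string is just an illustration.

```python
import unicodedata

def to_precomposed(text):
    """Return text in Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)

def leftover_combining_marks(text):
    """List combining marks that still remain after NFC normalization.

    An empty list means every base + diacritic sequence had a
    precomposed equivalent.
    """
    return [
        f"U+{ord(c):04X}"
        for c in to_precomposed(text)
        if unicodedata.combining(c)
    ]

# "Tiếng Việt" typed with fully decomposed letters (assumed sample input).
sample = "Tie\u0302\u0301ng Vie\u0323\u0302t"
fixed = to_precomposed(sample)

print(len(sample), len(fixed))
print(leftover_combining_marks(sample))
```

For Vietnamese I would expect the second function to return an empty list, since the letters used in Vietnamese all have precomposed forms; anything it does report is a mark that a renderer would still have to position by itself.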
As a rough estimate, you might use the font list at
http://www.fileformat.info/info/unicode/char/1ec7/fontsupport.htm
(the character U+1EC7 is of course just a more or less arbitrary pick, but I
would expect a font to support all of Vietnamese if it supports this
particular character, which is used in Vietnamese only). Then you would need
to find out which of the suitable fonts are by default available on
platforms that matter in your case.
-- Yucca, http://www.cs.tut.fi/~jkorpela/