From: Ed Trager (ed.trager@gmail.com)
Date: Tue Jun 15 2010 - 13:13:58 CDT
Hi Unicoders,
Suppose that we write Unicode text in a web page that we create. We
are worried that our viewers' computers lack a font for proper display
of the script in which our text is written. Obviously it will not be
good if our users only see square boxes or question marks instead of
the text that we want them to be able to see and read:
□□□□□□□□□ ... <= Bad! :-(
We want a solution to this problem.
Until very recently, apparently the best we could do was to warn the
user of the possibility of unrenderable text. For example, Wikipedia
says on pages related to Indic languages:
“This article contains Indic text. Without proper rendering support,
you may see question marks or boxes, misplaced vowels or missing
conjuncts instead of Indic text.”
But now that “good” browsers support @font-face, we can envision a
better solution: If the browser does not have a font for rendering a
specific script, we can dynamically supply one.
I have written some simple JavaScript to detect whether a user's web
browser can display Unicode text in a specific ISO 15924 script.
Here's how it works:
* Create two divs on the page but set the CSS opacity to zero so
the user doesn't see them.
* In one div, place a relatively narrow letter from the target
script. For example, for Latin one might choose "i".
* In the other div, place a relatively wider letter from the target
script. For Latin, "w" is an obvious choice.
* If the widths of the two divs are identical, then the letters were
most likely rendered as square boxes or question marks.
* Otherwise, if the widths differ, then the browser has found a
system font capable of rendering the text.
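
The steps above can be sketched roughly as follows. This is a minimal
illustration, not the actual script: the names measureWidth and
canRenderScript are made up here, and the probe is absolutely
positioned on the assumption that the div must shrink to fit its text
(a plain block-level div takes the full width of its container, so
both widths would always match).

```javascript
// Hypothetical helper: measure the rendered width of `text` in a hidden div.
// `doc` is the page's document object.
function measureWidth(doc, text) {
  const probe = doc.createElement("div");
  probe.style.opacity = "0";         // invisible to the user, as in the steps above
  probe.style.position = "absolute"; // assumption: shrink-to-fit, so width follows the text
  probe.style.whiteSpace = "nowrap"; // keep the probe on a single line
  probe.textContent = text;
  doc.body.appendChild(probe);
  const width = probe.offsetWidth;
  doc.body.removeChild(probe);
  return width;
}

// Hypothetical check: true when the narrow and wide letters differ in width.
// `measure` is any function that maps a string to a rendered width.
function canRenderScript(narrow, wide, measure) {
  return measure(narrow) !== measure(wide);
}
```

In a page this might be called as
canRenderScript("i", "w", function (t) { return measureWidth(document, t); }).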
In the case of a negative result where the widths are the same, we can
then dynamically add an @font-face rule to the page to download an
appropriate font. I have an experimental web application that already
does exactly this to support Tai Tham (Lanna) script. As Lanna is a
fairly recent addition to Unicode, only a very few people will have a
Lanna font available on their machines.
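
Adding the rule might look something like the sketch below. The helper
names, the font family "Lanna Demo", and the URL are placeholders
invented for illustration; a real page would use whatever font it
actually hosts.

```javascript
// Hypothetical helper: build the text of an @font-face rule.
function fontFaceRule(family, url) {
  return '@font-face { font-family: "' + family + '"; src: url("' + url + '"); }';
}

// Hypothetical helper: inject the rule into the page so the browser
// can fetch the font. `doc` is the page's document object.
function addWebFont(doc, family, url) {
  const style = doc.createElement("style");
  style.textContent = fontFaceRule(family, url);
  doc.head.appendChild(style);
  return style;
}

// Example call (placeholder values, run only after a negative detection):
// addWebFont(document, "Lanna Demo", "fonts/lanna-demo.ttf");
```

The text in question would still need its font-family set to the new
family name for the downloaded font to be used.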
Astute Unicoders on this list will probably already have recognized
one or more shortcomings of this method. This method works perfectly
for most scripts, but of course it fails for monospaced scripts like
Chinese, Japanese, Korean, Yi, and possibly some others like Phags-pa.
For monospaced scripts, I tried doing this:
* In the first div put U+FFFE, a Unicode noncharacter. Every browser
I tested rendered it as a square box.
* In the second div put a representative character from the
script, such as "中" or "文" for Chinese.
In theory, the U+FFFE will always be rendered as a box with a fixed
width, and one would expect that there is a fairly good probability
that the fixed width of any Chinese font on the machine will not be
exactly the same as the width of the fallback square box.
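
In code, the idea amounts to the comparison sketched below. The name
differsFromFallbackBox is invented here, and measure stands for any
function that returns the rendered width of a string.

```javascript
// U+FFFE, used here as a stand-in for "a character no font can render".
const FALLBACK_PROBE = "\uFFFE";

// Hypothetical check: true when the sample character's width differs from
// the width of the fallback box drawn for U+FFFE. Equal widths are treated
// as "not rendered", even if that equality is only a coincidence.
function differsFromFallbackBox(sample, measure) {
  return measure(sample) !== measure(FALLBACK_PROBE);
}
```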
But in practice, based on my tests, this does not work. One problem
is that Firefox's fallback square boxes contain the Unicode code point
hex digits -- and these fallback square boxes can actually be of
different widths depending on the hex codes contained therein. Also
it might just happen that the fixed width of the Chinese glyph is
exactly the same width as that of the fallback box used to render the
U+FFFE.
It would be very nice to come up with a reliable solution for scripts
that are traditionally monospaced. Does anyone have any brilliant
ideas?
- Ed Trager