Re: UTF-8 code in HTML

From: Addison Phillips [GSC] (
Date: Fri Apr 14 2000 - 12:22:34 EDT

Hi Antoine,

> > I wrote:
> > The fact that the browser can correctly decode the UTF-8 is not at issue.
> > The problem is that IE4 and NN4 allow only one font to be associated with
> > the UTF-8 encoding (in the user interface)... and the default is a Latin-1
> > font.
> Antoine wrote:
> Well, this is perhaps the case for Macintosh and Unices, but I do not agree
> with you about Windows boxes.
> The default configuration defers to Times New Roman and Courier New, and
> these fonts exist in two "forms": a small one (the default with Win9X) that
> only supports Latin-1, and a bigger one (the NT default? forcibly installed
> with the Euro upgrade on 9X) that does support Latin-2, Cyrillic and quite
> a few more things.

I did oversimplify the case for Windows, as you point out, but these
additional copies of TNR and Courier do not get you complete support for the
characters encoded by UTF-8. And which fonts are installed by default depends
on which version of Windows you mean.
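To make the "black squares" concrete: a rough sketch, assuming a font's glyph coverage can be approximated by a legacy character set's repertoire (real fonts don't map exactly onto a codec, and browsers don't check coverage this way). The helper name `missing_glyphs` and the Polish/Japanese sample string are illustrative, not from the thread.

```python
from typing import List


def missing_glyphs(text: str, repertoire: str = "latin-1") -> List[str]:
    """Return the distinct characters of `text` outside `repertoire`.

    Assumption: a Latin-1-only font behaves like the latin-1 codec, i.e. it
    has glyphs for U+0000..U+00FF and nothing else. Any character that the
    codec cannot encode is one that would show up as a black square.
    """
    missing: List[str] = []
    for ch in text:
        try:
            ch.encode(repertoire)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# A hypothetical mixed Polish/Japanese page fragment.
sample = "Zażółć gęślą jaźń / 日本語"

# With a Latin-1-only font, most Polish letters (all but "ó") and all the
# kanji are uncovered.
print(missing_glyphs(sample))
# With a font covering Latin-2 (ISO 8859-2), only the kanji remain uncovered.
print(missing_glyphs(sample, "iso8859_2"))
```

This matches the point being argued: the bigger TNR/Courier fonts close the gap for Polish, but nothing short of a large Unicode font (Cyberbit, Arial Unicode) closes it for Japanese.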

> > My hypothetical Polish-Japanese page contains a few "black squares" in the
> > Polish and all black squares in the Japanese.
> Agreed about Japanese (and I do not know any solution short of
> Cyberbit or Arial Unicode).
> However, I do not agree about Polish (or Russian): I expect any reader of
> a page to have installed the "multilingual setup", which means that s/he
> has the big versions of the fonts installed. And then, s/he can read Polish
> or Russian characters without any problem.

The "multilingual setup" is not installed by default and most users don't
know about it, I suspect. I often, for example, view Polish or Russian pages
using a wide variety of machines, and most default IE4 and NN4 setups cannot
display these pages correctly without "little black squares"... and, unless
you're running on a machine localized for the language in question, it's
usually a multi-step process to install such support.

This is fine for me: it's what I do, after all. But when advising people
about what encoding to use to serve their pages, especially Asian language
pages, I still recommend delivering a native encoding because of limited
browser support for Unicode. As I said before, I think this is changing and
I expect that in the next year or thereabouts I will be able to recommend
otherwise (I'm already designing systems that serve UTF-8 to certain
browsers and native encodings to others). But it is not my experience that a
high percentage of machines, as installed, display UTF-8 text outside the
Latin-1 range correctly (not to say that you can't make them do so).
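The "UTF-8 to certain browsers, native encodings to others" approach might look roughly like the following. This is a minimal sketch, assuming the server decides from the request's HTTP `Accept-Charset` header; real systems of this era often sniffed `User-Agent` instead, and the function names and the Shift_JIS fallback for a Japanese page are hypothetical.

```python
from typing import Dict, Optional, Tuple


def choose_charset(accept_charset: Optional[str], native: str = "Shift_JIS") -> str:
    """Return "UTF-8" only if the client explicitly accepts it, else `native`.

    Deliberately conservative (an assumption of this sketch): a wildcard "*"
    or a missing header does not count as UTF-8 support.
    """
    if not accept_charset:
        return native
    for item in accept_charset.split(","):
        parts = item.strip().split(";")
        name = parts[0].strip().lower()
        q = 1.0  # default quality value when no q= parameter is given
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if name == "utf-8" and q > 0:
            return "UTF-8"
    return native


def encode_page(html: str, accept_charset: Optional[str],
                native: str = "Shift_JIS") -> Tuple[Dict[str, str], bytes]:
    """Encode the page and label it with the chosen charset."""
    charset = choose_charset(accept_charset, native)
    body = html.encode(charset)  # Python codec lookup accepts these IANA labels
    headers = {"Content-Type": f"text/html; charset={charset}"}
    return headers, body


headers, body = encode_page("<p>日本語</p>", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
print(headers["Content-Type"])  # text/html; charset=UTF-8
```

Falling back to the native encoding whenever UTF-8 isn't explicitly advertised reflects the advice above: the native encoding is the safer default until browser support for Unicode improves.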



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:01 EDT