Re: UTF-8 code in HTML

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Fri Apr 14 2000 - 08:46:30 EDT


I basically agree with you. Just a minor correction.

addison@globalsight.com wrote:
>
> Second, Markus is "right" about HTML being "self-describing", but this overlooks
> a few things. If your page contains only the characters in the Latin-1 character
> set, then using UTF-8 will work correctly with about 97% of the browsers as they are
> configured at install time. So UTF-8 is a fine choice for serving pages that could
> be encoded in 8859-1.

But the page gets slightly bigger (only by a few percent, though).
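
As a rough illustration of that size difference, here is a minimal Python
sketch. The sample sentence and the helper name encoded_sizes are assumptions
made up for this example; they do not come from the page under discussion.
Every character above U+007F costs one byte in ISO 8859-1 and two bytes in
UTF-8, so the growth depends on how many accented letters the text contains.

```python
# Compare the encoded size of the same text in ISO 8859-1 and in UTF-8.
# encoded_sizes and the sample sentence are hypothetical, for illustration only.
def encoded_sizes(text):
    """Return (size in ISO 8859-1 bytes, size in UTF-8 bytes) for text."""
    latin1 = len(text.encode("iso-8859-1"))
    utf8 = len(text.encode("utf-8"))
    return latin1, utf8


sample = "Le café était fermé à Noël, déjà."
latin1_size, utf8_size = encoded_sizes(sample)

# Relative growth of the UTF-8 form over the ISO 8859-1 form, in percent.
growth = 100.0 * (utf8_size - latin1_size) / latin1_size
print(latin1_size, utf8_size, round(growth, 1))
```

A short, accent-heavy sentence like this one exaggerates the effect; in a real
HTML page the markup is pure ASCII and does not grow at all.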

> The fact that the browser can correctly decode the UTF-8 is not at issue. The
> problem is that IE4 and NN4 allow only one font to be associated with the UTF-8
> encoding (in the user interface)... and the default is a Latin-1 font.

Well, this is perhaps the case for Macintosh and Unices, but I do not agree
with you about Windows boxes.

The default configuration defers to Times New Roman and Courier New, and both
fonts exist in two "forms": a small one (the default with Win9X) that only
supports Latin-1, and a bigger one (NT default? force-installed with the Euro
upgrade on 9X) that also supports Latin-2, Cyrillic and quite a few more things.

> My speculated Polish-Japanese page contains a few "black squares" in the
> Polish and all black squares in the Japanese.

Agreed about Japanese (and I do not know of any solution short of
Cyberbit or Arial Unicode).

However, I do not agree about Polish (or Russian): I expect any reader of such
a page to have installed the "multilingual setup", which means that s/he has
the big versions of the fonts installed. And then s/he can read Polish
or Russian characters without any problem.
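
To see why a Latin-1-only font gives "a few" black squares in Polish but
nothing except black squares in Japanese, here is a minimal Python sketch. It
assumes, as a simplification, that such a font covers exactly the code points
up to U+00FF; real fonts vary. The helper name outside_latin1 and the sample
strings are made up for this example.

```python
# List the characters that fall outside the ISO 8859-1 repertoire.
# Assumption: a "Latin-1 font" covers exactly U+0000..U+00FF; real fonts vary.
def outside_latin1(text):
    """Return the characters of text that ISO 8859-1 cannot represent."""
    return [ch for ch in text if ord(ch) > 0xFF]


polish = "Zażółć gęślą jaźń"   # hypothetical sample text
japanese = "日本語"             # hypothetical sample text

print(outside_latin1(polish))    # only some of the letters
print(outside_latin1(japanese))  # every character
```

Note that the Polish "ó" is part of Latin-1, so it displays correctly even with
the small font; the other accented Polish letters need the Latin-2 range found
in the bigger font.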

Antoine


