Re: UTF-8 code in HTML

From: Antoine Leca (
Date: Mon Apr 17 2000 - 05:04:54 EDT

Glen Perkins wrote:
> I wonder how big a problem a typical large corporation would actually face
> if they switched from the current "legacy encodings" in each world market to
> UTF-8.

In my eyes, the biggest problem is that the production tools, for the moment,
emit legacy encodings rather than UTF-8 by default.

This applies to HTML authoring tools as well as to any other tool that can
save documents meant to be used by someone else. Office 2000 makes a nice
attempt with its HTML/XML format, which does use UTF-8 (IIRC), but for the
moment it looks like an oasis in the desert.
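
To illustrate what "UTF-8 by default" would mean for such a tool, here is a
minimal Python sketch. It is not how any particular product works; the
function name render_utf8_page and the page layout are made up for the
example. The point is only that the bytes written and the charset declared
in the page agree.

```python
from html import escape


def render_utf8_page(title, body_text):
    """Build a tiny HTML page and return it as UTF-8 bytes.

    Hypothetical helper for illustration: the declared charset in the
    meta element matches the encoding actually used for the bytes.
    """
    html = (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head><body>\n"
        f"<p>{escape(body_text)}</p>\n"
        "</body></html>\n"
    )
    # Encode once, at the very end, with the same charset the page declares.
    return html.encode("utf-8")


page = render_utf8_page("Prix", "Prix : 10 \u20ac")
```

A server sending such a page would also want to announce charset=utf-8 in
its HTTP Content-Type header, so the two declarations do not disagree.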

> Then, what percentage of the French market would have trouble with UTF-8 vs
> Latin-1? You have similar CP1252 problems, plus the Euro issue. What
> percentage of browsers would have problems with a well-built UTF-8 page *in
> French*, given the actual installed base of browsers in France today?

Speaking for my intranet:
We still have a fair proportion of NN3 and IE3 browsers in use
(and they won't move to the next generation, because of the scarce memory...)
But I believe that by mid-2001 this will be a problem of the past.

The main problem, in fact, is probably that most servers are Domino 4.x
servers (which are not exactly geared toward UTF-8, as I see things;
please don't flame me, I never looked inside to see whether they can be
set up differently than they are...)

As you said, the Euro is a bigger problem (printing it, in particular).
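
A small Python sketch of why the Euro is awkward with the legacy encodings
(the helper name euro_bytes is just for illustration; the charset names are
Python's codec names): Latin-1 has no Euro sign at all, while CP1252 and
ISO 8859-15 each put it at a different byte.

```python
EURO = "\u20ac"  # EURO SIGN


def euro_bytes(codec):
    """Return the Euro sign encoded in `codec` as hex, or None if absent."""
    try:
        return EURO.encode(codec).hex()
    except UnicodeEncodeError:
        return None


for codec in ("latin-1", "cp1252", "iso8859_15", "utf-8"):
    print(codec, euro_bytes(codec))
# latin-1 None
# cp1252 80
# iso8859_15 a4
# utf-8 e282ac
```

So a page labelled Latin-1 that carries byte 0x80 is relying on the browser
treating it as CP1252, which is exactly the kind of mismatch discussed above.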

Now for the public page, I believe the answer is different:
We should try to support the broadest range of browsers, while at the
same time the site should look attractive! So the designers will probably
stick with what they know, namely Latin-1, for quite some time.

> I believe that with a Polish OS and a reasonably
> recent browser in default configuration, UTF-8 would work fine.

I agree.

In fact, once you go outside the Latin-1 + CJK world, a multinational
is probably right to move toward UTF-8 rather than "playing" with
the various legacy encodings.
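
As a rough illustration of that "playing", here is a Python sketch (sample
strings and the helper names can_encode and coverage are made up for the
example) that checks which charsets can carry a French and a Polish
sentence. Each legacy encoding handles one market but not the other;
UTF-8 handles both.

```python
SAMPLES = {
    "French": "d\u00e9j\u00e0 vu, \u00e9l\u00e8ve",
    "Polish": "Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144",
}


def can_encode(text, codec):
    """True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(codec):
    """Map each sample language to whether `codec` can represent it."""
    return {lang: can_encode(text, codec) for lang, text in SAMPLES.items()}


for codec in ("latin-1", "cp1252", "iso8859_2", "cp1250", "utf-8"):
    print(codec, coverage(codec))
```

With these samples, Latin-1 and CP1252 cover only the French text,
ISO 8859-2 and CP1250 cover only the Polish text, and UTF-8 covers both.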

Best regards,
