RE: UTF-8 in web pages

From: Chris Pratley (chrispr@microsoft.com)
Date: Mon Feb 08 1999 - 02:38:04 EST


>>And most do not come configured for UTF-8 out of the box, which is the
>>real show-stopper now for more widespread use of that charset.

Actually, if you measure by number of users, I think UTF-8 capable browsers
are easily over 50% now. UTF-8 works well in Internet Explorer 4.01 and
higher, and from what I can tell it seems to function in Navigator 4.03 and
higher. Together, current research shows that those two (and later versions
of them) plus other browsers based on IE technologies (and Tango) account
for somewhere around 75% of the installed user base. So, we're making
progress. (Note that things work much better in these browsers if you label
the file as UTF-8 using the META tag.)
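
As an illustration of the META labeling mentioned above, here is a minimal
sketch in Python. The helper name build_utf8_page and the sample text are
hypothetical and not part of the original message; the sketch only shows a
page whose bytes are UTF-8 and whose META tag says so.

```python
# Hypothetical helper: wrap text in a minimal HTML page labeled as UTF-8.
def build_utf8_page(title: str, body: str) -> bytes:
    html = (
        "<html>\n<head>\n"
        # The META label tells the browser which charset to decode with.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n<body>\n"
        f"<p>{body}</p>\n"
        "</body>\n</html>\n"
    )
    # The bytes written out must really be UTF-8, matching the label.
    return html.encode("utf-8")


page = build_utf8_page("Test", "日本語 and français")
```

The point of the sketch is that the label and the actual byte encoding have
to agree; a UTF-8 label on a Shift_JIS file would be decoded incorrectly.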

That said, you can maintain reasonable backward compatibility by using
legacy encodings (e.g. Shift-JIS) that support the majority of your page's
content, plus Unicode NCRs for the remainder. That way your content is as
visible as it can be, and the same browsers that support UTF-8 (roughly)
support Unicode NCRs. The only drawback is that the size of your file may
slightly increase. UTF-8 is cleaner and easier, though, so the sooner we
can all move to that, the better.
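
A rough sketch of the legacy-encoding-plus-NCR approach, in Python. The
function name encode_with_ncrs and the sample string are assumptions made
for illustration. The "xmlcharrefreplace" error handler is a standard Python
codec option that replaces any character the target charset cannot represent
with a numeric character reference.

```python
# Hypothetical helper: encode text in a legacy charset, falling back to
# numeric character references (NCRs) for characters it cannot represent.
def encode_with_ncrs(text: str, charset: str = "shift_jis") -> bytes:
    return text.encode(charset, errors="xmlcharrefreplace")


sample = "日本語 café"
legacy = encode_with_ncrs(sample)   # Japanese stays Shift_JIS, "é" becomes an NCR
as_utf8 = sample.encode("utf-8")    # the same text as plain UTF-8, for comparison

# Each NCR costs several ASCII bytes, which is where the size increase
# mentioned above comes from.
print(len(legacy), len(as_utf8))
```

Characters that Shift_JIS covers are written as ordinary Shift_JIS bytes, so
older browsers still show the bulk of the page; only the remainder depends
on NCR support.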

BTW Chris Wendt can comment, but I believe that IE also uses the lang
attribute to pick a suitable font, if one is available.

-----Original Message-----
From: Francois Yergeau [mailto:yergeau@alis.com]
Sent: Sunday, February 07, 1999 10:00 AM
To: Unicode List
Subject: RE: UTF-8 in web pages

À 21:18 06/02/99 -0800, Adrian Havill a écrit :
>but are
>there any browsers out there that plan to implement this (choosing a font
>appropriate for a particular language based on the lang attribute)?

Tango has been doing this for a couple of years. It has a notion of a
preferential font, that is dynamically influenced by both the charset of
the page and the lang attribute. Characters are looked up first in that
font, then in all the others in order of their declaration in a
configuration file.
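
The lookup order described above could be sketched roughly as follows in
Python. This is not Tango's actual code: the function pick_font, the font
names, and the toy coverage sets are all hypothetical, and how the
preferential font is derived from the charset and the lang attribute is left
out (it is simply passed in).

```python
from typing import Dict, List, Optional, Set


def pick_font(
    ch: str,
    preferred: str,
    declared_order: List[str],
    coverage: Dict[str, Set[int]],
) -> Optional[str]:
    """Return the first font that covers ch, trying the preferred font first."""
    # Preferred font first, then the others in their declared order.
    candidates = [preferred] + [f for f in declared_order if f != preferred]
    for name in candidates:
        if ord(ch) in coverage.get(name, set()):
            return name
    return None  # no configured font has a glyph for this character


# Toy coverage data (assumed): which code points each font can display.
coverage = {
    "LatinFont": set(range(0x20, 0x100)),
    "JapaneseFont": set(range(0x3040, 0x3100)) | set(range(0x4E00, 0xA000)),
}
declared = ["LatinFont", "JapaneseFont"]

font_for_kanji = pick_font("日", "JapaneseFont", declared, coverage)
font_for_accent = pick_font("é", "JapaneseFont", declared, coverage)
font_for_omega = pick_font("Ω", "JapaneseFont", declared, coverage)
```

With this ordering a Japanese page still gets its preferred Japanese font,
while characters outside it fall through to whichever declared font has them.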

But the absence of such a feature is no reason not to use UTF-8 vs, say,
Shift_JIS. If you pick Shift_JIS, you're restricted to Japanese
characters, which will be displayed correctly. If you choose UTF-8, you
can still have all Japanese characters display correctly, but you can also
get other characters to display if you have a larger font or if your
browser knows how to use multiple fonts (like Tango).

>Currently, the "popular" browsers out there associate fonts with character
>encodings, not languages.

And most do not come configured for UTF-8 out of the box, which is the real
show-stopper now for more widespread use of that charset.

--
François Yergeau


