Unicode on a non-Unicode web page

From: Gary P. Grosso (gpg@arbortext.com)
Date: Thu Sep 07 2000 - 10:39:49 EDT


Hi Unicoders,

I am working on software to emit HTML in the encoding
and character set of the user's choice, from SGML/XML
documents which can contain any Plane 1 Unicode character.
The question is what to do with characters outside the
selected encoding. I thought I would use numeric character
references (such as &#1026;), and IE5 at least seems to
render them well, but Netscape Communicator 4.6 doesn't.

One way to look at this is: how do I use Unicode as an
"escape" to include some isolated content on a web page
of arbitrary encoding?
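A minimal sketch of the escaping approach described above, written in Python purely as an illustration (the language of the actual software isn't stated). It relies on Python's standard xmlcharrefreplace codec error handler, which writes each character the target charset cannot represent as a decimal reference (&#NNNN;). The helper name encode_for_charset is hypothetical, and the sketch assumes markup-significant characters such as & and < have already been escaped.

```python
def encode_for_charset(text: str, charset: str) -> bytes:
    """Encode `text` in `charset`, escaping characters the charset lacks.

    Hypothetical helper. Assumes `text` is character data whose '&' and '<'
    have already been escaped as markup.
    """
    # 'xmlcharrefreplace' replaces each unencodable character with a
    # decimal numeric character reference of the form &#NNNN;.
    return text.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # Czech text stays as Latin-2 bytes; DJE, GAMMA and KA become references.
    sample = "\u010cl\u00e1nek \u0402 \u0393 \u304b"
    print(encode_for_charset(sample, "iso-8859-2"))
    # prints b'\xc8l\xe1nek &#1026; &#915; &#12363;'
```

Decimal output is a deliberate fit here, since Netscape reportedly copes with decimal references better than hexadecimal ones.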

For example, I have something such as:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Unicode in a Latin 2 page</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">
</head>
<body style="line-height: 16pt"><div class="pgbrk" style="padding-top: 48pt">
<p>Článek Úvod Žádný čest čin činěn činů činům činnost činnosti
jakmile jako jakož jakožto jazyka jež jediné jednat jednotkou jednotlivec</p>
<p>CYRILLIC CAPITAL LETTER DJE: &#1026;</p>
<p>CAPITAL LETTER GAMMA: &#x0393;</p>
<p>HIRAGANA LETTER KA: &#12363;</p>
<p>jeho jejich jemu jimi jiného jinému jiných jiným jinými jsou každému každý</p>
</div>
</body>
</html>

which probably looks awful, since your email client is probably
not set to display Latin 2, but which can also be seen at:

http://www.angelfire.com/mi/virtualattic/latin2_test.html

If I change the meta tag to:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
then Netscape does slightly better: it still stumbles over any
hexadecimal &#x... reference and doesn't display the hiragana, but
it does display the DJE and GAMMA if I use decimal values. Of course,
the Czech words are now not displayed properly, since the page's
bytes are still ISO-8859-2 but are being read as UTF-8.
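If the source documents already contain hexadecimal references, such as the &#x0393; in the sample page, one possible workaround for the Netscape behavior described above is to rewrite them in decimal form before output. A minimal Python sketch; the helper name hex_refs_to_decimal is hypothetical, and it only addresses the hex-versus-decimal issue, not the missing hiragana.

```python
import re

# Matches a hexadecimal numeric character reference such as &#x0393;
_HEX_NCR = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_refs_to_decimal(html: str) -> str:
    """Rewrite every &#xHHHH; reference in `html` as its decimal &#NNNN; form.

    Hypothetical helper. It is a plain regex pass, so it will also rewrite
    references that appear inside comments or script blocks.
    """
    return _HEX_NCR.sub(lambda m: "&#%d;" % int(m.group(1), 16), html)


if __name__ == "__main__":
    print(hex_refs_to_decimal("<p>CAPITAL LETTER GAMMA: &#x0393;</p>"))
    # prints <p>CAPITAL LETTER GAMMA: &#915;</p>
```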

My question(s):

Is there some way I can nudge Netscape's browser to display these?

Is there a better way to write this admittedly mongrel HTML content?
I have heard that it is possible to change the charset "on the fly";
if that would work, I would appreciate a pointer to somewhere that
explains how best to do it.

Thanks in advance for any insights.

---
Gary Grosso
ggrosso@arbortext.com
Arbortext, Inc.
Ann Arbor, MI, USA


