Re: Unicode on a non-Unicode web page

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Thu Sep 07 2000 - 11:07:43 EDT


Netscape 4.x is simply not very good at this sort of thing. The only real
solution is to use an encoding that supports all of the characters on the
page, such as UTF-8.
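A minimal Python sketch of the UTF-8 route, assuming the content is already held as Unicode strings. The function name render_utf8_page is hypothetical and not part of any existing tool. The page bytes are UTF-8 and the meta tag declares UTF-8, so the Czech text, the Cyrillic DJE, the GAMMA and the hiragana can all appear as literal characters with no references at all.

```python
import html
from typing import Sequence


def render_utf8_page(title: str, paragraphs: Sequence[str]) -> bytes:
    """Hypothetical helper: build a small HTML 4.0 page encoded as UTF-8."""
    # Escape only &, < and >; non-ASCII characters are left as literal text.
    body = "\n".join(f"<p>{html.escape(p, quote=False)}</p>" for p in paragraphs)
    page = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">\n'
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        f"<title>{html.escape(title, quote=False)}</title>\n"
        "</head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )
    # The declared charset and the actual byte encoding must agree.
    return page.encode("utf-8")
```

The important design point is that the meta tag and the real byte encoding match; declaring UTF-8 while writing the file in another encoding garbles every non-ASCII character.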

michka

----- Original Message -----
From: "Gary P. Grosso" <gpg@arbortext.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Thursday, September 07, 2000 7:32 AM
Subject: Unicode on a non-Unicode web page

> Hi Unicoders,
>
> I am working on software to emit HTML in the encoding
> and character set of the user's choice, from SGML/XML
> documents which can contain any character in the first
> Unicode plane (the Basic Multilingual Plane, formally Plane 0).
> The question is what to do with characters outside the
> selected encoding. I thought I would use numeric character
> references (such as &#1026;), and IE5 at least seems to
> render those well, but Netscape Communicator 4.6 doesn't.
>
> One way to look at this is: how do I use Unicode as an
> "escape" to include some isolated content on a web page
> of arbitrary encoding?
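The escape approach described above can be sketched in Python as an illustration; to_legacy_html is a hypothetical name. Python's built-in "xmlcharrefreplace" error handler encodes every character the target charset can represent and replaces every other character with a decimal numeric character reference. This only produces the references. Whether a given browser renders them is the separate problem discussed in this thread.

```python
import html


def to_legacy_html(text: str, charset: str = "iso-8859-2") -> bytes:
    """Hypothetical helper: encode text for a page in a legacy charset."""
    # Escape markup characters first, so a literal "&" in the source text
    # cannot be mistaken for the start of a character reference.
    escaped = html.escape(text, quote=False)
    # Characters the charset can represent are encoded directly; any other
    # character becomes a decimal reference such as &#1026;.
    return escaped.encode(charset, errors="xmlcharrefreplace")
```

For example, to_legacy_html("CAPITAL LETTER GAMMA: \u0393") returns b"CAPITAL LETTER GAMMA: &#915;", while the Czech letters stay as single ISO-8859-2 bytes.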
>
> For example, I have something such as:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <html><head><title>Unicode in a Latin 2 page</title>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">
> </head>
> <body style="line-height: 16pt"><div class="pgbrk" style="padding-top: 48pt">
> <p>Článek Úvod Žádný čest čin činěn činů činům činnost činnosti
> jakmile jako jakož jakožto jazyka jež jediné jednat jednotkou jednotlivec</p>
> <p>CYRILLIC CAPITAL LETTER DJE: &#1026;</p>
> <p>CAPITAL LETTER GAMMA: &#x0393;</p>
> <p>HIRAGANA LETTER KA: &#12363;</p>
> <p>jeho jejich jemu jimi jiného jinému jiných jiným jinými jsou každému každý
> </p>
> </div>
> </body>
> </html>
>
> which probably looks awful since your email client is not likely
> set to display Latin 2, but which can also be seen at:
>
> http://www.angelfire.com/mi/virtualattic/latin2_test.html
>
> If I change the meta tag to:
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
> then Netscape does slightly better: it still stumbles over the
> hexadecimal &#x form and doesn't display the hiragana, but it does
> display the DJE and GAMMA if I use decimal values. Of course, the Czech
> words are now not displayed properly, since the file itself is still
> encoded as Latin 2.
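Since decimal references fared better than hexadecimal ones in the Netscape test described above, a page generator could normalize any &#x references to decimal before writing the page. A small Python sketch follows; hex_refs_to_decimal is a hypothetical helper, and the regular expression assumes well-formed references terminated by a semicolon. It does nothing about the missing hiragana.

```python
import re

# Matches hexadecimal character references such as &#x0393; or &#X393;
_HEX_REF = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_refs_to_decimal(markup: str) -> str:
    """Hypothetical helper: rewrite &#xHHHH; references as &#DDDD;."""
    # Parse the hex digits and re-emit the same code point in decimal.
    return _HEX_REF.sub(lambda m: "&#%d;" % int(m.group(1), 16), markup)
```

Decimal references that are already present are left untouched, because the pattern requires the x after &#.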
>
> My question(s):
>
> Is there some way I can nudge Netscape's browser to display these?
>
> Is there a better way to write this admittedly mongrel HTML content?
> I have heard that it is possible to change the charset "on the fly"
> within a page; if that would work, I would appreciate a pointer to
> something that explains how best to do it.
>
> Thanks in advance for any insights.
>
>
> ---
> Gary Grosso
> ggrosso@arbortext.com
> Arbortext, Inc.
> Ann Arbor, MI, USA
>
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT