Re: Browser support

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Wed Mar 20 2002 - 07:21:59 EST


Hello all,

though Bill Kurmey wrote privately, I think this should be discussed
publicly; so I take the liberty of answering on the Unicode list.

I had written:

> - When a notable fraction of your user community uses older browsers,
> particularly Netscape 4.7:
> - For characters contained in CP 1252, such as em-dash, trademark symbol,
> and smart quotes, choose ISO-8859-1 encoding, and use NCRs for the
> characters not in ISO-8859-1 (but in CP 1252).

Bill Kurmey wrote:

> Please, no. There are hundreds if not thousands of web pages already
> incorrectly identified as "ISO-8859-1" and which should be identified as
> "Windows-1252" when they contain NCRs not in the range of ISO-8859-1.

Wrong presupposition.

As I had explained, the document charset is fixed: ISO 8859-1 for HTML 2
and HTML 3; UCS for HTML 4. This means
- that an HTML 4 source may legally contain NCRs from the whole UCS/Unicode
   range,
- that the HTTP Content-Type/charset parameter only determines how the bytes
   transmitted to the browser are to be transformed back into characters
   (which, in due course, will be parsed, according to the HTML syntax,
   into tags, entities, NCRs, or text elements).
Cf. <http://www.w3.org/TR/html401/charset.html>, for the gory details.
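
The two steps described above can be sketched in Python. This is only an
illustration, not how any particular browser is implemented: the function
name bytes_to_text is hypothetical, and the standard library's html.unescape
is assumed here as a stand-in for a browser's handling of entities and NCRs.

```python
import html

def bytes_to_text(raw: bytes, charset: str) -> str:
    # Step 1: the charset parameter only says how to turn the
    # transmitted bytes back into characters.
    chars = raw.decode(charset)
    # Step 2: entities and NCRs are resolved against the document
    # charset (UCS), independently of the transmission charset.
    return html.unescape(chars)

# An ISO-8859-1 byte (E9) next to references that lie outside ISO 8859-1.
page = b"caf\xe9 &#8212; &mdash;"
print(bytes_to_text(page, "iso-8859-1"))
```

Both references come out as U+2014 although the page is labelled ISO-8859-1.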

In the particular example,
- an HTML 4 page labelled
   <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
   can contain an em-dash in any of four representations:
   · Bytes 26 6D 64 61 73 68 3B : "&mdash;"
   · Bytes 26 23 38 32 31 32 3B : "&#8212;"
   · Bytes 26 23 78 32 30 31 34 3B : "&#x2014;"
   · Bytes 26 23 58 32 30 31 34 3B : "&#X2014;"
- an HTML 4 page labelled
   <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
   can contain an em-dash in any of five representations:
   · Bytes 26 6D 64 61 73 68 3B : "&mdash;"
   · Bytes 26 23 38 32 31 32 3B : "&#8212;"
   · Bytes 26 23 78 32 30 31 34 3B : "&#x2014;"
   · Bytes 26 23 58 32 30 31 34 3B : "&#X2014;"
   · Byte 97 : em-dash, encoded in CP 1252
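
The byte listings above can be checked with a few lines of Python. This is a
minimal sketch that assumes Python's "cp1252" and "iso-8859-1" codecs match
the encodings discussed here; the helper name hex_bytes is hypothetical.

```python
def hex_bytes(data: bytes) -> str:
    # Render bytes as space-separated upper-case hex pairs.
    return " ".join(f"{b:02X}" for b in data)

# The four reference forms are pure ASCII, so their bytes are the same
# whichever of the two charset labels the page carries.
forms = ["&mdash;", "&#8212;", "&#x2014;", "&#X2014;"]
for form in forms:
    print("Bytes", hex_bytes(form.encode("ascii")), ":", form)

em_dash = "\u2014"
# The fifth form exists only when the page is encoded in CP 1252.
print("Byte", hex_bytes(em_dash.encode("cp1252")))

try:
    em_dash.encode("iso-8859-1")
    latin1_has_em_dash = True
except UnicodeEncodeError:
    latin1_has_em_dash = False
print("Single byte in ISO 8859-1:", latin1_has_em_dash)
```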

I should have mentioned that not all of the older browsers handle
hexadecimal NCRs (though Netscape 4.77 does). Hence, I recommend using
only the decimal ones for another couple of years.
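
One way to produce such a page, sketched in Python: encode the text as
ISO 8859-1 and let the codec substitute a decimal NCR for every character
that does not fit. The function name to_latin1_with_ncrs is hypothetical;
the "xmlcharrefreplace" error handler is part of Python's standard codecs
machinery and always emits the decimal form.

```python
def to_latin1_with_ncrs(text: str) -> bytes:
    # Characters representable in ISO 8859-1 become single bytes;
    # everything else becomes a decimal NCR such as &#8212;.
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

# E-acute fits into ISO 8859-1; em-dash and trademark symbol do not.
sample = "Caf\u00e9 \u2014 Acme\u2122"
print(to_latin1_with_ncrs(sample))
```

Note that this does not escape markup characters such as "&" or "<"; that
would have to be done separately before encoding.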

Bill Kurmey wrote:

> Many do not declare the version of HTML in which they were created.

The Doctype declaration is essential; I should have mentioned this in
my previous note. HTML 4 requires a Doctype declaration, cf.
<http://www.w3.org/TR/html401/struct/global.html#h-7.2>; an HTML source not
containing a Doctype declaration is always assumed to be in HTML 2.0. Hence,
the discussion above depends on a valid HTML 4 Doctype declaration.

> In the versions of Netscape 4.7x which I have tested, none handle ALL
> of the c1 control range used in CP1252 including some of the characters
> you have specified. Netscape 4.75, for example, does correctly handle
> the smart quotes and em-dash, but the trademark symbol (and others in
> CP1252) appear as white "boxes."

A white box means that Netscape is unable to locate a suitable font to
display the respective character. You can only display characters for which
a font is available locally; I think I had mentioned this. You can download
the MS core fonts, in TrueType format, from
<http://www.eu.microsoft.com/typography/fontpack/>.
Those labelled "WGL4" contain all required characters (and more).

As it happens, all 27 characters that CP 1252 has beyond ISO 8859-1
are listed in <http://www.systems.uni-konstanz.de/EMAIL/FAQ.php#HTTP-71>;
this page uses decimal NCRs, such as "&#8212;".
- Netscape Communicator 4.77 under Windows 98 Vers 4.10.222 [de]
   displays 25 of them, the exceptions being the two characters
   Z with Hacek (which are, surprisingly, replaced with white boxes).
- Netscape Communicator 4.6 under Mac OS D1 - 8.6 displays 23 of them,
   the exceptions being the four characters S and Z with Hacek.
- Netscape 4.77 [en] under Solaris 8
   · displays 14 of these,
   · displays fall-back representations for another 8 of them,
     viz. "OE", "oe", "S", "s", "Y", "EUR", "f", and "[TM]",
   · displays question marks for 5 of them,
     viz. the Daggers, the Zs with Hacek, and the Per Mille Sign.
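
For reference, the 27 characters in question can be enumerated with a short
Python sketch. It assumes that Python's "cp1252" codec reflects the code
page as discussed here, leaving five bytes of the 80..9F range unassigned;
the variable name extra is hypothetical.

```python
import unicodedata

extra = []
for byte in range(0x80, 0xA0):
    try:
        ch = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # Unassigned in this codec: 81, 8D, 8F, 90, 9D.
        continue
    extra.append((byte, ch))

# Print each byte, its decimal NCR, and the Unicode character name.
for byte, ch in extra:
    print(f"{byte:02X}  &#{ord(ch)};  {unicodedata.name(ch)}")

print(len(extra), "characters beyond ISO 8859-1")
```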

So, yes, not all of the CP 1252 additions are correctly displayed by
Netscape 4.7; and I apologize for my sloppiness. But, no, Netscape 4.7
has no problem with the Trademark Symbol, nor with any other of the symbols
mentioned in the original question.

Best wishes,
   Otto Stolz


