Hello Stuart Somer,
you wrote:
> I find many recommendations not to use Unicode characters for entities
> like em dashes and trademark symbols because there is poor browser support.
According to HTML 4, <http://www.w3.org/TR/html401/charset.html#h-5.3>,
you may use any NCR (numeric character reference), or any entity,
regardless of the encoding <http://www.w3.org/TR/html401/charset.html#h-5.2.2>.
In theory, the Document Character Set is always the Universal Character
Set (UCS, aka Unicode), <http://www.w3.org/TR/html401/charset.html#h-5.1>;
the encoding chosen is just the vehicle to transfer the characters
readily from the server to the client: the characters contained in that
set may be given in their respective binary representation, while any
character may be given as an NCR. A browser should be capable of
displaying all Unicode characters, provided suitable fonts are
locally available.
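
To illustrate the character model described above, here is a minimal
sketch in Python (my choice of language for illustration; the sample
string is made up). It assumes nothing beyond the standard library and
shows that an NCR names a Unicode code point, whatever encoding carries
the document:

```python
import html

# Sample text with an em dash (U+2014) and a trademark sign (U+2122);
# neither character exists in ISO-8859-1.
text = "Unicode\u2014the UCS\u2122"

# Encode for transfer as ISO-8859-1; characters outside that encoding
# are replaced by decimal NCRs naming their Unicode code points.
wire = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(wire)  # b'Unicode&#8212;the UCS&#8482;'

# A conforming client decodes the bytes, then resolves the NCRs against
# the document character set (Unicode), recovering the original text.
recovered = html.unescape(wire.decode("iso-8859-1"))
print(recovered == text)  # True
```

Note that the NCR for the em dash is &#8212; (its Unicode code point),
not &#151; (its byte value in CP 1252).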
In contrast to this theory, Netscape 4.7 displays only characters
that are in the encoding chosen -- with a notable exception: if the
encoding is ISO-8859-1, all CP 1252 characters can be displayed (at least
on Windows systems; I have not extensively tested Netscape on other OSes).
Cf. <http://czyborra.com/charsets/iso8859.html#ISO-8859-1>
and <http://czyborra.com/charsets/codepages.html#CP1252>,
for these character sets.
Netscape 6.2, Internet Explorer 6.0, and Opera 6.0 comply with
the HTML 4 character model, as outlined above.
Hence my recommendation:
- When your user community has Netscape 6.2, Internet Explorer 6.0, or
  Opera 6.0, use any convenient encoding, and insert characters beyond
  the chosen encoding as either NCRs or entities.
- When a notable fraction of your user community uses older browsers,
  particularly Netscape 4.7:
  - For characters contained in CP 1252, such as em-dash, trademark symbol,
    and smart quotes, choose ISO-8859-1 encoding, and use NCRs for the
    characters not in ISO-8859-1 (but in CP 1252).
  - If you need characters beyond CP 1252, choose UTF-8 encoding; depending
    on your editor (and other authoring tools), you may prefer to enter all
    characters directly, or to enter the characters beyond ASCII as
    entities or NCRs.
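
The rule of thumb for older browsers could be sketched in Python roughly
as follows. The helper names choose_encoding and serialize are
hypothetical, invented for this illustration; the sketch relies on
Python's built-in cp1252 and iso-8859-1 codecs and is not a tested
authoring tool:

```python
def choose_encoding(text):
    """Hypothetical helper: pick an encoding per the rule of thumb above."""
    try:
        # If every character fits into CP 1252, Netscape 4.7 (on Windows)
        # should be able to show it in a page declared as ISO-8859-1.
        text.encode("cp1252")
    except UnicodeEncodeError:
        # Something lies beyond CP 1252, so fall back to UTF-8.
        return "utf-8"
    return "iso-8859-1"


def serialize(text):
    """Hypothetical helper: return (encoding, bytes) for the page body."""
    encoding = choose_encoding(text)
    # Characters missing from the chosen encoding become decimal NCRs;
    # for UTF-8 this never triggers, since it covers all of Unicode.
    return encoding, text.encode(encoding, errors="xmlcharrefreplace")


# Em dash and smart quotes are in CP 1252 but not in ISO-8859-1.
print(serialize("caf\u00e9 \u2014 \u201cquoted\u201d"))
# A Cyrillic letter is beyond CP 1252.
print(serialize("\u0416"))
```

In the first call, the e-acute is sent as a single ISO-8859-1 byte, while
the em dash and the smart quotes are sent as NCRs; in the second call,
the Cyrillic letter is sent directly as UTF-8 bytes.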
In any case, it would be wise to
- stay within the WGL4.0 Character Set,
  cf. <http://www.microsoft.com/typography/otspec/WGL4.htm>,
  as there are suitable fonts freely available,
- test your WWW-pages with all browsers popular in your user community.
> Do you know of a chart for browser support of
> Unicode by browser version?
The most comprehensive discussion I've seen is
<http://www.hclrss.demon.co.uk/unicode/browsers.html>.
Best wishes,
Otto Stolz