From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Wed Nov 06 2002 - 06:18:14 EST
Hello,
I had written:
> HTML:
> · Store your entire page in UTF-8, [...]
> · Store your entire page in a suitable standard codepage, cf.
> <http://czyborra.com/charsets/iso8859.html>, [...]
> · Store your page in some standard CP (as above), and enter the
> particular problem characters as NCRs, [...]
Edward H Trager wrote:
> Even though they are second and third options in your email response,
> are you sure you want to implicitly encourage someone to use CODEPAGES
> instead of UTF-8 on their web pages? This is not good advice, I fear.
I was explicitly referring to "standard codepages", and I included
a link to a description of the ISO 8859 series. I did not mean to
advocate throwing HTML in proprietary encodings at poor, unsuspecting
browsers...
Of course, UTF-8 is the way to go for newly designed, international web
pages. However, there may be situations where you are forced to use
particular encodings, so I thought I should mention the possibility.
> One of the biggest headaches I have is trying to read web pages written in
> certain code pages that don't appear correctly under various browsers on
> my non-Windows workstations (maybe it's a problem on Windows too, I just
> haven't checked): if those pages had been in UTF-8, then very likely they
> would at least be readable.
It would be interesting to know more particulars:
- Are you sure that the pages causing your headache were properly tagged
with the charset?
I have seen many HTML pages (and e-mail, btw.) encoded in MS CP 1252
(cf. <http://czyborra.com/charsets/codepages.html#CP1252>) but tagged
as ISO-8859-1, or even as ASCII; cf. an example in my e-mail FAQ at
<http://www.systems.uni-konstanz.de/EMAIL/FAQ.php#SMTP-71>.
- Which CP cannot be properly handled by which browser/OS combo?
Have you seen anything beyond the findings of Alan Wood, cf.
<http://www.alanwood.net/unicode/browsers.html>?
I guess that the ISO 8859 series' encodings will be handled by
any browser on any system (if correctly configured and supplied
with suitable fonts) -- but I never had the time and resources
to test this conjecture.
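The CP 1252 versus ISO-8859-1 mistagging mentioned above can often be
spotted mechanically. What follows is only a minimal sketch in Python 3,
not a robust detector; the function name looks_like_cp1252 is made up for
this illustration. It relies on the fact that ISO-8859-1 assigns the bytes
0x80 through 0x9F to C1 control characters, which practically never occur
in HTML or e-mail, whereas CP 1252 puts printable characters (curly quotes,
dashes, the euro sign) in that range.

```python
def looks_like_cp1252(raw: bytes) -> bool:
    """Heuristic: True if raw contains any byte in the range 0x80-0x9F.

    Such bytes are C1 controls in ISO-8859-1 but printable characters
    in CP 1252, so their presence in a page tagged as ISO-8859-1
    suggests the tag is wrong. (Hypothetical helper, illustration only.)
    """
    return any(0x80 <= b <= 0x9F for b in raw)


# Curly double quotes are bytes 0x93 and 0x94 in CP 1252.
sample = "\u201cquoted\u201d".encode("cp1252")
print(looks_like_cp1252(sample))                           # True

# 0xE9 (e with acute accent) is identical in both encodings.
print(looks_like_cp1252("caf\u00e9".encode("latin-1")))    # False
```

A False result does not prove that the page really is ISO-8859-1; it only
means that no byte from the tell-tale range occurs in it.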
A popular browser, Netscape Navigator, versions 3 through 4.8, does
not handle NCRs according to the HTML 4 specification. Alan Wood
describes this behaviour thus:
: Numeric character references [...] are supposed be displayed
: independently of the document's character encoding, but Navigator
: 4.8 is restricted to the numeric character references that
: fall within the current encoding (either specified in a meta tag
: or selected from the View menu). It is normally necessary to select
: the Unicode (UTF-8) character set from the View menu in order to
: force numeric character references to be displayed properly.
The HTML author can easily circumvent this problem via a variant
of my 3rd alternative, viz.
· Store your page in ASCII (i.e. 7-bit only!), and enter every
non-ASCII character as an NCR; but tag your page as UTF-8.
This works well, as ASCII is a proper subset of UTF-8. This scheme
is feasible for content that is largely in Latin script, with
occasional national/special characters. It does not require any
advanced software on the author's side: a simple 8-bit editor, such
as Notepad from Windows 95, will suffice. NN 4.7/4.8 will happily
display the characters produced via the NCRs; so will all browsers
capable of displaying UTF-8-encoded HTML 4.
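
For authors who would rather not type the NCRs by hand, the ASCII-plus-NCR
scheme can be sketched in a few lines of Python 3. This is only an
illustration under stated assumptions: the function names are invented
here, and the page skeleton is a bare-bones example rather than a complete
HTML 4 document. The standard "xmlcharrefreplace" error handler replaces
every character outside ASCII with its decimal NCR.

```python
def to_ascii_with_ncrs(text: str) -> str:
    """Return text with every non-ASCII character replaced by a decimal NCR."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def make_page(title: str, body: str) -> bytes:
    """Build a pure 7-bit page that is nevertheless tagged as UTF-8.

    Hypothetical helper; the markup is a minimal skeleton only.
    """
    html = (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        f"<title>{to_ascii_with_ncrs(title)}</title>\n"
        "</head><body>\n"
        f"<p>{to_ascii_with_ncrs(body)}</p>\n"
        "</body></html>\n"
    )
    # Encoding as ASCII would raise an error if any 8-bit character had
    # slipped through, so this doubles as a check.
    return html.encode("ascii")


# U+00FC is 252, U+00DF is 223, and the euro sign U+20AC is 8364.
print(to_ascii_with_ncrs("Gr\u00fc\u00dfe aus Konstanz, 5 \u20ac"))
# Gr&#252;&#223;e aus Konstanz, 5 &#8364;
```

Because the output contains only 7-bit bytes, it is the same byte sequence
whether read as ASCII or as UTF-8, which is why the UTF-8 tag is harmless.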
Best wishes,
Otto Stolz