Re: Special characters

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Wed Nov 06 2002 - 06:18:14 EST

Next message: Marco Cimarosti: "RE: Names for UTF-8 with and without BOM - pragmatic"

Previous message: William Overington: "Re: ct, fj and blackletter ligatures"
In reply to: Edward H Trager: "Re: Special characters"
Next in thread: David Starner: "Re: Special characters"
Reply: David Starner: "Re: Special characters"

Hello,

I had written:

> HTML:
> · Store your entire page in UTF-8, [...]
> · Store your entire page in a suitable standard codepage, cf.
> <http://czyborra.com/charsets/iso8859.html>, [...]
> · Store your page in some standard CP (as above), and enter the
> particular problem characters as NCRs, [...]

Edward H Trager wrote:
> Even though they are second and third options in your email response,
> are you sure you want to implicitly encourage someone to use CODEPAGES
> instead of UTF-8 on their web pages? This is not good advice, I fear.

I was explicitly referring to "standard codepages", and I included
a link to a description of the ISO 8859 series. I did not mean to
advocate throwing HTML in proprietary encodings at poor, unsuspecting
browsers...

Of course, UTF-8 is the way to go for newly designed, international web
pages. However, there may be situations where you are forced to particular
encodings, so I thought I should mention the possibility.

> One of the biggest headaches I have is trying to read web pages written in
> certain code pages that don't appear correctly under various browsers on
> my non-Windows workstations (maybe it's a problem on Windows too, I just
> haven't checked): if those pages had been in UTF-8, then very likely they
> would at least be readable.

It would be interesting to know more particulars:

- Are you sure that the pages causing your headache were properly tagged
with the charset?

   I have seen many HTML pages (and e-mail, btw.) encoded in MS CP 1252
   (cf. <http://czyborra.com/charsets/codepages.html#CP1252>) but tagged
   as ISO-8859-1, or even as ASCII; cf. an example in my e-mail FAQ at
   <http://www.systems.uni-konstanz.de/EMAIL/FAQ.php#SMTP-71>.
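
   A minimal sketch of how such mis-tagging might be spotted, assuming
   Python 3 (the helper name looks_like_cp1252 is hypothetical). Bytes
   0x80 through 0x9F are C1 control codes in ISO-8859-1 but mostly carry
   printable characters (curly quotes, dashes, the euro sign) in CP 1252,
   so finding them in a page tagged as ISO-8859-1 or ASCII is a strong
   hint, though not proof. The check is only meaningful for data declared
   as a single-byte encoding, since UTF-8 uses the same byte range for
   continuation bytes.

```python
def looks_like_cp1252(raw: bytes) -> bool:
    """Heuristic: True if raw contains any byte in the range 0x80-0x9F.

    Those bytes are control codes in ISO-8859-1, so a page tagged as
    ISO-8859-1 (or ASCII) that contains them was probably written in
    CP 1252. Assumes raw is NOT UTF-8.
    """
    return any(0x80 <= b <= 0x9F for b in raw)


# Curly quotes occupy 0x93 and 0x94 in CP 1252.
sample = "“quoted” text".encode("cp1252")

print(looks_like_cp1252(sample))          # heuristic fires on this sample
print(sample.decode("cp1252"))            # what the author meant
print(repr(sample.decode("iso-8859-1")))  # what the wrong tag claims
```

   Decoding the same bytes under the declared charset yields invisible
   control characters where the author intended quotation marks, which
   is the kind of damage a reader of such a page would see.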

- Which CP cannot be properly handled by which browser/OS combo?
Have you seen anything beyond the findings of Alan Wood, cf.
<http://www.alanwood.net/unicode/browsers.html>?

   I guess that the ISO 8859 series' encodings will be handled by
   any browser on any system (if correctly configured and supplied
   with suitable fonts) -- but I never had the time and resources
   to test this conjecture.

A popular browser, Netscape Navigator, versions 3 through 4.8, does
not handle NCRs according to the HTML 4 specification. Alan Wood
describes this behaviour thus:
: Numeric character references [...] are supposed be displayed
: independently of the document's character encoding, but Naviga-
: tor 4.8 is restricted to the numeric character references that
: fall within the current encoding (either specified in a meta tag
: or selected from the View menu). It is normally necessary to select
: the Unicode (UTF-8) character set from the View menu in order to
: force numeric character references to be displayed properly.

The HTML author can easily circumvent this problem via a variant
of my 3rd alternative, viz.
· Store your page in ASCII (i.e. 7-bit only!), and enter every
non-ASCII character as an NCR; but tag your page as UTF-8.

This works well, as ASCII is a proper subset of UTF-8: a pure 7-bit
file is already valid UTF-8, byte for byte. This scheme is feasible
for content that is largely in Latin script, with occasional
national/special characters. It does not require any advanced
software on the author's side: a simple 8-bit editor, such as
Notepad from Windows 95, will suffice. NN 4.7/4.8 will happily
display the characters produced via the NCRs; so will all browsers
capable of displaying UTF-8-encoded HTML 4.
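
As a rough illustration of this scheme, assuming Python 3 (the helper
name to_ascii_with_ncrs and the sample text are made up for this sketch),
the built-in "xmlcharrefreplace" error handler turns every non-ASCII
character into a decimal NCR. The result is pure 7-bit data that can be
tagged as UTF-8:

```python
def to_ascii_with_ncrs(text: str) -> bytes:
    """Encode text as 7-bit ASCII, writing each non-ASCII character
    as a decimal numeric character reference (&#NNNN;).

    Note: markup characters such as & and < are NOT escaped here.
    """
    return text.encode("ascii", errors="xmlcharrefreplace")


body = to_ascii_with_ncrs("Grüße aus Konstanz, 5 €")
print(body)  # b'Gr&#252;&#223;e aus Konstanz, 5 &#8364;'

# Tag the page as UTF-8, as suggested above; the bytes are valid in both.
page = (
    b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
    + body
)
assert page.decode("ascii") == page.decode("utf-8")
```

The same handler also covers the third alternative: encoding with
"iso-8859-1" instead of "ascii" would keep ü and ß as single bytes and
turn only the euro sign, which ISO 8859-1 lacks, into an NCR.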

Best wishes,
Otto Stolz


This archive was generated by hypermail 2.1.5 : Wed Nov 06 2002 - 06:56:59 EST