On 1999-10-22 at 1:36, Denice Szafran Liscomb <okana@okanasweb.net>
wrote:
> how to code [characters from the Latin Extended A range] on an HTML or XML
> page to make the characters appear properly.
I can only give advice for HTML. I sent most of this to the Unicode
List back in July.
> Where do I find this information?
Cf. <http://www.w3.org/TR/REC-html40/charset.html>, in particular
<http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2>: according
to the HTML 4.0 definition, you may choose the most convenient code
page (which is perceived as a transport vehicle only) for your HTML
page and use character references, such as "&euro;", "&#8364;", or "&#x20AC;",
for those characters that are not in the code page chosen for the
transfer. In practice, however, this works well *only* when you
choose UTF-8 as your transfer encoding; then you won't need to resort
to numeric character references, of course (but you are free to use
them if you find them convenient). Examples of UTF-8 based pages:
<http://www.reuters.com/unicode/iuc10/x-utf8.html>,
and the attached file.
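As a minimal sketch of the character-reference approach (my own
illustration, not taken from the pages cited above): the following body
fragment contains only ASCII bytes, so it can be stored and sent in any
ASCII-compatible code page. The sample word is an assumption of mine;
the code points are given in the comments.

```html
<!-- Decimal references for U+0141, U+00F3, the letter d, U+017A -->
<p>&#321;&#243;d&#378;</p>

<!-- The same word written with hexadecimal references -->
<p>&#x141;&#xF3;d&#x17A;</p>

<!-- A named entity from the HTML 4.0 set: U+0161, small s with caron -->
<p>&scaron;</p>
```

Only a few Latin Extended-A characters have named entities in HTML 4.0,
so numeric references are the general fallback.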
This scheme is defined only for HTML 4.0, so you will need to mark your
document as an HTML 4.0 document, cf.
<http://www.w3.org/TR/REC-html40/struct/global.html#h-7.2>.
(The example page from Reuters does not, however, include this
mandatory declaration.)
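For illustration, this is roughly what a page carrying the Strict
document type declaration from the HTML 4.0 Recommendation looks like.
The title and paragraph text are placeholders of mine.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
    "http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
<head>
<title>Placeholder title</title>
</head>
<body>
<p>Placeholder text.</p>
</body>
</html>
```

If you use deprecated elements or attributes, the Transitional variant
applies instead: public identifier "-//W3C//DTD HTML 4.0 Transitional//EN"
with system identifier "http://www.w3.org/TR/REC-html40/loose.dtd".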
You cannot legally include Latin-2 characters in pre-4.0 HTML, as HTML 3.2
mandates Latin-1. In HTML 4, you must specify any encoding other than
Latin-1; so your Latin-2 or UTF-8 encoded HTML pages must either be
sent with an appropriate HTTP header field, or they must contain a META
element, as discussed above.
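A sketch of both ways of declaring the encoding, assuming a page saved
as UTF-8 (for Latin-2 you would write charset=ISO-8859-2 instead). The
title and text are placeholders of mine. The HTTP header field appears
only in a comment, because the server sends it; it is not written into
the page itself.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
    "http://www.w3.org/TR/REC-html40/strict.dtd">
<!-- Variant 1, sent by the server as an HTTP header field:
       Content-Type: text/html; charset=UTF-8
     Variant 2, declared in the document, as early in HEAD as possible: -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Placeholder title</title>
</head>
<body>
<p>Placeholder text.</p>
</body>
</html>
```

According to section 5.2.2 of the specification, the charset parameter
of the HTTP header takes precedence over the META declaration when both
are present.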
I also recommend tagging the various parts of your HTML page with their
respective languages,
cf. <http://www.w3.org/TR/REC-html40/struct/dirlang.html>, in particular
<http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.1.1> combined
with <http://sunsite.auc.dk/RFC/rfc/rfc1766.html>,
<http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_639.html> and
<http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_3166.html>.
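By way of illustration: an RFC 1766 language tag consists of an ISO 639
language code, optionally followed by a hyphen and an ISO 3166 country
code. The sample texts below are assumptions of mine, chosen only to
show the lang attribute on different elements.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
    "http://www.w3.org/TR/REC-html40/strict.dtd">
<html lang="en">
<head>
<title>Language tagging sample</title>
</head>
<body>
<!-- Inherits lang="en" from the HTML element -->
<p>This paragraph is in English.</p>

<!-- Polish paragraph; &#324; is U+0144, small n with acute -->
<p lang="pl">Dzie&#324; dobry</p>

<!-- Language plus country code for a single phrase -->
<p>A Swiss German word: <span lang="de-CH">Velo</span></p>
</body>
</html>
```

The lang attribute is inherited, so it is enough to set the main
language once on the HTML element and to override it only where the
language changes.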
For more info on HTML i18n, cf. <http://www.w3.org/International/>.
You may also wish to read other parts of the HTML 4.0 specification,
and hints for HTML authors:
<http://www.w3.org/TR/REC-html40/>
<http://www.w3.org/MarkUp/#guidelines>
<http://www.w3.org/WAI/GL/#Current_Draft>
and to test your HTML source against pertinent validation services:
<http://validator.w3.org/>
<http://www.cast.org/bobby/>
Best wishes,
Otto Stolz