From: YTang0648@aol.com
Date: Wed Nov 05 2003 - 15:36:05 EST
OK, let's forget about the HTML discussion and talk about XML:
In a message dated 11/5/2003 12:11:21 PM Pacific Standard Time,
verdy_p@wanadoo.fr writes:
One can however use it safely with XHTML, because XHTML documents are XML
documents which may specify explicitly another document schema that includes
this extra attribute (thanks to the modular model of XHTML). But you'll have
to provide your own XML schema...
Hmm... not quite the same. Be careful here. It depends on which MIME type you
use in the Content-Type header for your XHTML.
You need to read the following two documents carefully:
1. RFC 3023 - XML Media Types http://www.faqs.org/rfcs/rfc3023.html
2. XHTML Media Types http://www.w3.org/TR/xhtml-media-types/
Note that for XHTML, which must be a valid XML document, UTF-8 is the
default if nothing is specified.
Not true. According to XHTML Media Types
http://www.w3.org/TR/xhtml-media-types/ , if you are using "application/xhtml+xml" or "application/xml" for your
XHTML, then "UTF-8 is the default if nothing is specified". However, that does not hold if you use
"text/xml" as the Content-Type in the header. Read the following text from RFC
3023 - XML Media Types http://www.faqs.org/rfcs/rfc3023.html :
[begin of quote]
3.6 Summary
The following list applies to text/xml, text/xml-external-parsed-
entity, and XML-based media types under the top-level type "text"
that define the charset parameter according to this specification:
o Charset parameter is strongly recommended.
o If the charset parameter is not specified, the default is "us-
ascii". The default of "iso-8859-1" in HTTP is explicitly
overridden.
o No error handling provisions.
o An encoding declaration, if present, is irrelevant, but when
saving a received resource as a file, the correct encoding
declaration SHOULD be inserted.
[end of quote]
Notice that it says not only that "us-ascii" is the default if there is no charset
parameter in the HTTP Content-Type header; it ALSO says that "an encoding
declaration" (that is, the encoding="..." in <?xml ... ?>) ", if present, is irrelevant".
(Surprise :) )
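The rules quoted above can be read as a small decision procedure for the receiver. The Python sketch below is illustrative only: the function name effective_xml_charset is hypothetical, the encoding-declaration match is a simplified regular expression, and the UTF-32/EBCDIC autodetection described in the XML 1.0 specification is deliberately left out.

```python
import re

# Simplified match for the encoding pseudo-attribute of a leading
# XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_xml_charset(media_type, charset_param, body):
    """Return the charset a receiver should use, following RFC 3023.

    media_type:    e.g. "text/xml" or "application/xhtml+xml"
    charset_param: the charset parameter from Content-Type, or None
    body:          the raw document bytes
    """
    # 1. An explicit charset parameter wins for every XML media type.
    if charset_param:
        return charset_param.lower()

    top_level = media_type.split("/", 1)[0].strip().lower()
    if top_level == "text":
        # 2. text/xml and other XML-based "text" types: default is
        #    us-ascii and the in-document encoding declaration is ignored.
        return "us-ascii"

    # 3. application/xml, application/xhtml+xml, ...: fall back to the
    #    XML 1.0 rules (BOM, then encoding declaration, then UTF-8).
    #    Simplified: UTF-32 and EBCDIC detection are not handled here.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if body.startswith(b"\xff\xfe"):
        return "utf-16-le"
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

With the same document body, the declared ISO-8859-1 is used when it is served as application/xhtml+xml without a charset parameter, but ignored (in favor of us-ascii) when it is served as text/xml.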
But the XML declaration may be added at the top
to specify the charset to use when parsing the XML document. In that case,
the XML declaration in the document takes precedence over the external HTTP
header, which itself takes precedence over the <meta http-equiv /> elements.
That is not what RFC 3023 says. Actually, RFC 3023 says such an XML
declaration has no effect when the document is received as text/xml (or another XML-based "text" media type), for example over HTTP.
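One way a receiver can honor this rule, sketched in Python under the assumption that the charset has already been extracted from the Content-Type header: the standard library's xml.parsers.expat.ParserCreate accepts an encoding argument that, per the Python documentation, overrides the implicit or explicit encoding of the document. The helper name parse_with_transport_charset is hypothetical.

```python
import xml.parsers.expat


def parse_with_transport_charset(body, charset):
    """Return the character data of an XML document.

    body:    raw document bytes
    charset: charset taken from the HTTP Content-Type header, or None

    When charset is given, expat decodes with it and ignores the encoding
    declaration inside the document (the text/xml behavior in RFC 3023).
    When charset is None, expat falls back to the document's own declaration.
    """
    parser = xml.parsers.expat.ParserCreate(encoding=charset)
    chunks = []
    # Character data may arrive in several pieces, so collect and join.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(body, True)
    return "".join(chunks)


# A UTF-8 encoded document that wrongly declares ISO-8859-1.
doc = '<?xml version="1.0" encoding="iso-8859-1"?><p>é</p>'.encode("utf-8")
print(parse_with_transport_charset(doc, "utf-8"))  # é
print(parse_with_transport_charset(doc, None))     # Ã©
```

The second call shows why the transport-level charset matters: trusting a wrong in-document declaration silently produces mojibake rather than an error.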
So if you want full XML compliance and support for legacy browsers, you need
to:
The first thing to do: add charset=UTF-8 to the HTTP Content-Type
header itself if you are using "text/xml". The other approach is to use a
non-"text" MIME Content-Type (such as "application/xml" or "application/xhtml+xml").
- use a leading <?xml ?> declaration with the explicit charset
pseudo-attribute.
Not a bad idea to do it anyway.
- declare the <!DOCTYPE > with your own schema, and make this extended
schema accessible at the referenced SYSTEM url, and give it a specific
PUBLIC doctype name.
- use a <meta http-equiv /> tag as early as possible in your <head> section,
before any possibly internationalized string such as the <title></title>
element (in fact it is recommended to put ALL <meta http-equiv /> elements
before the required <title></title> element, and only then the other
<meta name /> elements such as robots control tags, description and
keywords)
- avoid all line breaks within <meta http-equiv /> elements (needed for
some web servers that are tuned for performance and may lazily parse the HTML
document before generating HTTP headers), unless you can control the
generation of HTTP headers (with an external server control file like
httpd.conf or similar features, or if you generate headers yourself within
a server-side script)
No clue why you would need this.
- make sure you insert a space before the "/>" terminator of every
abbreviated (empty) element
- always specify the "iso-8859-1" document charset explicitly with the
above methods if that is the one you use, because the default charset differs
between HTML (which defaults to ISO-8859-1) and XHTML (which defaults to
UTF-8, per XML conformance, unless there's a leading BOM to specify UTF-16
or UTF-32)
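The checklist above can be sketched as a minimal WSGI application in Python. This is an illustration under stated assumptions, not a reference setup: the DTD PUBLIC identifier and SYSTEM URL are hypothetical placeholders standing in for "your own schema", and the media type application/xhtml+xml is an assumed choice (legacy browsers that do not accept it would need a different one, and text/xml should only be used together with an explicit charset parameter).

```python
# Minimal WSGI sketch of the checklist above.
# ASSUMPTIONS: the DOCTYPE PUBLIC identifier and SYSTEM URL below are
# hypothetical placeholders; MEDIA_TYPE is an assumed choice.

CHARSET = "utf-8"
MEDIA_TYPE = "application/xhtml+xml"

# XML declaration first, with an explicit encoding; then a DOCTYPE naming
# the (hypothetical) extended DTD; http-equiv meta on a single line before
# <title>; other <meta name> elements after it; a space before every "/>".
PAGE = f"""<?xml version="1.0" encoding="{CHARSET}"?>
<!DOCTYPE html PUBLIC "-//EXAMPLE//DTD XHTML 1.0 Extended//EN"
  "http://www.example.com/dtd/xhtml1-extended.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="{MEDIA_TYPE}; charset={CHARSET}" />
<title>Iñtërnâtiônàlizætiøn</title>
<meta name="robots" content="index, follow" />
</head>
<body><p>Hello</p></body>
</html>
"""


def app(environ, start_response):
    """WSGI application serving PAGE with the charset stated explicitly."""
    body = PAGE.encode(CHARSET)
    start_response("200 OK", [
        # Explicit charset parameter in the HTTP header itself.
        ("Content-Type", f"{MEDIA_TYPE}; charset={CHARSET}"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local experiments, the app can be served with wsgiref.simple_server.make_server from the standard library. Keeping CHARSET in one constant ensures the HTTP header, the XML declaration and the meta element can never disagree.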
==================================
Frank Yung-Fong Tang
System Architect, Iñtërnâtiônàl Dèvélôpmeñt, AOL Intèrâçtívë Sërviçes
AIM:yungfongta mailto:ytang0648@aol.com Tel:650-937-2913
Yahoo! Msg: frankyungfongtan
John 3:16 "For God so loved the world that he gave his one and only Son, that
whoever believes in him shall not perish but have eternal life.
Does your software display Thai language text correctly for Thailand users?
-> Basic Concept of Thai Language linked from Frank Tang's
Iñtërnâtiônàlizætiøn Secrets
Want to translate your English text to something Thailand users can
understand?
-> Try English-to-Thai machine translation at
http://c3po.links.nectec.or.th/parsit/