Re: browsers and unicode surrogates

From: Tex Texin (texin@progress.com)
Date: Mon Apr 22 2002 - 03:23:02 EDT


Steffen,
Section 5.2.1 discusses the BOM. Also see my previous mail.
I'll talk to my ISP to see if it's possible to have the charset set in
the HTTP Content-Type header. Thanks for noting it.
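
For reference, here is a minimal sketch of how a client could read the
charset from a Content-Type value. It uses only Python's standard
library; the helper name charset_from_content_type is made up for this
illustration, and it is not what any particular browser does.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper for illustration; it relies on the standard
    library's MIME header parsing rather than on any browser's logic.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


# What the server sends today: no charset at all.
print(charset_from_content_type("text/html"))
# What it would send once the charset is configured.
print(charset_from_content_type("text/html; charset=UTF-16"))
```

With the bare "text/html" the helper finds no charset, which is the
situation Steffen describes below.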

It wouldn't surprise me if the W3C validator doesn't support UTF-16, but
I'll ask the author.
tex

Steffen Kamp wrote:
>
> >I have added a couple more variations of the Unicode supplementary
> >characters example page, for utf-16 and utf-32.
>
> I am not sure if your UTF-16 and UTF-32 test pages really conform to the
> HTML standard. The server states a content type of "text/html" without
> charset information. From the content type a browser should therefore
> expect pure ASCII - at least until the META tag defining the document's
> character encoding.
>
> From the HTML 4.01 specification
> <http://www.w3.org/TR/html4/charset.html>, section 5.2.2:
>
> "The META declaration must only be used when the character encoding is
> organized such that ASCII-valued bytes stand for ASCII characters (at
> least until the META element is parsed)."
>
> Your documents, however, just start with a BOM and I couldn't find
> anything stating that a BOM would be a valid way of specifying the
> character encoding.
> Although some browsers seem to guess the character encoding from an
> available BOM, I wouldn't expect them to do so when there usually are
> other ways of determining this information.
>
> To get a second opinion I asked w3.org's online validation service to
> check your UTF-16 document with auto detection of the character encoding.
> (<http://validator.w3.org/check?uri=http://www.i18nguy.com/unicode/plane1-utf-16.html&charset=(detect+automatically)&doctype=Inline>)
> The validator complained about the BOM as well as (not surprisingly) a
> lot of NUL (0x00) bytes.
> However, when given an ASCII-only document with a META tag specifying
> UTF-16 as the encoding (just for testing), it says that it does not yet
> support this encoding, so I don't fully trust the validator in this case.
>
> Steffen
>
> --
> Steffen Kamp
> mailto:steffen@ic.ac.uk
> http://homepage.mac.com/earthlingsoft
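
To make the two observations above concrete, here is a rough sketch of
BOM sniffing and of why a byte-oriented parser sees NUL bytes in a
UTF-16 page. The function name sniff_bom and the sample markup are
assumptions made for this illustration; no browser or validator is
claimed to work this way.

```python
import codecs

# Longer signatures come first: the UTF-32LE BOM (FF FE 00 00) begins
# with the UTF-16LE BOM (FF FE), so checking UTF-16 first would mislabel
# UTF-32LE data. A UTF-16LE text that starts with U+0000 is still
# ambiguous; this sketch ignores that corner case.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Guess an encoding from a leading byte order mark, or return None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Hypothetical META declaration, encoded the way a UTF-16 test page would be.
markup = '<meta http-equiv="Content-Type" content="text/html; charset=utf-16">'
page = codecs.BOM_UTF16_BE + markup.encode("utf-16-be")

print(sniff_bom(page))        # encoding implied by the BOM
print(page.count(b"\x00"))    # one NUL byte per ASCII character
print(b"<meta" in page)       # the ASCII byte run never appears
```

Every ASCII character becomes a two-byte unit with a 0x00 half, so a
parser scanning for the ASCII bytes of "<meta" cannot find the
declaration, which matches the restriction quoted from section 5.2.2.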

-- 
-------------------------------------------------------------
Tex Texin                    Director, International Business
mailto:Texin@Progress.com    the Progress Company
Tel: +1-781-280-4271         http://www.progress.com
-------------------------------------------------------------
"The world writes in my database!" Progress Exchange 2002
http://www.progress.com/exchange/labs.htm#globalization
Globalization Empowerment for Progress users
http://www.progress.com/consulting/globalization_empowerment_solutions.htm
A compelling demonstration for Unicode:
http://www.i18nguy.com/unicode/unicode-example-intro.html
