Re: Problem with SSI and BOM

From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Sep 25 2006 - 08:12:37 CST

Next message: Cristian Secară: "what is the Unicode correspondent of character HORIZONTAL BAR from ISO/IEC 6397 ?"

Previous message: Jukka K. Korpela: "Re: Question about formatting numerals"
In reply to: Jukka K. Korpela: "Re: Problem with SSI and BOM"
Next in thread: Philippe Verdy: "Re: Problem with SSI and BOM"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

On 9/24/06, Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
>
> On Sun, 24 Sep 2006, Doug Ewell wrote:
>
> > A process that claims to be able to "support Unicode"
> > should at least be able to follow the simple rule, "If the file or
> > stream starts with EF BB BF, throw them away and treat the remainder
> > of the file or stream as UTF-8."
>
> No, that would be incorrect if the character encoding of the data has been
> declared. It would be a mistake to start interpreting the octets of data
> in a manner other than the declared encoding, at least as long as the data
> is formally correct according to the encoding.

In theory, that's correct. In practice, however, the charset is set
incorrectly far too often. In a browser, the user can reset the charset
manually if he or she sees that it is wrong. That option is not available to
more mechanical processes like search engines -- there, the process simply
can't afford to always believe the charset parameter(s), any more than it
can always depend on the HTML being valid.
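
As a rough illustration of what such a mechanical process might do, here is a
minimal Python sketch. The function name decode_untrusted and the fallback
order (UTF-8 signature first, then the declared charset, then strict UTF-8,
then ISO-8859-1 as a last resort) are assumptions made for this example, not
a description of any particular search engine or of a standardized algorithm.
It deliberately treats a leading EF BB BF as stronger evidence than the
declared charset, which is exactly the trade-off under discussion.

```python
import codecs
from typing import Optional, Tuple


def decode_untrusted(data: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Decode bytes whose charset label may be wrong.

    Returns (text, encoding_used). The order of the checks below is an
    assumption of this sketch, not a standardized procedure.
    """
    # 1. A leading EF BB BF is taken as a UTF-8 signature, but only if
    #    the remainder really is well-formed UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        try:
            return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
        except UnicodeDecodeError:
            pass  # not UTF-8 after all; fall through to the other checks

    # 2. Otherwise believe the declared charset, provided the name is
    #    known and the data is formally correct in that encoding.
    if declared:
        try:
            return data.decode(declared), declared
        except (UnicodeDecodeError, LookupError):
            pass  # unknown label, or data that is invalid for it

    # 3. Strict UTF-8 rarely accepts text in other encodings by accident.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # 4. Last resort: ISO-8859-1 maps every byte to a character, so this
    #    never fails, although the result may be wrong.
    return data.decode("iso-8859-1"), "iso-8859-1"
```

For example, decode_untrusted(b"\xef\xbb\xbfhello", "iso-8859-1") yields
("hello", "utf-8") rather than three stray ISO-8859-1 characters in front of
the text, while data without the signature is still decoded as declared.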

Mark

This archive was generated by hypermail 2.1.5 : Mon Sep 25 2006 - 08:19:34 CST