From: Francois Yergeau (FYergeau@alis.com)
Date: Wed Oct 29 2003 - 09:59:46 CST
Philippe Verdy wrote:
> The idea that "if a text (without BOM) looks like valid UTF-8, then it is
> UTF-8; else it uses another legacy encoding" does not work in practice
> and also leads to too many false positives.
Can you point to actual data or cases? I don't mean theoretical ones; I can
make up my own.
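For concreteness, the rule being debated can be sketched in a few lines of Python. This is only an illustration: the name `guess_encoding` and the ISO-8859-1 fallback are assumptions made for the example, not anything specified in this thread.

```python
def guess_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Apply the heuristic: BOM-less bytes that decode as UTF-8 are UTF-8.

    `fallback` is a hypothetical stand-in for "another legacy encoding".
    """
    try:
        # Python's decoder is strict: it rejects overlong forms,
        # surrogates, and sequences longer than four bytes.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


if __name__ == "__main__":
    samples = {
        "real UTF-8": "François".encode("utf-8"),
        "real Latin-1": "François".encode("iso-8859-1"),
        # Made-up false positive: Latin-1 "Ã©" is the byte pair C3 A9,
        # which is also the valid UTF-8 encoding of "é".
        "contrived Latin-1": "Ã©".encode("iso-8859-1"),
    }
    for label, raw in samples.items():
        print(label, "->", guess_encoding(raw))
```

The third sample is exactly the kind of made-up case mentioned above: it shows that a false positive is possible, but says nothing about how often one turns up in real data.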
> Some problems do exist however, with the relaxed rules for UTF-8 as it
> was defined in the IESG RFC.
Errr, relaxed? Care to elaborate? Are you referring to RFC 2279?
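One possible reading, and it is only a guess since the quoted message does not say: RFC 2279 defined sequences of up to six bytes, while a strict decoder limited to four bytes rejects the longer forms. A small Python sketch of that difference, with `is_strict_utf8` as a hypothetical helper name:

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if Python's built-in (four-byte-max) decoder accepts data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The value 0x200000 in the five-byte form 111110xx 10xxxxxx (x4)
# that RFC 2279 defines: F8 88 80 80 80.
five_byte_form = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])

# U+10000 in the ordinary four-byte form: F0 90 80 80.
four_byte_form = "\U00010000".encode("utf-8")

if __name__ == "__main__":
    print("five-byte form accepted:", is_strict_utf8(five_byte_form))
    print("four-byte form accepted:", is_strict_utf8(four_byte_form))
```

Under this reading, a text containing such five- or six-byte sequences would fit the RFC 2279 byte patterns but fail a strict check. Whether any such texts exist in practice is the open question.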
> These old texts (that are valid for this old
> version of the UTF-8 encoding) still exist now
What's particular about these old texts?
-- François