Re: Unicode Search Engines

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Feb 21 2002 - 02:04:52 EST


Michael Everson <everson@evertype.com> wrote:

>> No. The W3C CharMod wants receivers to check normalization and
>> reject unnormalized documents, *not* to normalize input.
>
> What does such rejection imply? That an HTML document using UTF-8
> declaring U+0041 U+0301 is acceptable but an HTML document using
> UTF-8 declaring U+00C1 is not?

Other way around. The normalization is to NFC, so the HTML document must
declare U+00C1, not U+0041 U+0301. It's OK to use U+0051 U+0301 because
Q with acute has no precomposed form in Unicode.
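
For anyone who wants to see this for themselves, here is a minimal sketch
using Python's standard unicodedata module. The helper name is_nfc is my own
invention for illustration, not anything from CharMod. It only checks whether
a string is already in NFC, which is the "check, don't normalize" behavior
described above; what a receiver does after a failed check is left out.

```python
import unicodedata

def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C.

    Hypothetical helper: it compares the string with its NFC form
    and reports whether they match. It never modifies the input.
    """
    return unicodedata.normalize("NFC", text) == text

a_decomposed = "\u0041\u0301"   # A + COMBINING ACUTE ACCENT
a_precomposed = "\u00C1"        # LATIN CAPITAL LETTER A WITH ACUTE
q_acute = "\u0051\u0301"        # Q + COMBINING ACUTE ACCENT

print(is_nfc(a_decomposed))     # False: NFC composes this to U+00C1
print(is_nfc(a_precomposed))    # True
print(is_nfc(q_acute))          # True: nothing to compose to
```

The sequence with "Q" passes the check unchanged because NFC has no single
code point to compose it into.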

Picking on the letter "Q",

-Doug Ewell
 Fullerton, California


