From: SADAHIRO Tomoyuki (bqw10602@nifty.com)
Date: Thu May 11 2006 - 09:44:41 CDT
Mark Davis wrote:
> The result of forbidding certain characters would mean that it would be
> only supporting a subset of Unicode characters. This is already done by
> some protocols; for example, IDN explicitly forbids certain characters
> on input. That should be clarified in the text. Such an implementation
> cannot claim to be a conformant implementation of normalization for all
> Unicode characters.
>
> You're right to bring this up; it would need to be clarified in the text.
>
> Mark
I see. Thank you.
In my opinion, anyone who designs a subset normalization should make
sure that it can easily be implemented on top of a "full" (i.e.
UAX#15-conforming) normalizer, as long as the subset normalization is
intended to be compatible with the normalization of UAX#15.
For that purpose, a subset normalization should coincide with the full
normalization within the subset and reject any input outside the subset,
so that no output is inconsistent with the full normalization.
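To illustrate, here is a minimal sketch in Python of a subset normalization
built on top of a full normalizer. It is only my illustration, not anything
specified in UAX#15: the names subset_normalize, OutsideSubsetError and the
"allowed" set are hypothetical, and the standard unicodedata module is
assumed to play the role of the full normalizer.

```python
import unicodedata


class OutsideSubsetError(ValueError):
    """Raised when a character falls outside the supported subset (hypothetical name)."""


def subset_normalize(text, allowed, form="NFC"):
    """Normalize text with the full normalizer, but only within `allowed`.

    `allowed` is a hypothetical set of supported characters. Any input
    character outside it is rejected rather than passed through.
    """
    for ch in text:
        if ch not in allowed:
            raise OutsideSubsetError("input U+%04X is outside the subset" % ord(ch))

    # Within the subset, the result is exactly what the full normalizer gives.
    result = unicodedata.normalize(form, text)

    # If the subset is not closed under normalization, the full result may
    # leave the subset. Reject instead of emitting an inconsistent output.
    for ch in result:
        if ch not in allowed:
            raise OutsideSubsetError("output U+%04X is outside the subset" % ord(ch))
    return result


# Hypothetical subset: a few ASCII letters, COMBINING ACUTE ACCENT (U+0301),
# and LATIN SMALL LETTER E WITH ACUTE (U+00E9).
allowed = set("abcde") | {"\u0301", "\u00e9"}
composed = subset_normalize("e\u0301", allowed)
```

With this design, every string the subset normalizer accepts yields the same
result as the full normalizer, and everything else is an error rather than a
silently different output.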
P.S. The normalization for legacy encodings (Annex 6 in UAX#15) can be
regarded as a kind of subset normalization, since the repertoire of a
legacy encoding is mapped onto a subset of Unicode.
Regards,
SADAHIRO Tomoyuki