David Hopwood and Carl Brown graciously corrected me:
>> I don't agree that irregular UTF-8 sequences in general can only decode to
>> characters above 0xFFFF.
>
> That's why I specifically referred to irregular sequences as defined by
> Unicode 3.1 (i.e. UAX #27).

I stand corrected. That's what I get for not having a copy of UAX #27 handy.
Non-shortest sequences, of course, used to be considered irregular (not
invalid) in Unicode 3.0, before the Technical Committee wisely tightened up
the definition of UTF-8.
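
For anyone who wants to see the non-shortest case concretely, here is a minimal Python sketch. The helper names are my own invention for illustration and come from no standard or library; the only real API used is Python's built-in strict "utf-8" codec. It decodes a 2-byte pattern naively, as a lenient decoder might, and compares that with what a strict decoder does.

```python
def naive_decode_2byte(seq: bytes) -> int:
    """Decode the 2-byte pattern 110xxxxx 10xxxxxx with no shortest-form check.

    Hypothetical helper: it mimics a lenient decoder, not any real library.
    """
    b0, b1 = seq
    if b0 & 0xE0 != 0xC0 or b1 & 0xC0 != 0x80:
        raise ValueError("not a 2-byte UTF-8 bit pattern")
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def is_shortest_form_2byte(seq: bytes) -> bool:
    """A 2-byte sequence is shortest form only if it encodes U+0080 or above."""
    return naive_decode_2byte(seq) >= 0x80


def strict_decoder_accepts(seq: bytes) -> bool:
    """Report whether Python's built-in strict UTF-8 codec accepts the bytes."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


overlong_slash = b"\xC0\xAF"  # non-shortest form of U+002F SOLIDUS
e_acute = b"\xC3\xA9"         # shortest form of U+00E9

print(hex(naive_decode_2byte(overlong_slash)))   # lenient view: a plain "/"
print(is_shortest_form_2byte(overlong_slash))    # not shortest form
print(strict_decoder_accepts(overlong_slash))    # strict decoder rejects it
print(strict_decoder_accepts(e_acute))           # ordinary sequence is accepted
```

The lenient path quietly turns C0 AF into a slash, which is presumably the sort of thing the tightened definition is meant to rule out; the strict codec refuses the bytes outright.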

-Doug Ewell
 Fullerton, California