From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Feb 26 2003 - 19:56:43 EST
Yung-Fong Tang wrote:
> I see a hole here. What about UTF-8 representing a surrogate pair
> as two 3-octet sequences instead of one 4-octet UTF-8
> sequence? That should also be ill-formed, since it is not the shortest form,
> right? But we really need to watch the language used there so we
> don't create a new problem. I DO NOT want people to think that a single 3-octet
> UTF-8 sequence for a high or low surrogate is ill-formed, but that a 3-octet
> UTF-8 high surrogate followed by a 3-octet UTF-8 low surrogate is legal.
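A minimal Python sketch (not from either message) of the two byte sequences being compared, for U+10000. The helper names `utf8_pattern`, `surrogate_pair` and `cesu8_style_pair` are hypothetical. `utf8_pattern` applies the UTF-8 bit layout without rejecting surrogates, which is precisely how the 3-octet surrogate sequences arise.

```python
def utf8_pattern(cp: int) -> bytes:
    """Apply the UTF-8 bit layout to a code point.

    Deliberately does NOT reject surrogates (U+D800..U+DFFF); well-formed
    UTF-8 forbids them, and encoding them anyway is what yields the
    3-octet surrogate sequences under discussion.
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def surrogate_pair(cp: int):
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def cesu8_style_pair(cp: int) -> bytes:
    """Encode each surrogate separately as 3 octets (6 octets total)."""
    high, low = surrogate_pair(cp)
    return utf8_pattern(high) + utf8_pattern(low)


cp = 0x10000
print(utf8_pattern(cp).hex(" "))      # f0 90 80 80  (one 4-octet sequence)
print(cesu8_style_pair(cp).hex(" "))  # ed a0 80 ed b0 80  (two 3-octet sequences)
```

The 6-octet form is longer than the 4-octet shortest form for the same code point, which is the "non-shortest form" observation in the quoted text.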
How would you infer that a pair of ill-formed sequences is somehow not ill-formed, without any
specific text allowing it?
Remember also that such pairs of 3-byte surrogate sequences were forbidden at the same time CESU-8
was created.
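As an illustration (an assumption about one modern implementation, not something either message refers to), Python 3's built-in strict `utf-8` codec behaves the way this reply argues: a lone 3-octet surrogate is rejected, and so is a high/low pair of them. Only the single 4-octet sequence decodes. `is_well_formed_utf8` is a hypothetical helper name.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")  # default errors="strict"
        return True
    except UnicodeDecodeError:
        return False


four_octet = b"\xf0\x90\x80\x80"        # U+10000, shortest form
lone_high = b"\xed\xa0\x80"             # U+D800 alone, 3 octets
paired = b"\xed\xa0\x80\xed\xb0\x80"    # U+D800 then U+DC00, 3 + 3 octets

print(is_well_formed_utf8(four_octet))  # True
print(is_well_formed_utf8(lone_high))   # False
print(is_well_formed_utf8(paired))      # False
```

Pairing the two ill-formed sequences does not make the result acceptable. Data in the 6-octet form has to be treated as CESU-8, not as UTF-8.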
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.