RE: UTF-8 ill-formed question from Doug Ewell on 2012-12-11 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Tue, 11 Dec 2012 14:15:43 -0700

Ian Clifton <ian dot clifton at chem dot ox dot ac dot uk> wrote:

>> Does anyone know why ill-form occurred on the UTF-8? besides it
>> doesn't follow the pattern of UTF-8 byte-sequences, i just
>> wondering how or why?
>
> There’s a lot about the conditions for the well-formedness of UTF-8
> sequences in Chapter 3 of the Standard:
>
> [...]
>
> Even if these conditions hold, however, a UTF-8 sequence might still
> be ill-formed; Table 3-7 exhaustively lists all the cases.

But the bottom line is, there's nothing ill-formed about James' original
example. It's perfectly good UTF-8. The visual similarity between the
digits in U+4E8C and the first and last bytes in <E4 BA 8C> is mostly
coincidental.
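
For anyone who wants to check this by hand, here is a minimal sketch in
Python that builds the three-byte form of U+4E8C bit by bit and compares
it with the built-in encoder. The helper name utf8_three_byte is made up
for this illustration, and it only handles the three-byte range.

```python
def utf8_three_byte(cp: int) -> bytes:
    """Hand-encode a code point in U+0800..U+FFFF (surrogates excluded).

    Hypothetical helper for illustration only; real code should just
    call str.encode("utf-8").
    """
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not encodable as a three-byte UTF-8 sequence")
    b1 = 0xE0 | (cp >> 12)          # 1110xxxx: top 4 bits of the code point
    b2 = 0x80 | ((cp >> 6) & 0x3F)  # 10xxxxxx: middle 6 bits
    b3 = 0x80 | (cp & 0x3F)         # 10xxxxxx: low 6 bits
    return bytes([b1, b2, b3])


encoded = utf8_three_byte(0x4E8C)
print(" ".join(f"{b:02X}" for b in encoded))  # E4 BA 8C

# The built-in encoder agrees, and a strict decode accepts the bytes,
# which is what "well-formed" means in practice here.
assert encoded == chr(0x4E8C).encode("utf-8")
assert encoded.decode("utf-8", errors="strict") == "\u4e8c"
```

This also shows why the resemblance is only "mostly" coincidental. The
4 in E4 is the top four bits of the code point, which is true of every
three-byte sequence. The trailing 8C matches only because the low byte
of this particular code point happens to begin with the bits 10, the
same two bits that mark a continuation byte.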

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

Received on Tue Dec 11 2012 - 15:18:43 CST
