From: James Lin <James_Lin_at_symantec.com>
> Hi
> Does anyone know why ill-form occurred on the UTF-8? besides it
> doesn't follow the pattern of UTF-8 byte-sequences, i just
> wondering how or why?
There’s a lot about the conditions for the well‐formedness of UTF-8
sequences in Chapter 3 of the Standard:
http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf
Basically, a header byte starting with 𝑛 1-bits (2 ≤ 𝑛 ≤ 4) and a 0-bit
must be followed by 𝑛−1 trailer bytes starting 10…, and that’s the only
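place such trailer bytes should occur. Even if these conditions hold,
however, a UTF-8 sequence might still be ill‐formed: overlong encodings,
encoded surrogates (U+D800..U+DFFF) and values above U+10FFFF are all
excluded. Table 3-7 exhaustively lists the well‐formed byte ranges, so
anything that falls outside it is ill‐formed.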
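
For illustration, here is a rough Python sketch of how the byte ranges in
Table 3-7 could be turned into a checker. The names TABLE_3_7, TAIL and
is_well_formed_utf8 are made up for this example and are not taken from
the Standard; treat it as a sketch under those assumptions rather than a
reference implementation.

```python
# Byte ranges transcribed from Table 3-7 (Well-Formed UTF-8 Byte Sequences).
# Each row: (range of the first byte, ranges of the bytes that must follow).
# All names here are hypothetical, chosen only for this example.
TAIL = (0x80, 0xBF)  # ordinary trailer byte, 10xxxxxx

TABLE_3_7 = [
    ((0x00, 0x7F), []),                           # U+0000..U+007F
    ((0xC2, 0xDF), [TAIL]),                       # U+0080..U+07FF
    ((0xE0, 0xE0), [(0xA0, 0xBF), TAIL]),         # U+0800..U+0FFF
    ((0xE1, 0xEC), [TAIL, TAIL]),                 # U+1000..U+CFFF
    ((0xED, 0xED), [(0x80, 0x9F), TAIL]),         # U+D000..U+D7FF
    ((0xEE, 0xEF), [TAIL, TAIL]),                 # U+E000..U+FFFF
    ((0xF0, 0xF0), [(0x90, 0xBF), TAIL, TAIL]),   # U+10000..U+3FFFF
    ((0xF1, 0xF3), [TAIL, TAIL, TAIL]),           # U+40000..U+FFFFF
    ((0xF4, 0xF4), [(0x80, 0x8F), TAIL, TAIL]),   # U+100000..U+10FFFF
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every byte of data is covered by a row of the table."""
    i = 0
    while i < len(data):
        # Find the row whose first-byte range contains the current byte.
        for (first_lo, first_hi), rest in TABLE_3_7:
            if first_lo <= data[i] <= first_hi:
                break
        else:
            # C0, C1, F5..FF, or a trailer byte 80..BF with no header.
            return False

        following = data[i + 1 : i + 1 + len(rest)]
        if len(following) != len(rest):
            return False  # sequence is cut off before its last trailer

        for byte, (lo, hi) in zip(following, rest):
            if not lo <= byte <= hi:
                return False  # overlong form, surrogate, or > U+10FFFF

        i += 1 + len(rest)
    return True
```

With this sketch, b"\xC0\x80" (an overlong form), b"\xED\xA0\x80" (an
encoded surrogate) and b"\xF4\x90\x80\x80" (beyond U+10FFFF) are all
rejected even though each has the right number of trailer bytes.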
-- Ian ◎
Received on Tue Dec 11 2012 - 15:01:13 CST