From: John (Eljay) Love-Jensen (eljay@adobe.com)
Date: Tue Aug 25 2009 - 13:48:41 CDT
Hi alopecoid,
> can the byte values for the ASCII characters appear by chance as the bytes in
the 2nd to 4th positions of other UTF-8 characters?
No. Only bytes in the range 0x80 to 0xBF appear in the 2nd to 4th positions.
> Is it safe to assume that if I encounter a CR (carriage return, '\r') byte or
a LF (line feed, '\n') byte, that this byte belongs to its own single byte
character value?
Yes.
> Or can the 8-bits that make up a CR or LF byte just happen to exist in another
multi-byte character as bytes 2 through 4 of that character?
No.
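All "trailing" UTF-8 code units (the 2nd through 4th bytes of a multi-byte
character) have the bit pattern 10xxxxxx, so they always fall in the range
0x80 to 0xBF, safely avoiding '\n' (0x0A) and '\r' (0x0D). The lead byte of a
multi-byte character is 0xC2 or higher, so no byte of a multi-byte character
can be mistaken for an ASCII character.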
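If it helps to see this in code, here is a minimal sketch in Python (my choice of language for illustration; nothing in your question depends on it). The names is_continuation_byte and split_lines_utf8 are hypothetical helpers, not part of any library. The sketch assumes the input is well-formed UTF-8 and shows that scanning the raw bytes for 0x0A and 0x0D is enough to find line endings without decoding first.

```python
def is_continuation_byte(b):
    """Return True if b has the bit pattern 10xxxxxx (0x80..0xBF)."""
    return (b & 0xC0) == 0x80


def split_lines_utf8(data):
    """Split raw UTF-8 bytes on LF, CR, or CR LF without decoding.

    Assumes data is well-formed UTF-8. Because every byte of a
    multi-byte character is 0x80 or higher, a 0x0A or 0x0D byte can
    only be a real line feed or carriage return.
    """
    lines = []
    start = 0
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x0A or b == 0x0D:
            lines.append(data[start:i])
            # Treat CR followed by LF as a single line ending.
            if b == 0x0D and i + 1 < len(data) and data[i + 1] == 0x0A:
                i += 1
            start = i + 1
        i += 1
    # Keep any trailing text that has no final line ending.
    if start < len(data):
        lines.append(data[start:])
    return lines


if __name__ == "__main__":
    sample = "h\u00e9llo\r\nw\u00f6rld\n\u65e5\u672c\u8a9e\rend".encode("utf-8")
    for line in split_lines_utf8(sample):
        print(line.decode("utf-8"))
```

One case worth noting: U+010A has 0x0A in its code point value, but its UTF-8 encoding is the two bytes 0xC4 0x8A, so the byte scan above never sees a stray 0x0A.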
> I hope my question is clear.
Yes.
> Thank you.
You're welcome.
Sincerely,
--Eljay