Do the CR & LF bytes in UTF-8 ONLY exist in this form?

From: alopecoid (alopecoid@gmail.com)
Date: Tue Aug 25 2009 - 13:33:18 CDT

Next message: John (Eljay) Love-Jensen: "Re: Do the CR & LF bytes in UTF-8 ONLY exist in this form?"

Previous message: Roozbeh Pournader: "Re: [indic] Re: Use of ZWJ to form Sinhala Conjuncts"
Next in thread: John (Eljay) Love-Jensen: "Re: Do the CR & LF bytes in UTF-8 ONLY exist in this form?"
Reply: John (Eljay) Love-Jensen: "Re: Do the CR & LF bytes in UTF-8 ONLY exist in this form?"
Reply: Murray Sargent: "RE: Do the CR & LF bytes in UTF-8 ONLY exist in this form?"
Reply: Mark Crispin: "Re: Do the CR & LF bytes in UTF-8 ONLY exist in this form?"

Hi,

I am having difficulty finding the answer to this question, so I
figured this might be the best place to ask.

I know that the ASCII characters are encoded the same way in UTF-8 as
they are in ASCII. I also know that, in general, a UTF-8 character can
be from 1 to 4 bytes long. My question is: can the byte values for the
ASCII characters appear by chance as the bytes in the 2nd to 4th
positions of other UTF-8 characters?

For example, let's say that I would like to read lines from a UTF-8
encoded text file, but I don't need to actually decode each line... I
just need to store the UTF-8 encoded lines somewhere. Is it safe to
assume that if I encounter a CR (carriage return, '\r') byte or an LF
(line feed, '\n') byte, this byte belongs to its own single-byte
character? Or can the 8 bits that make up a CR or LF byte just happen
to appear as bytes 2 through 4 of another multi-byte character?
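
To make the scenario concrete, here is a minimal sketch in Python of the
byte-level line splitting described above. It relies on a property of
well-formed UTF-8 as defined in the Unicode Standard: every byte of a
multi-byte sequence has its high bit set (lead bytes 0xC2..0xF4,
continuation bytes 0x80..0xBF), so the values 0x0D and 0x0A can only occur
as CR and LF themselves. The function names split_lines_raw and
multibyte_contains_ascii_byte are hypothetical, chosen for illustration,
and the sketch assumes the input is well-formed UTF-8.

```python
def split_lines_raw(data: bytes) -> list[bytes]:
    """Split UTF-8 encoded bytes into lines without decoding them.

    Assumes well-formed UTF-8, where bytes below 0x80 never appear
    inside a multi-byte sequence.
    """
    # Treat CRLF, lone CR, and lone LF as line terminators, at the byte level.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = normalized.split(b"\n")
    # A trailing terminator leaves one empty piece at the end; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def multibyte_contains_ascii_byte(ch: str) -> bool:
    """Return True if a multi-byte UTF-8 character contains a byte < 0x80."""
    encoded = ch.encode("utf-8")
    return len(encoded) > 1 and any(b < 0x80 for b in encoded)


if __name__ == "__main__":
    sample = "naïve\r\n日本語\n𝄞 clef\rend".encode("utf-8")
    for raw_line in split_lines_raw(sample):
        # Each stored line is still valid UTF-8 and decodes on its own.
        print(raw_line, raw_line.decode("utf-8"))
```

The second function is only a sanity check: run over every non-ASCII
code point (excluding surrogates, which cannot be encoded), it should
never return True.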

I hope my question is clear.

Thank you.


This archive was generated by hypermail 2.1.5 : Tue Aug 25 2009 - 13:39:25 CDT