Do the CR & LF bytes in UTF-8 ONLY exist in this form?

From: alopecoid (alopecoid@gmail.com)
Date: Tue Aug 25 2009 - 13:33:18 CDT


    Hi,

    I am having difficulty finding the answer to this question, so I
    figured this might be the best place to ask.

    I know that ASCII characters have the same single-byte encoding in
    UTF-8 as they do in ASCII. I also know that, in general, a UTF-8
    encoded character can take anywhere from 1 to 4 bytes. My question is:
    can the byte values of ASCII characters appear by chance as the 2nd,
    3rd, or 4th byte of some other UTF-8 character?
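The property being asked about can be checked by brute force. In well-formed UTF-8, the lead byte of a multi-byte sequence lies in 0xC2-0xF4 and every continuation byte lies in 0x80-0xBF, so every byte of a multi-byte character has its high bit set. The minimal Python sketch below treats CPython's built-in UTF-8 codec as the reference encoder; the function name is hypothetical.

```python
def multibyte_sequences_avoid_ascii() -> bool:
    """Return True if no byte of any multi-byte UTF-8 sequence is < 0x80.

    Hypothetical helper: encodes every Unicode scalar value above U+007F
    with Python's built-in UTF-8 codec and inspects the resulting bytes.
    """
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded as UTF-8
        encoded = chr(cp).encode("utf-8")
        # Lead byte is 0xC2-0xF4, continuation bytes are 0x80-0xBF,
        # so the smallest byte in the sequence must be at least 0x80.
        if min(encoded) < 0x80:
            return False
    return True


if __name__ == "__main__":
    print(multibyte_sequences_avoid_ascii())
```

Because 0x0A (LF) and 0x0D (CR) are both below 0x80, a check that passes here also means neither byte can occur inside a multi-byte character.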

    For example, let's say that I would like to read lines from a UTF-8
    encoded text file, but I don't need to actually decode each line... I
    just need to store the UTF-8 encoded lines somewhere. Is it safe to
    assume that if I encounter a CR (carriage return, '\r') byte or an LF
    (line feed, '\n') byte, that byte is a single-byte character in its
    own right? Or can the 8 bits that make up a CR or LF byte happen to
    appear in another multi-byte character as one of bytes 2 through 4
    of that character?
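The line-splitting scenario described above can be sketched in Python without decoding anything. This assumes the input is well-formed UTF-8 and that lines end in either LF or CRLF; the helper name `split_utf8_lines` is hypothetical.

```python
from typing import List


def split_utf8_lines(data: bytes) -> List[bytes]:
    """Split raw UTF-8 bytes into lines without decoding them.

    Hypothetical helper. Splits on LF (0x0A) and drops a trailing CR (0x0D)
    so CRLF endings are handled too. This is only safe because neither byte
    can appear inside a multi-byte UTF-8 sequence.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # input ended with a newline: no empty final line
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]


if __name__ == "__main__":
    sample = "naïve\r\n日本語\n😀 end".encode("utf-8")
    for line in split_utf8_lines(sample):
        # Each stored line is still valid UTF-8 and decodes cleanly.
        print(line.decode("utf-8"))
```

Each returned line can be stored as-is; decoding it later yields the original text, because the split never lands in the middle of a multi-byte character.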

    I hope my question is clear.

    Thank you.



    This archive was generated by hypermail 2.1.5 : Tue Aug 25 2009 - 13:39:25 CDT