Re: UTF-8N?

From: Peter_Constable@sil.org
Date: Wed Jun 21 2000 - 15:26:37 EDT


Eh??? John, either I'm really missing your intent, or you're saying
something that I know you don't mean. U+0020 in UTF-8 is always 0x20,
whether or not the file begins with a BOM. While I haven't met you in
person, I've learned enough about you by email that I'm pretty sure you
know this already. So, I must be missing what you're meaning to
communicate.

Peter Constable

On 06/21/2000 02:56:31 PM <jcowan@reutershealth.com> wrote:

>Peter_Constable@sil.org wrote:
>
>> UTF-8 files both with and without a BOM serialize the character
>> representations into bytes (octets) in exactly the same way. That's the
>> basis for distinguishing between encoding schemes, and since there isn't a
>> difference, there is only one encoding scheme involved in both cases.
>
>I don't think so. One encoding scheme encodes U+0020 (a single space
>character) as one byte (0x20), whereas the other one encodes it as four
>bytes (0xEF 0xBB 0xBF 0x20).
>
>--
>
>Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
>Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
>Denn er genoss vom Honig-Tau, || http://www.ccil.org/~cowan
>Und trank die Milch vom Paradies. -- Coleridge (tr. Politzer)
>
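
The byte sequences in this exchange can be checked with a short Python sketch. It is only an illustration and not part of either message: the names BOM, space, with_bom, and strip_bom are hypothetical, and it assumes the BOM is modeled as a leading U+FEFF character in the text.

```python
# Illustrative sketch only; BOM, space, with_bom and strip_bom are
# hypothetical names, not taken from the thread.

BOM = b"\xef\xbb\xbf"  # UTF-8 serialization of U+FEFF

# U+0020 on its own serializes to the single byte 0x20.
space = "\u0020".encode("utf-8")

# U+FEFF followed by U+0020 serializes to four bytes:
# three for the BOM, then the same 0x20 for the space.
with_bom = "\ufeff\u0020".encode("utf-8")


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data


print(space.hex(" "))                # bytes for U+0020 alone
print(with_bom.hex(" "))             # bytes for U+FEFF U+0020
print(strip_bom(with_bom) == space)  # same bytes once the BOM is removed
```

In both cases the space itself is the byte 0x20; the three extra bytes are the serialization of U+FEFF. Python's "utf-8-sig" codec decodes the four-byte form to a single space, dropping the leading BOM.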


