Re: MS/Unix BOM FAQ again (small fix)

From: Mark Davis ([email protected])
Date: Tue Apr 09 2002 - 23:36:05 EDT

Previous message: Doug Ewell: "Re: MS/Unix BOM FAQ again (small fix)"
In reply to: Kenneth Whistler: "Re: MS/Unix BOM FAQ again (small fix)"
Next in thread: Doug Ewell: "Re: MS/Unix BOM FAQ again (small fix)"

Sorry, I meant to write "since UTF-32LE, for example, could start with bytes FF FE."

It would be the start of a character like U+1FEFF, which in UTF-32LE is serialized as

FF FE 01 00
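To make the byte sequence concrete, here is a short Python 3.8+ check. The language is my choice, since nothing in the thread specifies one. It uses only the standard `utf-32-le` and `utf-16-le` codecs. It shows that the same four bytes are one character in UTF-32LE but a BOM followed by U+0001 in UTF-16LE, so a leading FF FE alone does not settle the encoding form.

```python
# Sketch: the leading bytes FF FE are shared by a UTF-16LE BOM
# and by UTF-32LE text whose first character is U+1FEFF.

seq = chr(0x1FEFF).encode("utf-32-le")   # UTF-32LE, no BOM written
print(seq.hex(" ").upper())             # FF FE 01 00

bom16 = "\ufeff".encode("utf-16-le")     # a UTF-16LE byte order mark
print(bom16.hex(" ").upper())           # FF FE

print(seq[:2] == bom16)                 # True: same two leading bytes

# The same four bytes therefore have two well-formed readings
# (the explicit-endian codecs do not strip a leading U+FEFF):
print(seq.decode("utf-32-le") == "\U0001FEFF")  # True: one character
print(seq.decode("utf-16-le") == "\ufeff\x01")  # True: BOM, then U+0001
```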

Mark

—————

Γνῶθι σαυτόν (Know thyself) - Θαλῆς (Thales)
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Kenneth Whistler" <[email protected]>
To: <[email protected]>
Cc: <[email protected]>; <[email protected]>
Sent: Tuesday, April 09, 2002 19:23
Subject: Re: MS/Unix BOM FAQ again (small fix)

> > I agree, there are different ways to look at it. But the statement
> >
> > > > > A Unicode text file beginning with FEFF is
> > > > > big-endian, and a file beginning with FFFE (not a legal Unicode
> > > > > character for any other purpose) is little-endian
> >
> > is just plain wrong, since UTF-32, for example, could start with bytes
> > FE FF.
>
> Um, not legally in open interchange.
>
> Either you have big-endian UTF-32 <FE FF nn mm ..> which would correspond
> to U-FEFFnnmm ... -- and that is out-of-range for both Unicode and 10646.
>
> Or you have little-endian UTF-32 <FE FF nn 00 ..> which would correspond
> to U-00nnFFFE ..., where nn could be 00..10, but all such values are
> noncharacters, and cannot be used in open interchange.
>
> So if serialized "Unicode text" starts off <FE FF ...> and purports to be legal,
> it cannot be UTF-32, it cannot be UTF-8, and it cannot be little-endian.
>
> --Ken
>
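Ken's case analysis can be checked mechanically. The Python 3 sketch below is illustrative only. `classify` and `readings` are hypothetical helper names. `classify` is a simplified legality test covering range, surrogates and noncharacters, and it ignores whether a code point is assigned. The sketch reads the first code unit of a byte string under each UTF-16/UTF-32 byte order and tries UTF-8. It then exhaustively confirms that no choice of the two bytes following FE FF yields a legal UTF-32 character in either byte order.

```python
def classify(cp: int) -> str:
    """Simplified legality check for a code point in open interchange."""
    if cp > 0x10FFFF:
        return "out of range"
    if 0xD800 <= cp <= 0xDFFF:
        # In UTF-16 this would only be the first half of a pair; not at issue here.
        return "surrogate"
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        # U+FDD0..U+FDEF and the last two code points of every plane (U+nFFFE, U+nFFFF)
        return "noncharacter"
    return "ok"


def readings(data: bytes) -> dict:
    """Classify the first code unit of `data` under each candidate encoding form."""
    result = {}
    for name, width, order in (
        ("UTF-16BE", 2, "big"),
        ("UTF-16LE", 2, "little"),
        ("UTF-32BE", 4, "big"),
        ("UTF-32LE", 4, "little"),
    ):
        result[name] = classify(int.from_bytes(data[:width], order))
    try:
        data.decode("utf-8")
        result["UTF-8"] = "ok"
    except UnicodeDecodeError:
        result["UTF-8"] = "invalid"  # bytes FE and FF never occur in UTF-8
    return result


print(readings(bytes([0xFE, 0xFF, 0x01, 0x00])))

# Exhaustive check over every possible third and fourth byte after FE FF
no_legal_utf32 = all(
    classify(int.from_bytes(bytes([0xFE, 0xFF, a, b]), order)) != "ok"
    for a in range(256)
    for b in range(256)
    for order in ("big", "little")
)
print(no_legal_utf32)  # True
```

For FE FF 01 00, only the UTF-16BE reading survives (a BOM followed by U+0100). This matches Ken's conclusion that such text can only be big-endian UTF-16.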

This archive was generated by hypermail 2.1.2 : Wed Apr 10 2002 - 00:49:58 EDT