Re: MS/Unix BOM FAQ again (small fix)

From: Mark Davis (mark@macchiato.com)
Date: Thu Apr 11 2002 - 10:21:12 EDT


It is a pretty good assumption, but if BOMs are used on smaller fields,
the chance that a field genuinely begins with one of these characters
goes up. And to be perfectly reliable, you can't assume it.

That is one reason the WORD JOINER (U+2060) was encoded: so that
eventually U+FEFF can be used solely as a BOM.

Mark
—————

Γνῶθι σαυτόν ("Know thyself"), Θαλῆς (Thales)
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Doug Ewell" <dewell@adelphia.net>
To: <unicode@unicode.org>
Cc: "Mark Davis" <mark@macchiato.com>; <jarkko.hietaniemi@nokia.com>;
<markus.scherer@jtcsv.com>
Sent: Wednesday, April 10, 2002 22:35
Subject: Re: MS/Unix BOM FAQ again (small fix)

> Mark Davis <mark@macchiato.com> wrote:
>
> > - when one of the BOM-allowing UTFs starts with a BOM, you know the
> > encoding*, and you strip off the BOM when you get the content.
> >
> > *assuming that no UTF-16 file has U+0000 as the first character.
>
> In the real world, this is a pretty good assumption -- almost as good,
> in fact, as the one I've been stating for years: that no Unicode file
> will have a zero-width no-break space (intended as such) as the first
> character.
>
> -Doug Ewell
> Fullerton, California
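A minimal sketch of the BOM sniffing described in the quoted message, written in Python and assuming only the standard `codecs` BOM constants; the function names `sniff_bom` and `decode_with_bom` are hypothetical. It also shows why the footnote about U+0000 is needed: a UTF-16LE BOM (FF FE) followed by U+0000 (00 00) is byte-for-byte identical to the UTF-32LE BOM (FF FE 00 00), so a sniffer that checks the longer signature first will misread such a file.

```python
import codecs

# Hypothetical helper table: BOM signatures with the longer ones first.
# The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE),
# so it must be tested before the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, content with the BOM stripped), or (None, data)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode using the sniffed encoding, falling back to an assumed default."""
    name, body = sniff_bom(data)
    return body.decode(name or default)


# A UTF-16LE file whose first character really is U+0000 ...
ambiguous = codecs.BOM_UTF16_LE + "\u0000".encode("utf-16-le")
# ... is misread as an empty UTF-32LE file.
print(sniff_bom(ambiguous))  # ('utf-32-le', b'')
```

The only fix for the ambiguity is out-of-band knowledge of the encoding, or the assumption stated in the footnote.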



This archive was generated by hypermail 2.1.2 : Thu Apr 11 2002 - 08:56:38 EDT