Re: Names for UTF-8 with and without BOM

From: Mark Davis (mark.davis@jtcsv.com)
Date: Sun Nov 03 2002 - 15:29:32 EST

Next message: Mark Davis: "Re: Header Reply-To"

Previous message: John Cowan: "Re: Names for UTF-8 with and without BOM"
In reply to: Michael (michka) Kaplan: "Re: Names for UTF-8 with and without BOM"
Next in thread: Doug Ewell: "Re: Names for UTF-8 with and without BOM"

I don't know what you are trying to say. Perhaps you could explain it at the
meeting next week.

Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Michael (michka) Kaplan" <michka@trigeminal.com>
To: "Mark Davis" <mark.davis@jtcsv.com>; "Murray Sargent"
<murrays@exchange.microsoft.com>; "Joseph Boyle" <Boyle@siebel.com>
Cc: <unicode@unicode.org>
Sent: Saturday, November 02, 2002 04:18
Subject: Re: Names for UTF-8 with and without BOM

> From: "Mark Davis" <mark.davis@jtcsv.com>
>
> > That is not sufficient. The first three bytes could represent a real
> > content character, ZWNBSP or they could be a BOM. The label doesn't
> > tell you.
>
> There are several problems with this supposition -- most notably the fact
> that there are cases that specifically claim this is not recommended and
> that U+2060 is preferred.
>
> > This is similar to UTF-16 CES vs UTF-16BE CES. In the first case, 0xFE
> > 0xFF represents a BOM, and is not part of the content. In the second
> > case, it does *not* represent a BOM -- it represents a ZWNBSP, and must
> > not be stripped. The difference here is that the encoding name tells
> > you exactly what the situation is.
>
> I do not see this as a realistic scenario. I would argue that if the BOM
> matches the encoding scheme, perhaps this was an intentional effort to
> make sure that applications which may not understand the higher level
> protocol can also see what the encoding scheme is.
>
> But even if we assume that someone has gone to the trouble of calling
> something UTF-16BE and has 0xFE 0xFF at the beginning of the file, what
> kind of content *is* such a code point that this is even worth calling
> out as a special case?
>
> If the goal is clear and unambiguous text, then the best way would be to
> simplify ALL of this. It was previously decided to always call it a BOM,
> so why not stick with that?
>
> MichKa
>
>
>
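
The UTF-16 versus UTF-16BE distinction quoted above, and the UTF-8 ambiguity
before it, can be illustrated with a small sketch. It assumes Python's
standard codecs ("utf-16", "utf-16-be", "utf-8", "utf-8-sig"), which are not
mentioned anywhere in this thread; they are used here only because their
behavior happens to mirror the two readings of a leading U+FEFF.

```python
import codecs

# The bytes FE FF 00 41: U+FEFF followed by "A", serialized big-endian.
data = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

# Label "utf-16": the leading FE FF is treated as a byte order mark,
# used to pick the byte order, and dropped from the decoded text.
as_utf16 = data.decode("utf-16")

# Label "utf-16-be": the byte order is fixed by the label, so FE FF is
# decoded as the character U+FEFF (ZWNBSP) and stays in the text.
as_utf16be = data.decode("utf-16-be")

# UTF-8 has no byte order, so the label alone does not settle the question.
# Python offers two codecs: "utf-8" keeps EF BB BF as U+FEFF, while
# "utf-8-sig" treats it as a signature and drops it.
utf8_data = codecs.BOM_UTF8 + b"A"
as_utf8 = utf8_data.decode("utf-8")
as_utf8_sig = utf8_data.decode("utf-8-sig")

print(repr(as_utf16), repr(as_utf16be))
print(repr(as_utf8), repr(as_utf8_sig))
```

The same bytes yield different text depending only on the label, which is
the point about the encoding name carrying the information. Whether a
leading U+FEFF under a BE/LE label is realistic content is the question
MichKa raises, and the sketch does not settle it.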


This archive was generated by hypermail 2.1.5 : Sun Nov 03 2002 - 15:59:53 EST