RE: UTF-8 signature in web and email

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue May 22 2001 - 08:16:40 EDT


John Cowan wrote:
> > [...] U+FEFF: [...]
> > it (also) is a "ZERO WIDTH NO-BREAK SPACE".
>
> Actually, this semantic seems to be going away soon, but
> until it does...

My only information about the UTC's decisions is what gets posted to this
mailing list, so I trust you.

But I know that character names are not going to change, and I think that
the same is true for normative character properties (correct?), so how can
the current semantics be changed so radically?

> ...it is not quite true that ZWNBSP has no semantics. There is a fundamental
> difference between "inactive" and "in!active" (where "!" represents ZWNBSP),
> namely that at the end of a line such as this one, it is correct to show "in-
> active" with hyphenation, whereas "in!active" at the end of a line must be
> "inactive", with wordwrap.

OK. Thank you for the reminder, although I didn't say that it has "no"
semantics.

However, a ZWNBSP at the beginning or end of a file (or even at the
beginning or end of a paragraph) has no such implications for hyphenation
and -- more importantly -- it has no practical effect.

So what I meant still holds true, I think: applications that implement
heuristics for detecting UTF-8 will benefit from a ZWNBSP at the beginning
of a file (the detection will be much faster and safer), but they do not
*require* it.
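To illustrate the point, here is a minimal sketch in Python. The helper name looks_like_utf8 is hypothetical, and the "heuristic" is deliberately the simplest one (a full strict decode). It shows why the signature helps: with it, the answer comes from the first three bytes; without it, the whole buffer has to be scanned.

```python
UTF8_SIGNATURE = b"\xef\xbb\xbf"  # U+FEFF (ZWNBSP) encoded in UTF-8


def looks_like_utf8(data: bytes) -> bool:
    """Guess whether a byte string is UTF-8 (hypothetical helper)."""
    # Fast path: a leading signature settles the question at once.
    # The rest of the buffer is trusted and NOT validated here.
    if data.startswith(UTF8_SIGNATURE):
        return True
    # Slow path: no signature, so scan everything and check that it
    # is well-formed UTF-8. Note that pure ASCII also passes, since
    # ASCII is a subset of UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8(UTF8_SIGNATURE + b"hello"))   # True (fast path)
print(looks_like_utf8("café".encode("utf-8")))      # True (scanned)
print(looks_like_utf8("café".encode("latin-1")))    # False
```

Real detectors use richer statistics than a strict decode, but the trade-off is the same: the signature is a shortcut, not a requirement.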

On the other hand, applications that don't implement such a thing can quite
safely treat any ZWNBSP as a normal character.
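For example, Python's standard codecs show both behaviours. The plain "utf-8" codec does not special-case the signature and simply yields U+FEFF as the first character. The "utf-8-sig" codec removes one leading signature if it is there and leaves everything else alone.

```python
import unicodedata

raw = b"\xef\xbb\xbfHello"  # UTF-8 signature followed by ASCII text

# Plain "utf-8" keeps the signature as an ordinary character, U+FEFF.
plain = raw.decode("utf-8")
print(repr(plain[0]), unicodedata.name(plain[0]))
# '\ufeff' ZERO WIDTH NO-BREAK SPACE

# "utf-8-sig" drops a single leading signature, if present, and nothing else.
print(raw.decode("utf-8-sig"))  # Hello
```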

_ Marco



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT