Re: Names for UTF-8 with and without BOM

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Sun Nov 03 2002 - 09:22:54 EST

Next message: John H. Jenkins: "Re: ct, fj and blackletter ligatures"

Previous message: Peter_Constable@sil.org: "Re: ct, fj and blackletter ligatures"
In reply to: Peter_Constable@sil.org: "Re: Names for UTF-8 with and without BOM"
Next in thread: John Cowan: "Re: Names for UTF-8 with and without BOM"

> In particular, I'm thinking of a situation about a year and a half ago
> (IIRC) in which Michael (and I and others) were strongly opposed to a
> suggestion that the Unicode Consortium should document a certain variation
> (perversion, some would say) of one of the Unicode encoding forms that a
> certain vendor had implemented in their software. On that occasion,
> Michael (and I and others) were arguing that, just because they had done
> something in their software, that shouldn't mean that the rest of the
> world should be forced to support their encoding form.
>
> I find it interesting, then, to see Michael saying that, since Notepad
> sticks a BOM-cum-signature at the start of its UTF-8, the rest of the
> world should support it.

I do not see the conflict, or the irony? Remember that what Notepad and
others do is present mainly because it *is* in the XML standard. What was
being done by those others with UTF-8 was not a part of the UTF-8 "standard"
and was in fact specifically disallowed. In the end, note that UTF-8 was not
compromised; they got their own [non-preferred] encoding scheme for their
backcompat requirement, and they now have the "job" of making their products
use it in name.

If someone has a bug or problem in their software, then it is of course
their responsibility to fix it. On the other hand, if one pays attention to
a possible (optional) recommendation in a standard, it is the standard's
responsibility to not make people regret that they paid attention?

(Which is not to say that they got the "idea" from XML; I am not sure where
the idea came from. I figure that there was a strong interest in making sure
that when someone saved a file as UTF-8, it would still be considered UTF-8
when reloaded, rather than ASCII or ANSI [sic]. This is a good reason for
such a decision in plain text -- and the fact that XML is after all "just
text" is lost on no one...)
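
To make the round-trip concern concrete, here is a minimal Python sketch of
how a loader might use the signature. It is only an illustration, not a
description of what Notepad actually does: the function names are
hypothetical, and the cp1252 fallback is an assumed stand-in for the "ANSI"
code page.

```python
import codecs


def save_with_signature(text: str) -> bytes:
    """Encode text as UTF-8 and prepend the three-byte signature EF BB BF."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess an encoding name from the leading bytes only.

    Assumption: a file without the signature is treated as the legacy
    code page given by `fallback`. Real loaders may apply more heuristics.
    """
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the leading signature.
        return "utf-8-sig"
    return fallback


if __name__ == "__main__":
    signed = save_with_signature("café")
    unsigned = "café".encode("utf-8")

    # With the signature, the reloaded text matches what was saved.
    print(signed.decode(sniff_encoding(signed)))

    # Without it, this loader falls back to the legacy code page and
    # misreads the two-byte sequence for the accented letter.
    print(unsigned.decode(sniff_encoding(unsigned)))
```

Without the signature, a pure-ASCII file is byte-for-byte identical in all
three encodings, so the ambiguity only bites once non-ASCII characters show
up, which is exactly when a wrong guess corrupts the text.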

Given how little interest XML has shown in breaking old parsers or valid
XML 1.0 streams, it seems unlikely (to me) that they would make such a
breaking change in a future version of XML.

MichKa

This archive was generated by hypermail 2.1.5 : Sun Nov 03 2002 - 10:00:26 EST