From: Doug Ewell (doug@ewellic.org)
Date: Sun Jan 04 2009 - 22:53:12 CST
Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
> It's an attempt to separate the two facets of compatibility: One is
> based on interoperability needs being the primary base for encoding
> the character. The other is based on a character having a
> compatibility decomposition. The latter are the ones that could be
> called "compatibility variants", because they can be considered a
> variant of an existing (ordinary) character.
>
> (In discussions like this, I personally prefer the term "ordinary" in
> place of the more cumbersome circumlocution "normal (that is,
> non-compatibility)".)
I don't see any definition of "compatibility character" in the TUS book
that refers to this first facet, that is, a character that is
*completely unrelated* to any other character in the standard but is
encoded due to "interoperability needs." The entry for "compatibility
character" in the Glossary is simply a truncation of the longer
definition in Section 2.3, and in fact the Glossary entry directs the
reader to the full description in Section 2.3.

The whole purpose of calling emoji "compatibility characters" seems to
be to exempt them from the normal stated guidelines of what is and is
not a candidate for encoding.

If you (Ken, Mark, anyone) can show me a definition of "compatibility
character" that refers to this "interoperability needs" aspect and does
not assume any relationship between these characters and "ordinary"
characters, I would appreciate it. This has to be a definition long
enough to stand on its own, not just a selective truncation of a longer
text (so excluding the Glossary entry), and it has to be found somewhere
within the Unicode 5.0 or 5.1 text, including standard annexes.

If no such definition can be found, then I have to assume this
"interoperability needs" argument was created solely for the purpose of
admitting the emoji set.
> It should be immediately obvious that not all characters needed for
> interoperability (compatibility) can be guaranteed to have an ordinary
> character counterpart. Therefore, some characters that look like
> ordinary characters (because they don't have a compatibility
> decomposition) are in fact encoded for compatibility.
It is not immediately obvious to me. Can you give some examples of
currently encoded compatibility characters that have no ordinary
character counterpart, but were encoded solely for compatibility with
external, post-1993 standards?
> The set of emoji (and also emoticons) are composed of many ordinary
> characters (straightforward symbols), plus compatibility characters
> that do not have a decomposition.
Of the images in the "Table for Working Draft Proposal" that do not
already have a Unicode code point, I don't think I see more than 20 or
so that are composed of existing, ordinary characters. Practically all
are new images.
>> At least now when I see a black-and-white statement such as "Unicode
>> does not encode idiosyncratic, personal, novel, or private-use
>> characters, nor does it encode logos or graphics," I know how to
>> interpret it.
>
> Yes, "graphics" is not a very well-defined term ;-)
As discussed over the past two weeks, there are some things like the
letter "A" which clearly fall on one side of the text/graphics
continuum, and other things like the Venus de Milo which clearly fall on
the other side. There is a substantial gray area in between. I think
we can agree on that. Now, guess which side I think CLINKING BEER MUGS
belongs on.
> And "novel" would have encompassed the Euro sign before 2002, yet it
> was coded well in advance of the actual introduction of that currency.
EURO SIGN is not an ideal example. It was well known and undisputed in
1998 that this symbol would become ubiquitous and globally important
within a few years. The restriction against novel characters was
clearly and explicitly intended to exclude characters whose importance
and/or staying power was unknown. (Principles and Procedures, section
H.10: "The euro sign... is a novel symbol for which there is
demonstrated and strong demand.")

And even if EURO SIGN did break the rule against "novel" symbols, there
was only one of them, not 618.
>> I've been a huge and vocal supporter of the Unicode Standard for the
>> past 16 years, back before most people had heard of it, and this is
>> by far the most disappointed I have ever been in the Standard. This
>> decision will come back to haunt Unicode again and again.
>
> First, there hasn't been a decision. Certainly not a final one. So
> it's a bit premature to express things this way.
I and others have already been told, publicly and privately, to stop
arguing against inclusion of the entire emoji set, because "'resistance'
is not helpful" and "the decision to encode the emoji as individual code
points does not need to be revisited." Doesn't sound to me like a
particularly bumpy road to UTC approval. The real tough questions might
have to come from member bodies in WG2.
> Second, if you've been around that long, you might have heard about
> similar discussions where people were predicting bad outcomes from
> certain decisions. Surprisingly enough, things didn't always turn out
> as badly as predicted. Some issues, after being hotly contested and
> taking truly enormous bandwidth in the committee, and on the lists,
> have sunk out of sight without a trace, the minute they were decided
> (and seem to have had no observable impact on the standard).
> Astonishing, but true.
One of the better examples, I concede, was the encoding of the math
alphabets, which did not (to some people's surprise) result in
widespread use of these symbols for bold, italic, etc. markup in plain
text. (My MathText application, which performed this kind of abuse, was
an April Fool's parody.) Neither the approval nor the subsequent
deprecation of the Plane 14 tags caused any lasting harm, though I still
contend the deprecation solved nothing. The encoding of Phoenician
separately from Square Hebrew does not appear to have ruined
text-searching capability for Middle Eastern scholars. And even the old
flames about CJK unification seem to have died down.

But all of these issues, except arguably Plane 14, had something to do
with characters in a writing system. Even mathematical formulas,
two-dimensional though they may be, are still composed from what most
people would call "writing." A great many images in the emoji set have
nothing whatsoever to do with writing, layout control, text
meta-information, or symbols with semantic value. They are
cute little wiggling pictures of balloons and party poppers.

None of the other issues required such revisionism of basic principles
of Unicode and 10646. There is nothing in the 1400-page TUS 5.0 book
that stretches the meaning of "compatibility characters" to encompass
wiggling pictures of balloons and party poppers. That's a retrofit.
> Third, I really hope that no single issue can affect your support for
> the standard, if it's sustained you for 16 years so far.
I can no longer tell myself or anyone else that such-and-such character or
symbol is something that Unicode would or would not consider encoding.
Everything is up in the air now.
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages