From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jan 07 2009 - 15:17:59 CST
James Kass wrote, in response to Mark Davis:
> And then there's those last two lines.
[Referring to Mark's statement: "And having the emoji symbols encoded
will be far more useful to many more people than, say, the Phaistos disk
symbols."]
> There don't appear to
> be pragmatic interoperability issues driving Unicode's push
> to encode these emoji.
This just seems to me to be a willful refusal to believe
what the folks dealing with these interoperability issues
have asserted, more than once.
From the point of view of the function of the standard, the
interoperability issue is no different from the one posed by the
character extensions in HKSCS (Hong Kong), which were unencoded
in Unicode and which had PUA assignments for conversions. Those
posed interoperability problems for Unicode systems, and the
solution was the quick encoding of the unencoded characters.
That process posed no serious political problems -- it simply
worked its way through proposals, and the characters were added
to 10646 amendments and ended up in the standard with unanimous
approvals.
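To make the mechanics concrete, here is a rough Python sketch
of that kind of migration at the conversion-table level. The
byte sequence and both code points are made up for illustration,
not actual HKSCS data:

    # Hypothetical Big5-HKSCS byte sequence for a then-unencoded character.
    HKSCS_BYTES = b"\x88\x56"

    # Stage 1: the interim conversion table points into the PUA.
    table_before = {HKSCS_BYTES: 0xE234}    # U+E234, Private Use Area

    # Stage 2: once a 10646 amendment assigns a standard code point,
    # the table is updated (again, a made-up value).
    table_after = {HKSCS_BYTES: 0x28E0F}    # a CJK Extension B code point

    for table in (table_before, table_after):
        cp = table[HKSCS_BYTES]
        print(f"{HKSCS_BYTES.hex()} -> U+{cp:04X}")

Data converted under the interim table still carries U+E234,
which the updated table no longer produces -- exactly the gap
that quick encoding closed off.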
The very *same* folks who asserted that having unencoded
HKSCS characters posed interoperability problems are now
asserting that having unencoded Japanese wireless carrier
emoji characters poses interoperability problems for
Unicode systems.
I can only conclude that the *real* issue here for the
opposition to the process is that they don't like cutesy
Japanese emoji, not that they truly understand or care
about the technical interoperability problem.
> Unicode plain-text fulfills its goal of interoperability,
> in the unlikely or unwelcome event that private messages between
> cell phone users in Japan are getting sucked into somewhere they
> shouldn't be, by guaranteeing that those private use Unicode
> characters don't get munged somewhere along the way.
The rhetoric here is completely off-base. These 100 million
cell phones are connected to the internet. The users send
email using their cell phones. That email can go anywhere,
and it gets converted to Unicode at boundaries to systems
and networks -- it isn't safely confined inside private
networks in Japan. We've even had it demonstrated on this
list, so nobody should still be in denial about it
occurring. This isn't "somewhere they shouldn't be" -- the
internet is what those phones are *designed* to connect to.
And to assert otherwise is simply to stick your head
in the sand.
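For anyone who wants to see that boundary conversion in
miniature, here is a Python sketch. The gateway itself is
hypothetical, but the byte values follow DoCoMo's published
Shift_JIS usage, and Windows code page 932 maps the carrier's
user-defined area into the PUA:

    # "Nichi wa" (two ordinary Shift_JIS characters) followed by
    # DoCoMo's sun emoji, which occupies the byte pair F8 9F.
    payload = b"\x93\xfa\x82\xcd\xf8\x9f"

    # What a boundary converter using code page 932 produces:
    text = payload.decode("cp932")
    print(ascii(text))    # '\u65e5\u306f\ue63e' -- the emoji lands at U+E63E

From that point on the message is ordinary Unicode text, and
U+E63E travels wherever the email goes.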
And Marcus has already pointed out that the established PUA
mappings are not disjoint, nor do they (or can they) avoid
colliding with other PUA usages, so the characters *do* get
munged or misinterpreted. This is just another example of the principle:
PUA in public interchange = mojibake. And mojibake is something
that Unicode was designed to eliminate.
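A toy Python illustration of the collision (the code point is
real DoCoMo usage, but the second vendor's assignment here is
invented for the example):

    VENDOR_A = {0xE63E: "SUN"}         # DoCoMo's published convention
    VENDOR_B = {0xE63E: "UMBRELLA"}    # hypothetical colliding convention

    msg = "Weather today: \ue63e"      # plain text; no label survives

    cp = ord(msg[-1])
    print(f"U+{cp:04X}: {VENDOR_A[cp]} to A, {VENDOR_B[cp]} to B")

Nothing in the character stream says whose table applies.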
> The information about cell phone vendors' PUA use is available to
> search engine companies, they can use it as they please.
This is true, but utterly specious in this context. PUA
characters in plain text don't carry around labels defining
their conventions. Their identification isn't reliable even
in protocols such as email and HTML which have mechanisms
for identifying character sets, as we all should realize by
now.
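A short sketch with the Python standard library makes the
point; the message is hypothetical, the library calls are real:

    from email.message import EmailMessage

    msg = EmailMessage()
    msg["Subject"] = "forecast"
    msg.set_content("Tomorrow: \ue63e")   # stdlib labels the part utf-8

    print(msg["Content-Type"])            # text/plain; charset="utf-8"
    assert "\ue63e" in msg.get_content()  # the code point round-trips

The charset parameter says how to map bytes to code points, and
nothing more; no header exists to say whose private convention
defines U+E63E.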
I'll repeat the equation: PUA in public interchange = mojibake
The solution of letting the search engine companies eat
PUA cake is basically a recipe for continuing mojibake.
And neither the UTC participants nor the Japanese telcos
are going to put up with that.
> Who is it, then, that benefits? Is it the potential future customers
> and existing customer base of other cell phone vendors world-wide?
> No, they'll surely end up just adding their stuff to the PUA, too.
> That way, *they* control it. As it should be.
There is so much wrong with that that it is hard to know
where to start.
The short answer is that *everyone* benefits from having
a standard that promotes interoperability of text interchange
globally without data corruption.
And adding characters to cell phone vendors' private encodings
as "their" PUA, so they can "control" it, is *not* as it
should be. All that accomplishes is generating more
interoperability hell in a global IT infrastructure.
Why do you think all the OS companies long ago gave up "their"
right to define their own character sets, which *they*
could control? Was that as it should have been for IBM,
for Microsoft, or for the many others who once defined
proprietary character sets when they had no practical
alternatives?
> What are these benefits, who is going to get them, and how much
> serious attention is given to alternatives?
I think this whole argument has been so clouded by emoji-hating
and by FUD about color and animation and other concerns
focused on *glyphs* rather than text interchange that
it is unlikely that a reasoned assessment of benefits
will seem convincing to those who don't want to hear it.
I'll just assert that as far as I am concerned the
benefits are self-evident.
I recognize that you and others strongly disagree. So be it.
--Ken