Re: Emoji: emoticons vs. literacy

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jan 07 2009 - 15:17:59 CST

    James Kass wrote, in response to Mark Davis:

    > And then there's those last two lines.

    [Referring to Mark's statement: "And having the emoji symbols encoded
     will be far more useful to many more people than, say, the Phaistos disk
     symbols."]

    > There don't appear to
    > be pragmatic interoperability issues driving Unicode's push
    > to encode these emoji.

    This just seems to me to be a willful refusal to believe
    what the folks dealing with these interoperability issues
    have asserted, more than once.

    From the point of view of the function of the standard, the
    interoperability issue is no different from the one posed by
    the character extensions in HKSCS (the Hong Kong Supplementary
    Character Set), which were unencoded in Unicode and which had
    PUA assignments for conversion. Those posed interoperability
    problems for Unicode systems. And the solution was the quick
    encoding of the characters that were unencoded. That process
    posed no serious political problems -- it simply worked its way
    through proposals, and the characters were added to 10646
    amendments and ended up in the standard with unanimous
    approval.

    The very *same* folks who asserted that having unencoded
    HKSCS characters posed interoperability problems are now
    asserting that having unencoded Japanese wireless carrier
    emoji characters poses interoperability problems for
    Unicode systems.

    I can only conclude that the *real* issue here for the
    opposition to the process is that they don't like cutesy
    Japanese emoji, not that they truly understand or care
    about the technical interoperability problem.

    > Unicode plain-text fulfills its goal of interoperability,
    > in the unlikely or unwelcome event that private messages between
    > cell phone users in Japan are getting sucked into somewhere they
    > shouldn't be, by guaranteeing that those private use Unicode
    > characters don't get munged somewhere along the way.

    The rhetoric here is completely off-base. These 100 million
    cell phones are connected to the internet. The users send
    email using their cell phones. That email can go anywhere,
    and it gets converted to Unicode at boundaries to systems
    and networks -- it isn't safely confined inside private
    networks in Japan. We've even had it demonstrated on this
    list, so nobody should still be in denial about it
    occurring. This isn't "somewhere they shouldn't be" -- the
    internet is what those phones are *designed* to connect
    to. And to assert otherwise is simply to stick your head
    in the sand.
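
    To make that concrete, here is a minimal sketch in Python. The
    byte pair is DoCoMo's documented Shift-JIS code for its "sun"
    symbol; CP932 stands in here for whatever conversion table a
    mail gateway might apply -- the point is only where such a
    conversion lands:

        # Carrier mail frequently travels as Shift-JIS; a gateway
        # converting it to Unicode with a CP932-style table drops
        # the emoji into the Private Use Area.
        raw = b"\xf8\x9f"            # DoCoMo "sun" emoji, Shift-JIS
        text = raw.decode("cp932")   # user-defined area maps to the PUA
        print(hex(ord(text)))        # 0xe63e -- a private use code point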

    And Markus has already pointed out that the established PUA
    mappings are not disjoint, nor do they (or can they) avoid
    other PUA usages, so the characters *do* get munged or
    misinterpreted. This is just another example of the principle:
    PUA in public interchange = mojibake. And mojibake is something
    that Unicode was designed to eliminate.
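
    To illustrate the principle (the tables below are hypothetical
    stand-ins, not the carriers' real assignments -- those are in
    the vendors' published mapping data):

        # Two vendors reusing the same PUA slot for different symbols.
        # These assignments are invented purely to show the collision.
        CARRIER_A = {0xE001: "SUN"}   # one vendor's private convention
        CARRIER_B = {0xE001: "DOG"}   # another vendor reuses the slot

        def interpret(text, table):
            # The code point alone carries no label; its meaning depends
            # entirely on knowing, out of band, which table the sender used.
            return [table.get(ord(ch), "mojibake") for ch in text]

        msg = "\ue001"
        print(interpret(msg, CARRIER_A))   # ['SUN']
        print(interpret(msg, CARRIER_B))   # ['DOG'] -- same code point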

    > The information about cell phone vendors' PUA use is available to
    > search engine companies, they can use it as they please.

    This is true, but utterly specious in this context. PUA
    characters in plain text don't carry around labels defining
    their conventions. Their identification isn't reliable even
    in protocols such as email and HTML, which have mechanisms
    for identifying character sets, as we all should realize by
    now.
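
    To spell that out: even a message whose charset is correctly
    declared *and* correctly honored delivers the PUA code point
    without any label for its convention. A sketch with Python's
    standard email module, reusing U+E63E from the earlier example:

        from email import message_from_bytes

        # A well-formed message: the charset label resolves the bytes,
        # but nothing in the protocol labels the PUA *convention*.
        raw = (b"Content-Type: text/plain; charset=utf-8\r\n"
               b"Content-Transfer-Encoding: 8bit\r\n\r\n"
               + "\ue63e".encode("utf-8"))
        msg = message_from_bytes(raw)
        body = msg.get_payload(decode=True).decode(msg.get_content_charset())
        print(hex(ord(body)))   # 0xe63e -- valid UTF-8, meaning still unknown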

    I'll repeat the equation: PUA in public interchange = mojibake

    The solution of letting the search engine companies eat
    PUA cake is basically a recipe for continuing mojibake.
    And neither the UTC participants nor the Japanese telcos
    are going to put up with that as a solution.

    > Who is it, then, that benefits? Is it the potential future customers
    > and existing customer base of other cell phone vendors world-wide?
    > No, they'll surely end up just adding their stuff to the PUA, too.
    > That way, *they* control it. As it should be.

    There is so much wrong with that claim that it is hard to
    know where to start.

    The short answer is that *everyone* benefits from having
    a standard that promotes interoperability of text interchange
    globally without data corruption.

    And adding characters to cell phone vendors' private encodings
    as "their" PUA, so they can "control" it, is *not* as it
    should be. All that accomplishes is generating more
    interoperability hell in a global IT infrastructure.

    Why do you think all the OS companies long ago gave up "their"
    right to define their own character sets, which *they*
    could control? Was that as it should have been for IBM,
    for Microsoft, or for the many others who once defined
    proprietary character sets when they had no practical
    alternatives?

    > What are these benefits, who is going to get them, and how much
    > serious attention is given to alternatives?

    I think this whole argument has been so clouded by emoji-hating
    and by FUD about color and animation and other concerns
    focused on *glyphs* rather than text interchange that it is
    unlikely that a reasoned assessment of the benefits will seem
    convincing to those who don't want to hear it.

    I'll just assert that as far as I am concerned the
    benefits are self-evident.

    I recognize that you and others strongly disagree. So be it.

    --Ken


