From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 05 2009 - 18:25:07 CST
> Asmus Freytag wrote:
>
> > The fact that the request to provide a solution using non-PUA
> > character codes is so strongly supported by leading search engine
> > manufacturer(s) should give you pause here.
John Hudson responded:
> It did give me pause, and that was the point that I realised that it is
> not in the interests of those companies to use any character codes for
> emoji, PUA or otherwise.
There is an ambiguity in John's response here. It is unclear whether
he means by "those companies" DoCoMo, KDDI, and SoftBank, i.e. the
Japanese wireless carriers who created these emoji sets in the first
place and used character codes to transmit emoji, or whether he is
responding to Asmus' contention in particular, and means by "those
companies" Google, Yahoo!, and Microsoft, i.e. the leading search
engine companies who didn't *create* the emoji sets, but who have
to deal with the conversion of this wireless data to function with
their search engine technology.
The UTC has heard quite clearly from the search engine companies
(and others) that having a standard character encoding for dealing
with these existing characters in data is better than a PUA-based
encoding -- and that that clearly *is* in their interests. Any
such contention is quite different from the assessment as to
whether the Japanese wireless carriers should have treated any
of this stuff as SJIS extension gaiji characters in the first
place.
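
To make the PUA point concrete, here is a minimal sketch, not anything
the UTC or the search engine companies have specified, of why private-use
code points are awkward in indexed data: the standard assigns them no
identity beyond General_Category "Co", so a consumer can detect them but
cannot interpret them without a private agreement. The helper names and
the sample code point U+E63E are arbitrary choices for illustration.

```python
import unicodedata


def is_private_use(ch):
    """Return True if the character has General_Category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def private_use_positions(text):
    """Return the indexes of all private-use characters in text."""
    return [i for i, ch in enumerate(text) if is_private_use(ch)]


# U+E63E is an arbitrary BMP private-use code point (assumption: it stands
# in for a carrier-specific emoji); U+2600 is a standard encoded symbol.
sample = "ok \ue63e \u2600"

for index in private_use_positions(sample):
    ch = sample[index]
    # Private-use code points carry no standard character name, so the
    # lookup falls back to the default string.
    print(index, hex(ord(ch)), unicodedata.name(ch, "<no standard name>"))
```

A standardly encoded symbol such as U+2600 carries a name and properties
that every conforming implementation shares, which is what makes it
searchable and interchangeable without a side agreement.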
> Quite simply, Unicode character codes do not
> appear to be up to the task of encoding an open ended set of images that
> users might want to transmit to and from mobile devices,
I certainly wouldn't quarrel with that statement, and I am
pretty sure that would also be a consensus position within the
UTC.
> which is the
> problem that those companies should be trying to solve,
Again with the "those companies", however. The search engines
aren't trying to solve the problem of transmitting an open-ended
set of images to and from mobile devices. Nor do I think that
really is their purview. They will, of course, want to be
able to search and index such images as occur archived in
data on the internet, but that is a different problem.
> rather than
> hijacking a plain text encoding standard with an insufficient subset of
> such images. I understand that it was easy and convenient to use text
> character codes for this limited set of images to date, but it was not a
> good idea: I'm tempted to say it was a lazy, stop-gap measure, lacking
> in any sort of vision about the social use of technology that these
> companies are supposed to understand.
"These" companies? The lazy, stop-gap measure, if such it was,
was perpetrated by the Japanese wireless companies, which sought
an easy way to extend their character sets to make
various culturally appropriate symbols, pictographs, and emoticons
available quickly on phones. And they did it by a methodology
that has a long history in Japan: gaiji. As I've pointed out
before, this is just the latest example of this process in
Japan, cycling around from when the Japanese OS companies did
this kind of thing in extending JIS for Japanese computers back
in the 1980s.
The "characters" here are now de facto data and need some
character encoding solution other than PUA, just as the need
to interoperate with the earlier instantiations of gaiji made
it necessary to give those extensions a standardized character
encoding as well.
> It is not simply that I think
> these images do not belong in Unicode, but that adding them to Unicode
> does not solve the real problem.
And I think John has misidentified the real problem. Or rather,
what he has identified as the "real" communication problem below
("the long future use of inline images in communications") is not
the real problem that the UTC is trying to address for the search
engine (and database) companies -- namely interoperating with the
existing, de facto, set of SJIS extensions used *as characters*
by the wireless operators in Japan.
> Since a proper solution, capable of
> addressing the long future use of inline images in communications, would
> by its nature also solve the present problem of non-standard handling of
> current emoji sets, why spend so much time and energy forcing Unicode to
> accept something that will be obsolete almost as soon as it completes
> the ballot process?
The "proper solution" envisioned here would *obsolete* the need to
resort to character-based, non-extensible hacks for transmitting
pictographic symbols in the way the wireless carriers in Japan now
are doing -- but it would not solve the *present problem* of
dealing with the de facto existing characters *as* characters,
which is what we are up against here.
By the way, one of the reasons I have spoken out strongly against
letting the 10 flag-icon-based locale symbols in the emoji sets
be turned into an excuse for an open-ended scheme for encoding
flags as characters is that I *agree* with John's general
contention about the inappropriateness of using characters to
represent entities that are essentially images, rather than
text symbols.
--Ken