From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 05 2009 - 19:34:14 CST
André Szabolcs Szelp responded:
> 2009/1/6 Kenneth Whistler <kenw@sybase.com>:
> > The "proper solution" envisioned here would *obsolete* the need to
> > resort to character-based, non-extensible hacks for transmitting
> > pictographic symbols in the way the wireless carriers in Japan now
> > are doing -- but it would not solve the *present problem* of
> > dealing with the de facto existing characters *as* characters,
> > which is what we are up against here.
>
> So are Ewellic, Verdurian, Røzhxh etc., etc. characters existing *as*
> characters.
But this contention ignores the essential difference between
Ewellic (and all the other denizens of the PUA code points in
the ConScript registry) and the wireless emoji sets. Mark
Crispin pointed it out:
crispin>> There are hundreds of millions of mobile phones with
crispin>> the current [emoji] set.
Implementation on hundreds of millions of devices connected
to the internet and to the search engines and databases
operating on those data streams makes this a quintessential
case of an encoding requiring a public, *standard* solution.
Ewellic, on the other hand -- as the author of the script
himself has attested on this list -- is not widely used, nor
does it require more than a private agreement for PUA code
points for the few who might actually wish to exchange data.
> And any future ad-hoc code-point assignment to any
> possible ad-hoc entities, be they signs, letters, characters, sound
> files, whatever.
Ad hoc assignment of numbers to entities by somebody does not
render such entities, ipso facto, candidates as characters
in the UCS.
Even in the cases (such as PUA encodings of ConScripts) where
there clearly is both an existential and functional case to
be made for those encodings as *character* encodings, there
will always be an extended fringe of private use that will
not rise to the level of appropriateness for encoding in
Unicode, IMO.
> I have understood the argument of the UTC why they want these emoji in
> Unicode ("they are currently handled by operators as characters in a
> particular encoding scheme"), but I have not heard answers how they
> wish to proceed in future if someone wants to have arbitrary
> characters accepted based on the same argument/precedent. ("something
> handled in some context as characters in a particular encoding
> scheme")
>
> I would be grateful if UTC could sketch an anticipated procedure.
It is very straightforward, and has been detailed for years at:
http://www.unicode.org/pending/proposals.html
Anyone who thinks they have a case for something to be encoded
as characters in Unicode can write up a proposal, submit it,
and then be prepared to defend it through the several years it
takes to reach consensus in both committees (UTC and WG2) and
to shepherd it through the multiple layers of balloting
involved.
People who expect a deductive decision procedure which could
be applied ahead of time to a candidate "entity" for encoding,
to determine absolutely whether it should be encoded as
a character or not, are, again IMO, likely to be sorely
disappointed. Character encoding is not a rational science --
it is one part politics, one part technology, and one part
history, with a few dashes of randomness and whimsy tossed
in for seasoning.
And as Asmus just pointed out, that's why we have committees
debating all this, instead of rule books and registries.
--Ken