From: James Kass (thunder-bird@earthlink.net)
Date: Sat Jan 03 2009 - 00:00:39 CST
Peter Constable replied,
>>> These are getting interchanged publicly between
>>> different vendors' products. That's not private use.
>
>> Semantics. There is no point to user-defined characters if
>> they can't be exchanged. There is even at least one well-known
>> PUA registry.
>
> I don't mean just communicated between different vendors'
> processes, but also interpreted and processed by different
> vendors' processes, in contexts where no private agreement
> can be assumed.
The existence of a private agreement is a given; otherwise,
neither interpretation nor processing would be desired. In
contexts where the nature of the private agreement cannot
be determined, no interpretation is possible. Processing can
be done on uninterpreted strings. I don't need to be able to
speak Hindi in order to enter, store, search, and collate text
written in Devanagari, and neither does my plain-text editor.
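(To illustrate, here is a rough Python sketch of what I mean by
processing uninterpreted strings. The PUA code points are
invented for the example; under some private agreement they
could mean anything.)

# Strings carrying PUA code points (U+E63E and U+E63F here,
# picked arbitrarily from the BMP Private Use Area).
records = [
    "forecast: \uE63E later \uE63F",
    "forecast: \uE63F all day",
]

# Enter/store/search: substring matching needs no knowledge
# of what the characters mean.
hits = [r for r in records if "\uE63E" in r]
print(hits)

# Collate: default ordering is by code point, again without
# any interpretation of the characters.
print(sorted(records))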
Success in interpreting the text, then, lies in determining the
nature of the private agreement. This is not a new concept;
unless I'm mistaken, it has been discussed here previously.
Mark-up was one method mentioned, if I recall correctly.
Search engines can interpret mark-up.
> If text content is getting generated in (say)
> DoCoMo text protocols, spreading into other content via
> other protocols and then that content is getting interpreted
> by processes produced by Google or Apple or whomever,
> then the sense in UTC (I think I can say) is going to be that
> that is *public* interchange, hence presenting a case for
> being representable in the UCS.
Public interchange of private characters, which happens all
the time, is a good indicator that a case might be made for
plain-text encoding. Suitability, again, is a matter on which
opinions may vary and members vote. (I'm trying to rephrase
and expand on what
you said to see if we're basically agreeing here.)
>> Quite so. Refusing to encode these would be the best
>> tactic to keep others from using the PUA to "promote"
>> their thingies into regular Unicode.
>
> By that line of argumentation, we could completely
> freeze encoding of any new characters as a tactic to
> keep others from inventing new characters that might
> need to be encoded -- sure, we could do that; but that
> doesn't mean we *should* on that basis.
We shouldn't exclude text-like characters from a plain-text
encoding standard as long as all the criteria are met.
"Thingies" might have been a poor choice of words on my
part. To rephrase: refusing to encode this set of proprietary
random icons en masse would prevent others from trying to
get their icon sets (or whatever) promoted the same way.
That's not to say that some of the underlying symbols which
some of the icons represent shouldn't be encoded; many of
them already are. Michael Everson pointed out some which
weren't, and probably should be, the last time we went
around on this.
The remainder may be rejected for unsuitability after careful
study. The ones which might get newly encoded as plain-text
characters should *be* plain-text characters.
The vendors who invented this icon set should continue to use
the PUA to exchange them. They are icons/signage and are
being exchanged and interpreted by humans as icons/signage.
Any machine interpretation of them should emulate what
people are doing. It's OK for there to be some overlap between
icons/signage and plain-text characters; after all, many of
those icons are pictures of those characters.
Standardizing an icon set in plain text opens a door best
left closed.
Establishing a method to identify PUA schemes would let any
process which does that sort of thing interpret much, much
more than the emoji icon set.
(Of course, there is already a mark-up solution in place. Hint
to search engines everywhere desiring interpretation of PUA
code points: check the font(s) specified in the mark-up.)
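(A rough Python sketch of the sort of thing I mean. The font
name and the (font, code point) table below are invented for
illustration; nothing here claims to be how any actual search
engine works.)

from html.parser import HTMLParser

# Hypothetical table mapping (font name, code point) to a
# meaning. The font name and the assignments are made up.
PUA_SCHEMES = {
    ("Acme Emoji", 0xE63E): "sun",
    ("Acme Emoji", 0xE63F): "cloud",
}

class PUASniffer(HTMLParser):
    """Track the font-family in force; look up PUA code points."""
    def __init__(self):
        super().__init__()
        self.fonts = []  # stack of inherited font-family values
        self.found = []  # (code point, font, meaning) triples

    def handle_starttag(self, tag, attrs):
        font = self.fonts[-1] if self.fonts else None
        for decl in (dict(attrs).get("style") or "").split(";"):
            name, _, value = decl.partition(":")
            if name.strip().lower() == "font-family":
                font = value.strip().strip("'\"")
        self.fonts.append(font)

    def handle_endtag(self, tag):
        if self.fonts:
            self.fonts.pop()

    def handle_data(self, data):
        font = self.fonts[-1] if self.fonts else None
        for ch in data:
            if 0xE000 <= ord(ch) <= 0xF8FF:  # BMP Private Use Area
                meaning = PUA_SCHEMES.get((font, ord(ch)))
                self.found.append((hex(ord(ch)), font, meaning))

sniffer = PUASniffer()
sniffer.feed("<p>Tomorrow: "
             "<span style=\"font-family: 'Acme Emoji'\">\uE63E</span></p>")
print(sniffer.found)  # [('0xe63e', 'Acme Emoji', 'sun')]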
Best regards,
James Kass