From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat Jan 03 2009 - 05:49:25 CST
On 1/2/2009 10:00 PM, James Kass wrote:
> Peter Constable replied,
>
>
>>>> These are getting interchanged publicly between
>>>> different vendors' products. That's not private use.
>>>>
>>> Semantics. There is no point to user defined characters if
>>> they can't be exchanged. There is even at least one well-known
>>> PUA registry.
>>>
>> I don't mean just communicated between different vendors'
>> processes, but also interpreted and processed by different
>> vendors' processes, in contexts where no private agreement
>> can be assumed.
>>
>
> The existence of a private agreement is a given, otherwise
> neither interpretation nor processing would be desired. In
> contexts where the nature of the private agreement cannot
> be determined, no interpretation is possible. Processing can
> be done on uninterpreted strings. I don't need to be able to
> speak Hindi in order to enter, store, search, and collate text
> written in Devanagari, and neither does my plain-text editor.
>
But your plain text database on your web server cannot present Hindi
words in the order a user of your website in India would expect them,
unless the text (that is, its character codes) can be interpreted.
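A minimal sketch of what "interpreting the character codes" involves,
assuming Python and its standard unicodedata module (chosen here only for
illustration; the helper name describe is hypothetical). An encoded
Devanagari letter comes with a name and a general category that a process
can act on, while a PUA code point reports only "Co" (private use) and no
name:

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, name or None) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, None))

devanagari_ka = "\u0915"  # an encoded Devanagari letter
pua_example = "\uE000"    # first code point of the BMP Private Use Area

ka_info = describe(devanagari_ka)
pua_info = describe(pua_example)

print(ka_info)   # category "Lo" plus a standard character name
print(pua_info)  # category "Co" and no name: nothing for a process to interpret
```

Anything beyond that for the PUA code point has to come from an agreement
outside the text itself.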
> Success in interpreting the text, then, lies in determining the
> nature of the private agreement. This is not a new concept,
> it has been discussed here previously, unless I'm mistaken.
> Mark-up was one method mentioned, if I recall correctly.
> Search engines can interpret mark-up.
>
If that were as easy and straightforward as it sounds, we wouldn't have
a Unicode Standard.
If I remember correctly, before Unicode, everybody had their own
character sets, and in Japan, every vendor had their own. In order to
communicate you had to know what character set the other party was
using. ISO 2022 even had internal markup (control sequences) to allow
switching of character sets on the fly.
Interestingly enough, vendors, users and implementers voted with their
feet to abandon such systems and go to a unified encoding where the
semantics of each code are unambiguous on the character level, where
there's no need to switch on the fly, and where the processes can be
written without undue complication.
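As a rough illustration of the contrast, here is a sketch assuming Python
and its built-in "iso2022_jp" codec (ISO-2022-JP is one profile of ISO 2022;
the variable names are hypothetical). The same two-character string needs
in-band escape sequences to switch character sets in the ISO 2022 style,
and none at all in UTF-8:

```python
text = "A\u3042"  # Latin "A" followed by HIRAGANA LETTER A

# ISO-2022-JP: the encoder inserts escape sequences to switch character sets.
encoded = text.encode("iso2022_jp")
ESC_TO_JISX0208 = b"\x1b$B"  # escape sequence designating JIS X 0208
has_switch = ESC_TO_JISX0208 in encoded

# A decoder must track those switches to recover the text.
round_trip = encoded.decode("iso2022_jp")

# UTF-8: every code has one meaning, so no switching state is needed.
utf8 = text.encode("utf-8")

print(encoded)
print(utf8)
```

A process handling the first byte string has to track the current character
set as state; a process handling the second does not.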
>
>> If text content is getting generated in (say)
>> DoCoMo text protocols, spreading into other content via
>> other protocols and then that content is getting interpreted
>> by processes produced by Google or Apple or whomever,
>> then the sense in UTC (I think I can say) is going to be that
>> that is *public* interchange, hence presenting a case for
>> being representable in the UCS.
>>
>
> Public interchange of private characters, which happens all
> the time, is a good indicator that a case might be made for
> plain-text encoding.
Except for the aside, I'm in agreement. (I would instead say: "which
ideally shouldn't happen, except in carefully controlled, closed
environments")
> Suitability again, opinions may vary,
> members vote. (I'm trying to rephrase and expand on what
> you said to see if we're basically agreeing here.)
>
Suitability requirements are different between ordinary and
compatibility characters - that's a long-held design principle for the
Unicode Standard.
>
>>> Quite so. Refusing to encode these would be the best
>>> tactic to keep others from using the PUA to "promote"
>>> their thingies into regular Unicode.
>>>
>> By that line of argumentation, we could completely
>> freeze encoding of any new characters as a tactic to
>> keep others from inventing new characters that might
>> need to be encoded -- sure, we could do that; but that
>> doesn't mean we *should* on that basis.
>>
>
> We shouldn't exclude text-like characters from being included
> in a plain-text encoding standard as long as all the criteria are
> met.
Criteria for encoding are different between ordinary and compatibility
characters. Requiring that the criteria for ordinary characters be met
is tantamount to freezing all encoding of compatibility characters.
That's not a useful starting point.
Are there characters that are unsuitable to even be compatibility
characters? As a theoretical point: yes. As for the current set:
opinions may vary, but it's not a black-and-white case.
> "Thingies" might have been a poor choice of words on
> my part. To rephrase, refusing to encode this set of proprietary
> random icons en masse would prevent others from trying to
> get their icon sets (or whatever) promoted.
>
> Not to say that some of the underlying symbols which some
> of the icons represent shouldn't be encoded, many of them
> already are. Michael Everson pointed out some which weren't
> and probably should be the last time we went around on this.
> The remainder may be rejected for unsuitability after careful
> study. The ones which might get newly encoded as plain text
> characters should *be* plain text characters.
>
Call them "ordinary" characters for lack of a better term.
But the ones that are not ordinary characters are not immediately out of
consideration. You need to triage these further and deliberate carefully
whether they qualify (or not) as compatibility characters.
> The vendors who invented this icon set should continue to use
> the PUA to exchange them. They are icons/signage and are
> being exchanged and interpreted by humans as icons/signage.
> Any machine interpretation of them should emulate what
> people are doing. It's OK for there to be some overlap between
> icons/signage and plain-text characters, after all, many of
> those icons are pictures of those characters.
>
This sounds like you are confusing the emoticon and the emoji discussion.
> Standardizing an icon set in plain-text opens a door best
> left closed.
>
> Establishing a method to identify PUA schemes would enable
> interpretation by any process which does that sort of thing
> for much, much more than the emoji icon set.
>
> (Of course, there is already a mark-up solution in place. Hint
> to search engines everywhere desiring interpretation of PUA
> code points: check the font(s) specified in the mark-up.)
>
The fact that the request to provide a solution using non-PUA character
codes is so strongly supported by leading search engine manufacturer(s)
should give you pause here.
A./