RE: On the possibility of guidance code points for the Private Use Area

From: Ayers, Mike (Mike_Ayers@bmc.com)
Date: Wed Apr 25 2001 - 11:48:30 EDT

Maybe in reply to: Marco Cimarosti: "RE: On the possibility of guidance code points for the Private Use Area"

> From: Eric Muller [mailto:emuller@adobe.com]
>
> "Ayers, Mike" wrote:
>
> > Currently, when sending email or interpreting HTML, the content is
> > tagged for its encoding. Wouldn't PUA users simply use their own tag
> > (say, PUA-mike-1) instead of UTF-8? Am I missing something?
>
> What we are talking about is the character collection, not the
> encoding of that collection. You really need two indicators, one which
> says what the semantics of the character U+E000 is (as well as the
> other characters), the other which says what byte sequence is used to
> encode this character (and the others).

??? Seems to me that the type tag (say, "UTF-8") specifies both the
character collection and its encoding (in UTR#17 terms, it specifies the
ACR, CCS, CEF, and CES, but not the TES). Therefore, when I create (and
register with my system) a tag of "PUA-mike-1", I must have an interpreter
ready which can read Unicode over UTF-8 with the PUA interpreted as per my
specification. Likewise, the existing applications interpret the tag
"UTF-8" to mean Unicode over UTF-8.

Again I ask: why isn't content tagging considered appropriate for
PUA use?
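
To make the tagging idea concrete, here is a minimal Python sketch of a
receiver that picks an interpreter from the content tag. It is only an
illustration under stated assumptions: the registry DECODERS, the table
PUA_MIKE_1, its placeholder meanings, and the function names are all
hypothetical. Only the tag names "UTF-8" and "PUA-mike-1" come from the
discussion above.

```python
# Basic Multilingual Plane Private Use Area range.
PUA_START, PUA_END = 0xE000, 0xF8FF

# Hypothetical private agreement behind the tag "PUA-mike-1".
# The meanings are placeholders invented for this sketch.
PUA_MIKE_1 = {
    0xE000: "<mike-logo>",
    0xE001: "<mike-ligature>",
}


def decode_utf8(data: bytes) -> str:
    """Plain Unicode over UTF-8; PUA code points pass through uninterpreted."""
    return data.decode("utf-8")


def decode_pua_mike_1(data: bytes) -> str:
    """Unicode over UTF-8, with the PUA interpreted per the private table."""
    out = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if PUA_START <= cp <= PUA_END:
            out.append(PUA_MIKE_1.get(cp, f"<unknown-PUA-{cp:04X}>"))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical registry: content tag -> interpreter registered with the system.
DECODERS = {
    "UTF-8": decode_utf8,
    "PUA-mike-1": decode_pua_mike_1,
}


def interpret(tag: str, data: bytes) -> str:
    """Dispatch on the content tag; refuse tags with no registered interpreter."""
    try:
        decoder = DECODERS[tag]
    except KeyError:
        raise LookupError(f"no interpreter registered for tag {tag!r}") from None
    return decoder(data)
```

The same bytes give different results depending on the tag:
interpret("PUA-mike-1", ...) applies the private meanings, while
interpret("UTF-8", ...) leaves U+E000 as an opaque private-use character.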

> In fact, even without PUA characters, the problem is already there. If
> my document is Unicode 3.0, U+03F4 is an error; if it's Unicode 3.1,
> U+03F4 is GREEK CAPITAL THETA SYMBOL.

If your document was written according to Unicode 3.0 and contains
the then-unassigned code point U+03F4, and it is interpreted by a Unicode 3.1
interpreter, then you've sent GREEK CAPITAL THETA SYMBOL, whether or not you
wanted to. This is really a separate issue from what I was discussing,
however: it is the industry-wide practice of referring to a product by name
and treating the latest version of the product as its incarnation.
Such a method assumes mandatory upgrades. I don't much care for this,
despite its benefits, but I'm not interested in fighting a tsunami.
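
The version dependence can be sketched in a few lines of Python. The
table FIRST_ASSIGNED and the function name_under are hypothetical
helpers written for this illustration, and the table carries only the
one code point discussed here. The character name is taken from
Python's unicodedata module, which reflects a much later Unicode
version than either 3.0 or 3.1.

```python
import unicodedata

# Unicode version in which a code point was first assigned.
# Hypothetical table for this sketch; only U+03F4 is listed.
FIRST_ASSIGNED = {0x03F4: (3, 1)}


def name_under(cp: int, version: tuple):
    """Return the character name as seen by an interpreter of the given
    Unicode version, or None if the code point was unassigned then."""
    first = FIRST_ASSIGNED.get(cp)
    if first is None:
        raise KeyError(f"U+{cp:04X} is not covered by this sketch")
    if version < first:
        return None  # unassigned in that version
    return unicodedata.name(chr(cp))
```

With this, name_under(0x03F4, (3, 0)) returns None, while
name_under(0x03F4, (3, 1)) returns the name. A receiver that always
runs the latest version never sees the first case, whatever the sender
intended.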

/|/|ike

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT