> From: Eric Muller [mailto:emuller@adobe.com]
> 
> "Ayers, Mike" wrote:
> 
> >  Currently, when sending email or interpreting HTML, the content is
> > tagged for its encoding.  Wouldn't PUA users simply use their own tag
> > (say, PUA-mike-1) instead of UTF-8?  Am I missing something?
> 
> What we are talking about is the character collection, not the
> encoding of that collection. You really need two indicators: one which
> says what the semantics of the character U+E000 are (as well as the
> other characters), the other which says what byte sequence is used to
> encode this character (and the others).
        ???  Seems to me that the type tag (say, "UTF-8") specifies both the
character collection and its encoding (in UTR#17 terms, it specifies the
ACR, CCS, CEF, and CES, but not the TES).  Therefore, when I create (and
register with my system) a tag of "PUA-mike-1", I must have an interpreter
ready which can read Unicode over UTF-8 with the PUA interpreted as per my
specification.  Likewise, the existing applications interpret the tag
"UTF-8" to mean Unicode over UTF-8.
        Again I ask: why isn't content tagging considered appropriate for
PUA use?
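
To make the tagging idea concrete, here is a minimal sketch in Python. It assumes a tag registry in which each tag selects both the byte-level codec and the meaning of private-use code points. The registry `TAGS`, the table `PUA_MIKE_1`, the function names, and the meaning given to U+E000 are all hypothetical, invented for illustration only.

```python
import unicodedata

# Hypothetical private agreement for the tag "PUA-mike-1". The meaning
# assigned to U+E000 here is invented purely for illustration.
PUA_MIKE_1 = {0xE000: "MIKE'S EXAMPLE GLYPH"}

# Hypothetical tag registry: each tag names a byte-level codec and the
# semantics to apply to private-use code points.
TAGS = {
    "UTF-8": ("utf-8", {}),
    "PUA-mike-1": ("utf-8", PUA_MIKE_1),
}


def is_private_use(cp):
    """True for the BMP Private Use Area and the two private-use planes."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def interpret(data, tag):
    """Decode bytes according to the tag; return (character, meaning) pairs."""
    codec, pua_meanings = TAGS[tag]
    pairs = []
    for ch in data.decode(codec):
        cp = ord(ch)
        if is_private_use(cp):
            # Only the private agreement behind the tag can supply a meaning.
            meaning = pua_meanings.get(cp, "<private use, no agreed meaning>")
        else:
            meaning = unicodedata.name(ch, "<unnamed>")
        pairs.append((ch, meaning))
    return pairs


message = "A\ue000".encode("utf-8")
print(interpret(message, "UTF-8"))
print(interpret(message, "PUA-mike-1"))
```

The same bytes decode to the same code points under both tags; only the meaning attached to U+E000 differs, which is the part a plain "UTF-8" tag cannot carry.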
> In fact, even without PUA characters, the problem is already there. If
> my document is Unicode 3.0, U+03F4 is an error; if it's Unicode 3.1,
> U+03F4 is GREEK CAPITAL THETA SYMBOL.
        If your document was written according to Unicode 3.0 and contains
the unassigned code point U+03F4, and it is interpreted by a Unicode 3.1
interpreter, then you've sent GREEK CAPITAL THETA SYMBOL, whether or not you
wanted to.  This is really a separate issue from what I was discussing,
however: it is the industry-wide practice of referring to a product by name
and taking the latest version of the product to be what is meant.
Such a method assumes mandatory upgrades.  I don't much care for this,
despite its benefits, but I'm not interested in fighting a tsunami.
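
A small sketch of the version point, in Python. The one-entry table is a hypothetical stand-in for a real character database; its only facts are the ones from this thread (U+03F4 is unassigned in Unicode 3.0 and is GREEK CAPITAL THETA SYMBOL as of 3.1). The names `FIRST_ASSIGNED` and `name_under_version` are made up for illustration.

```python
# Hypothetical miniature character database: code point -> (version in
# which it was first assigned, character name). Only the U+03F4 entry
# from this thread is included.
FIRST_ASSIGNED = {
    0x03F4: ((3, 1), "GREEK CAPITAL THETA SYMBOL"),
}


def name_under_version(cp, interpreter_version):
    """Return the name an interpreter of the given Unicode version would
    see for cp, or None if the code point is unassigned in that version."""
    entry = FIRST_ASSIGNED.get(cp)
    if entry is None:
        return None
    since, name = entry
    # The interpreter's version decides, not the version the author had
    # in mind when writing the document.
    return name if interpreter_version >= since else None


print(name_under_version(0x03F4, (3, 0)))
print(name_under_version(0x03F4, (3, 1)))
```

Nothing in the byte stream records which version the author meant, so the result depends entirely on the second argument, that is, on whoever is reading.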
/|/|ike
This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT