From: Doug Ewell (doug@ewellic.org)
Date: Sat Jan 03 2009 - 11:37:02 CST
James Kass <thunder dash bird at earthlink dot net> wrote:
> Private Use Area just means user-defined area. There's nothing secret
> or damaging about user-defined characters, whether they be suitable
> potential candidates for standard plain-text, or whether they are
> destined to remain banished in the phantom zone for all eternity.
> There will always be people wishing or needing to exchange
> user-defined material, and there's nothing wrong with that. They are
> using the PUA correctly.
There seems to be a school of thought that private-use characters are
inherently evil and should never be used, except perhaps within one's
own personal system. The thinking seems to be that people will want to
search for these things and interoperability will be broken, and also
that "private agreement" implies a certain degree of secrecy and
extremely limited use.
Around 1993, it struck me as both obvious and brilliant that Unicode
would provide a private-use area as part of its overall strategy: encode
the most commonly used characters, but not just any old thing
imaginable, so that users who wanted to use the Unicode architecture to
represent any old thing imaginable could encode that thing as a
private-use character. I was not familiar with the East Asian encodings
at the time and did not know that they also supported this useful
mechanism.
Over time, the principle of "most commonly used characters" in Unicode
expanded to include ancient scripts, musical symbols, and mathematical
font variants, as well as just about every Han character that someone
could dredge up instead of just the ones in existing standards. But the
PUA principle remained: you could still encode the Apple logo or Klingon
or Ewellic in the PUA, and reap the benefits of the Unicode architecture
without contaminating the Standard's repertoire.
At some point, perhaps with the rise of the Internet and powerful search
engines, the idea began to spread that using PUA characters was always
bad, because of the potential for conflict between different private
agreements -- as if that possibility had not occurred to anyone before.
I search for a document containing U+E690 and MegaFinder locates one for
me, but my interpretation of U+E690 might differ from the one used by
the author of the document. The private agreement is not transmitted
along with the document. Supposedly this will cause great
interoperability problems if I am not intelligent enough to understand
that this is the nature of private-use codes.
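
To make the U+E690 case concrete, here is a minimal Python sketch of how software can at least recognize that a code point is private-use, even though it cannot know which private agreement applies. The function names are my own for illustration; the only library call is the standard unicodedata.category(), which reports the general category "Co" for private-use code points.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    # General category "Co" marks private-use code points: the BMP range
    # U+E000..U+F8FF plus the private-use ranges in planes 15 and 16.
    return unicodedata.category(ch) == "Co"


def private_use_chars(text: str) -> list[str]:
    # List the private-use code points found in a string, in U+XXXX form.
    # This reveals *that* a private agreement is needed, not *which* one.
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]


if __name__ == "__main__":
    sample = "logo: \ue690"
    print(private_use_chars(sample))
```

A search tool could use a check like this to flag matches whose interpretation depends on an agreement that did not travel with the document.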
This school of thought has also carried over to the BCP 47 language
tagging arena, where people can create tags like "x-piglatin", whose
meaning should be obvious even without a written and signed "agreement,"
and can also create "qaa" or "x-abc123", whose meaning would be far from
obvious, and whose creator would have to be very naïve not to understand
this. Despite a serious lack of evidence that private-use tags are
causing a mainstream interoperability crisis, successive versions of BCP
47 have added more and more warnings against using them.
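
For the language-tag case, a rough Python sketch of spotting private-use elements in a tag might look like the following. It is not a BCP 47 validator and the function name is hypothetical. It assumes a well-formed tag and covers only two cases: the "x" singleton, which introduces a private-use sequence, and primary language subtags in the range "qaa" through "qtz", which are reserved for private use. Private-use script and region codes are deliberately left out.

```python
def has_private_use(tag: str) -> bool:
    # Subtags are case-insensitive, so compare in lowercase.
    subtags = tag.lower().split("-")
    primary = subtags[0]

    # In a well-formed tag, a one-letter "x" subtag starts a private-use
    # sequence, either as the whole tag ("x-piglatin") or as a suffix
    # ("en-x-foo").
    if "x" in subtags:
        return True

    # Primary language subtags "qaa".."qtz" are reserved for private use.
    return len(primary) == 3 and primary.isalpha() and "qaa" <= primary <= "qtz"


if __name__ == "__main__":
    for tag in ("x-piglatin", "qaa", "x-abc123", "en-US"):
        print(tag, has_private_use(tag))
```

Note that the check says nothing about what "x-abc123" means; as with PUA characters, that knowledge lives in the private agreement, not in the tag.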
If you create an encoding standard of any sort, and include a
private-use mechanism as a defense against having to encode every
conceivable blob, and then turn around and discourage use of the
private-use mechanism, the natural conclusion is that you will feel
compelled to encode every conceivable blob.
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages