Re: Private Use Agreements and Unapproved Characters

From: Doug Ewell (dewell@adelphia.net)
Date: Tue Mar 12 2002 - 23:41:49 EST


Back to Patrick's original question. Warning: this post contains
nothing about Klingon, or even Tengwar.

Patrick Rourke <ptrourke@methymna.com> wrote:

> One effect of the Unicode Consortium's rigorous proposal/review
> policy is that while a particular script or group of characters may
> not be adopted into Unicode for a couple of years after it is
> proposed, font makers usually don't get around to creating the fonts
> for those scripts until after they have been officially approved for
> Unicode.

There's no reason it has to be that way. Proposed glyphs are posted on
the Unicode Web site months in advance of their "go live" date, even
before the beta period, largely for this reason. I'm sure Unicode-aware
type designers like John Hudson don't wait until a version of Unicode is
formally released before they start designing glyphs.

> Would it be a misuse of the PUA to come up with a private agreement
> within a community to assign certain codepoints in the PUA to
> characters that have been proposed to the Unicode Consortium, but not
> yet approved, so that font designers and others in that community
> could get to work on establishing support for these characters, and
> so that content providers can begin the process of incorporating
> these characters into their content?

As some have already said, this is exactly what the PUA is for. But the
size and scope of the "community" may impose limits on the utility of
these PUA assignments. Certainly not all font designers and content
providers for a given non-Unicode script, worldwide, can be expected to
comply -- and if they do, it may cause another set of problems, as we
will see.
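
To make that concrete, such an agreement can be as simple as a
published table pinning character names to PUA slots. Here is a
minimal sketch in Python; every name and code point in it is invented
and is not a real assignment in any registry:

    # A minimal sketch of a community PUA agreement. All names and
    # code points below are hypothetical.
    PUA_AGREEMENT = {
        "EXAMPLE SCRIPT LETTER KA": 0xE800,
        "EXAMPLE SCRIPT LETTER KI": 0xE801,
        "EXAMPLE SCRIPT LETTER KU": 0xE802,
    }

    def agreed_char(name):
        """Return the private-use character a name is pinned to."""
        return chr(PUA_AGREEMENT[name])

    assert agreed_char("EXAMPLE SCRIPT LETTER KA") == "\ue800"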

One important point to remember is that any use or proposed use of the
PUA, such as ConScript, is strictly up to private organizations, not the
Unicode Consortium. To be sure, ConScript is the domain of two guys who
are quite influential in Unicode, but they do not maintain ConScript in
any official capacity as representatives of Unicode.

> Would it be useful/practical for such an agreement to stipulate a
> versioning system whereby the font creators &c. and content providers
> in that community who wish to use the PUA mapping in question would
> have to release new versions of their products with the characters
> remapped to the approved codepoints upon the acceptance of the
> characters in Unicode (and with the PUA codepoints being obsolesced,
> and eventually removed, in subsequent versions of the agreement
> assignments, until all characters were assigned by the Unicode
> Consortium)?

I would think you could simply use the version number of the Unicode
Standard. For example, the use of Tagalog would have been conformant to
this proposed PUA registry until Unicode version 3.2, at which time it
would have to be removed from the registry because of its introduction
into Unicode.
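
In code, a registry entry could carry that retirement version
explicitly. A minimal sketch in Python, where the PUA range is
invented but the Unicode 3.2 detail for Tagalog (U+1700..U+171F) is
real:

    # Sketch of a versioned registry entry. The PUA range shown is
    # hypothetical; Tagalog really was added at U+1700..U+171F in
    # Unicode 3.2.
    REGISTRY = {
        "Tagalog": {
            "pua_range": (0xE900, 0xE91F),  # invented for illustration
            "retired_in": (3, 2),           # removed once 3.2 ships
        },
    }

    def is_current(script, unicode_version):
        """A PUA entry is usable only before its retirement version."""
        return unicode_version < REGISTRY[script]["retired_in"]

    assert is_current("Tagalog", (3, 1))
    assert not is_current("Tagalog", (3, 2))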

> This would I think considerably shorten the amount of time it would
> take for characters to become usable to a community after they had
> been accepted into Unicode, and would also provide a mechanism for
> the gradual introduction of "new" characters, while the versioning
> system would (I'd hope) prevent PUA code points from being used long
> after perfectly good permanent code points have been assigned.

Conformance to this registry, especially over a period of time, is up to
the user community. The presence of a standard is no guarantee that it
will be followed, or even noticed.

Here's an example of a potential pitfall of widespread PUA
quasi-standardization. John Jenkins has probably done more than anyone
to get the Deseret Alphabet encoded in Unicode (although it is never
wise to overlook Michael Everson's influence). John has a series of Web
pages describing Unicode and the DA. To this day, the main page at
<http://homepage.mac.com/jenkins/Deseret/Unicode.html> still includes
the following quote, in large bold italics:

    "It is strongly recommended that any implementations of the Deseret
Alphabet conform to the ConScript encoding, if possible."

Now, I don't bring this up to point out that John isn't keeping his Web
pages up to date, but to show that this is and will continue to be a
widespread problem, on the Web and elsewhere, even among the most
diligent supporters of a script and of Unicode.

Suppose Old Persian Cuneiform is encoded in Patrick's PUA registry next
week, and that encoding achieves some popularity. Then suppose at some
later date it is encoded in Unicode, say version 4.1. This will
necessarily cause the encoding in Patrick's registry to be withdrawn, or
at least deprecated. How many people will switch immediately to the
sanctioned Unicode encoding? How quickly will existing software and
data be converted? Probably not right away, and the chances of a
timely conversion shrink as the private-use encoding becomes more
successful, whether or not there are conversion tools available to
help people migrate.

I provided a "Format A" conversion table to map Deseret characters from
the old ConScript encoding to the code positions introduced in Unicode
3.1, and another to map Shavian to its proposed Unicode code points.
You can see them at the ConScript site,
<http://www.evertype.com/standards/csur/index.html>. Whether anyone has
ever used these tables, or will ever notice them, is another matter
entirely.
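
For what it's worth, the mechanical part of such a conversion is
trivial. Here is a sketch in Python of the kind of remapping those
tables describe, assuming the private block mirrored the layout of the
final Unicode block; the old base address below is a placeholder, not
the actual ConScript assignment:

    # Remap a contiguous private-use block to its permanent location.
    # OLD_BASE is a placeholder; U+10400 really is the start of the
    # Deseret block introduced in Unicode 3.1 (80 code points).
    OLD_BASE = 0xE800    # hypothetical PUA base for Deseret
    NEW_BASE = 0x10400   # Deseret block, Unicode 3.1
    BLOCK_SIZE = 0x50    # 80 code points

    PUA_TO_UNICODE = {OLD_BASE + i: NEW_BASE + i
                      for i in range(BLOCK_SIZE)}

    def convert(text):
        """Move old private-use Deseret characters to their permanent
        code points; all other characters pass through unchanged."""
        return text.translate(PUA_TO_UNICODE)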

> The main issue I can think of is the matter of rejected characters:
> what does one do if a character is rejected by the Unicode Consortium
> for valid reasons? Delete it from the agreement, and have to remove a
> distinction from the character data of the content providers? Leave
> it there, and so perpetuate some final version of the agreement for
> all time, as a kind of extension to Unicode?

This is exactly the reason for the "rigorous proposal/review policy"
mentioned earlier, and perhaps the biggest drawback to the concept of a
widespread PUA encoding for future Unicode scripts. It usually does
take a while to get characters encoded in Unicode, not just because
committees are big and slow and bureaucratic, but because there are real
decisions to be made that can take a lot of time and research. Rushing
these characters into use before Unicode and WG2 have finished making
these decisions could subvert the process and create the dilemmas
Patrick mentioned.

Sorry to be so negative.

-Doug Ewell
 Fullerton, California


