Tags and the Private Use Area (derives from On the possibility of guidance code points for the Private Use Area)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Wed Apr 25 2001 - 06:45:12 EDT

Tim Partridge and Marco Cimarosti both suggested the use of a new plane 14
tag character.

Marco wrote:

The wild idea is to add a tag prefix for specifying "PUA semantics" in plain


This prefix would be followed by a sequence of tag characters
(U-0E0020..U-0E007F) that specifies the meaning of the PUA characters used
from that point onwards.

end quote

Other contributors have pointed out various problems with this approach.

I therefore put forward the following suggestion.

Let there exist the idea that there is U+uvwx02 (PUA INTERPRETATION TAG)
and a set of private use area tag characters (U+uvwx20 .. U+uvwx7F), all of
which are code points in the upper private use area.
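
To make the layout concrete, here is a minimal sketch in Python. It assumes,
purely for illustration, that uvwx is 0F00, so that the block starts at
U+0F0000 in plane 15; no such value has been agreed. The names PUA_TAG_BASE,
to_pua_tags, from_pua_tags and tag_prefix are hypothetical and exist only
for this example.

```python
# Hypothetical choice of uvwx = 0F00 (block starts at U+0F0000, plane 15).
# This value is an assumption for illustration, not an agreed location.
PUA_TAG_BASE = 0x0F0000
PUA_INTERPRETATION_TAG = PUA_TAG_BASE + 0x02  # U+uvwx02


def to_pua_tags(text):
    """Map characters in the range 0x20..0x7F to U+uvwx20..U+uvwx7F."""
    out = []
    for ch in text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7F:
            raise ValueError("no private use area tag for %r" % ch)
        out.append(chr(PUA_TAG_BASE + cp))
    return "".join(out)


def from_pua_tags(tagged):
    """Inverse of to_pua_tags: recover the underlying ASCII text."""
    out = []
    for ch in tagged:
        offset = ord(ch) - PUA_TAG_BASE
        if not 0x20 <= offset <= 0x7F:
            raise ValueError("not a private use area tag: U+%06X" % ord(ch))
        out.append(chr(offset))
    return "".join(out)


def tag_prefix(label):
    """PUA INTERPRETATION TAG followed by the label spelled in tag characters."""
    return chr(PUA_INTERPRETATION_TAG) + to_pua_tags(label)
```

For example, tag_prefix("A") yields the two code points U+0F0002 and
U+0F0041 under the assumed base. What a label following the tag would
actually mean is a matter for whatever protocol users agree upon.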

The specific place where those private use area tag characters are located
is open for discussion, though I am hopeful that a specific consensus could
emerge fairly easily so that the idea could progress without ambiguity. May
I suggest that mention is made that, where displayed for analysis purposes,
these private use area tags should be displayed as yellow on a red
background. Ordinary Unicode tags displayed for analysis are not specified
to be displayed in any specific colour but some people might like to display
them as white on blue so as not to conflict visually with private use area

Naturally any such definition within the private use area would not be an
absolute definition and the Unicode Consortium is not being asked to endorse
it nor would they, by their own statement. All that could be reasonably
sought is that the practice and such protocols that are expressed using such
private use area tags are so well thought out and designed by interested
users that most users will wish to use them for most applications. It
cannot be expected that most users will agree to such a system, yet one can
always hope. We are at a very early stage with the upper private use area
and hopefully a quality system worked out now will be widely valued.

Specific protocols to use with such tagging can then be devised.

The choice of uvwx as the place to locate these user agreed private use area
tags would need to be agreed by those users who are interested in such a

A factor worth considering is that very often these private use area tags
will be represented using a pair of surrogate codes. The private-use high
surrogate code values are from U+DB80 to U+DBFF. Thus one of these codes
would always be the high order surrogate code value for private use area
tags when private use area tags were represented using a surrogate pair. It
might be helpful to consider what the effect would be if such a file of
16-bit Unicode were displayed using an ordinary ASCII file analyser program:
the message of the tags would show, and before each character of the message
would be some other characters.
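
The effect can be sketched in Python as follows. The surrogate arithmetic is
the standard UTF-16 calculation; the base U+0F0000 is again only an assumed
value for uvwx00, and ascii_dump is a hypothetical stand-in for an ordinary
ASCII file analyser that shows a dot for every byte outside 0x20..0x7E.

```python
def utf16_surrogates(cp):
    """Return the (high, low) UTF-16 surrogate code units for a code point
    in the range U+10000..U+10FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def ascii_dump(data):
    """Show printable ASCII bytes as themselves and all other bytes as dots."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


# Assumed location of the private use area tags (uvwx = 0F00).
PUA_TAG_BASE = 0x0F0000

# The message "en-GB" spelled in private use area tag characters,
# then stored as big-endian 16-bit Unicode.
tags = "".join(chr(PUA_TAG_BASE + ord(c)) for c in "en-GB")
data = tags.encode("utf-16-be")

print(utf16_surrogates(PUA_TAG_BASE + ord("e")))  # (56192, 56421), i.e. DB80 DC65
print(ascii_dump(data))                           # ...e...n...-...G...B
```

Under this assumption each tag occupies four bytes, of which the last is the
ASCII value of the message character and the first three lie outside the
printable ASCII range. That holds because the low byte of the low surrogate
equals the low byte of the code point, so any base of the form uvwx00 would
give the same visible message; only the three preceding bytes would differ.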

Another factor worth considering is that uvwx could be chosen to be as
similar to 0E00 as reasonably possible within the private use area, to give
an aesthetic symmetry.

Various other factors could also be considered, such as trying to place this
suggested user agreed usage so as not to impede large symmetrical usage of
the upper private use area for other purposes, for people who would like to
use the upper private use area in a symmetrical manner while
simultaneously agreeing to use the private use area tag idea.

William Overington

25 April 2001

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT