Here is a system that I think would work.
Please consider introducing, for the private use area, the concept of the
hexadecimal point. A hexadecimal point is like a decimal point, the
difference being that a decimal point is for base 10 numbers and a
hexadecimal point is for base 16 numbers.
An agreed system could regard every character defined within it as having a
code point value in two parts: a part to the left of the hexadecimal point
that is a code point from the private use area, and a part to the right of
the hexadecimal point that is assigned when the characters are registered
as being included in the agreed system. So, for example, if a set of
characters for some particular script P is registered, it might be
registered as having a part to the right of the hexadecimal point of, say,
1005, so that if the characters were placed at U+E000 through to U+E0FF
then they would be regarded within the agreed system as being at
A+E000.1005 through to A+E0FF.1005, at integer spaced intervals.
Four hexadecimal places would seem to be a good balance between having scope
and avoiding complexity, with .0000 never being allocated to characters and
having a meaning of "undefined".
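
A minimal sketch, in Python, of how designations such as A+E023.1005 might
be read and written. The function names, and the word "tag" for the part to
the right of the hexadecimal point, are illustrative assumptions and not
part of the proposal; the sketch also assumes exactly four upper-case
hexadecimal digits on each side of the point.

```python
import re

PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # private use area of the BMP
UNDEFINED_TAG = 0x0000                # ".0000": never allocated, "undefined"

# Assumed notation: "A+", four hex digits, ".", four hex digits.
_DESIGNATION = re.compile(r"A\+([0-9A-F]{4})\.([0-9A-F]{4})")


def parse_designation(text):
    """Split e.g. 'A+E023.1005' into (code_point, tag) as integers."""
    match = _DESIGNATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a designation: {text!r}")
    code_point = int(match.group(1), 16)
    tag = int(match.group(2), 16)
    if not PUA_FIRST <= code_point <= PUA_LAST:
        raise ValueError(f"U+{code_point:04X} is not a private use code point")
    return code_point, tag


def format_designation(code_point, tag):
    """Inverse of parse_designation."""
    if not PUA_FIRST <= code_point <= PUA_LAST:
        raise ValueError(f"U+{code_point:04X} is not a private use code point")
    if not 0 <= tag <= 0xFFFF:
        raise ValueError("the tag must fit in four hexadecimal places")
    return f"A+{code_point:04X}.{tag:04X}"


def block_designations(first, last, tag):
    """Designations for a registered block, at integer spaced intervals."""
    if tag == UNDEFINED_TAG:
        raise ValueError(".0000 means 'undefined' and cannot be registered")
    return [format_designation(cp, tag) for cp in range(first, last + 1)]


# Script P from the example: U+E000 through to U+E0FF, registered as .1005
script_p = block_designations(0xE000, 0xE0FF, 0x1005)
print(len(script_p), script_p[0], script_p[-1])
# prints: 256 A+E000.1005 A+E0FF.1005
```

With .0000 set aside, four hexadecimal places leave 16^4 - 1 = 65535 values
available for registration.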
One possibility for the agreed system would be to use U+E000 through to
U+EFFF for defining blocks of up to 4096 characters and to use two
characters from the range U+F000 through to U+F8FF to mean the start and the
end of defining the part to the right of the hexadecimal point. I recall
that some of the characters in the range U+F000 through to U+F8FF are often
used for a particular type of user defined fount, such as dingbat type
things. Could someone who knows about that matter please say which
characters are so used, so that any suggestions for these start and end of
defining codes do not clash with that usage? Indeed that usage could be
included in the agreed system, and codes starting with a D to the right of
the hexadecimal point could be allocated to such founts for permanent use.
For example, some particular such fount Q might be designated to have, say,
D157 to the right of the hexadecimal point.
Suppose though, provisionally and pending resolution of that matter, that
within the agreed system U+F000 were to mean the start of defining the part
to the right of the hexadecimal point and U+F001 were to mean the end of
defining it. A plain text file could then indicate, for uses involving the
agreed system, the use of the particular script P mentioned above with the
following sequence of characters.
U+F000 U+0031 U+0030 U+0030 U+0035 U+F001
All characters in the private use area would be presumed to have that part
to the right of the hexadecimal point until another sequence starting with
U+F000 were received.
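
The rule above can be sketched as a small decoder. This is only an
illustration under stated assumptions: U+F000 and U+F001 are the
provisional start and end codes suggested above, the function name decode
is hypothetical, the digits between the two codes are assumed to be exactly
four ASCII hexadecimal digits, and private use characters met before any
sequence has been received are assumed to be "undefined" (.0000).

```python
import string

TAG_START = "\uF000"    # provisional: start of defining the part after the point
TAG_END = "\uF001"      # provisional: end of defining the part after the point
UNDEFINED_TAG = 0x0000  # assumed state before any sequence has been received


def decode(text):
    """Return a list of (character, tag) pairs.

    The tag is an integer for private use characters and None for all
    other characters. The defining sequences themselves are consumed.
    """
    current_tag = UNDEFINED_TAG
    result = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == TAG_START:
            end = text.find(TAG_END, i + 1)
            if end == -1:
                raise ValueError("U+F000 without a closing U+F001")
            digits = text[i + 1:end]
            if len(digits) != 4 or not all(d in string.hexdigits for d in digits):
                raise ValueError(f"expected four hex digits, got {digits!r}")
            current_tag = int(digits, 16)
            i = end + 1
        elif ch == TAG_END:
            raise ValueError("U+F001 without an opening U+F000")
        else:
            in_private_use_area = "\uE000" <= ch <= "\uF8FF"
            result.append((ch, current_tag if in_private_use_area else None))
            i += 1
    return result


# U+F000 U+0031 U+0030 U+0030 U+0035 U+F001, then one character of script P
sample = TAG_START + "1005" + TAG_END + "\uE023"
for ch, tag in decode(sample):
    print(f"A+{ord(ch):04X}.{tag:04X}")
# prints: A+E023.1005
```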
The use of this hexadecimal point technique would allow characters from
several different character sets to be used in the same plain text file.
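
As an illustration of mixing character sets in one file, here is a hedged
sketch of the writing side: it emits a defining sequence only when the part
to the right of the hexadecimal point changes. The name encode and the
(character, tag) input format are assumptions made for this example, and
U+F000 and U+F001 are the provisional codes suggested above.

```python
TAG_START = "\uF000"  # provisional start code
TAG_END = "\uF001"    # provisional end code


def encode(items):
    """Build plain text from (character, tag) pairs.

    tag is an integer from 0x0001 to 0xFFFF for private use characters
    and None for ordinary characters, which pass through unchanged.
    """
    out = []
    current_tag = None  # nothing has been announced yet
    for ch, tag in items:
        if tag is not None:
            if not 0x0001 <= tag <= 0xFFFF:
                raise ValueError("tag must be between .0001 and .FFFF")
            if tag != current_tag:
                # Announce the new set only when it differs from the last one.
                out.append(f"{TAG_START}{tag:04X}{TAG_END}")
                current_tag = tag
        out.append(ch)
    return "".join(out)


# Two characters of script P (.1005), a space, then one character of
# fount Q (.D157) that happens to share the code point U+E000.
text = encode([
    ("\uE000", 0x1005),
    ("\uE001", 0x1005),
    (" ", None),
    ("\uE000", 0xD157),
])
print(" ".join(f"U+{ord(c):04X}" for c in text))
```

The same private use code point, U+E000, appears twice in the result with
two different meanings, and the defining sequence before each occurrence
says which is intended.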
The agreed system could also include codes for characters that are never
intended to become standard Unicode characters yet for which a universal
designation would be helpful. These character sets could use designations
starting with some character such as C to the right of the hexadecimal
point.
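
The leading digit conventions mentioned so far (D for existing user defined
founts, C for characters never intended for Unicode) could be checked as in
the following sketch. The function name and the returned labels are
illustrative assumptions; the proposal does not say what the other leading
digits would mean, so they are lumped together here.

```python
def tag_category(tag):
    """Classify the four-place part to the right of the hexadecimal point."""
    if not 0 <= tag <= 0xFFFF:
        raise ValueError("the tag must fit in four hexadecimal places")
    if tag == 0x0000:
        return "undefined"
    leading_digit = tag >> 12  # the first of the four hexadecimal places
    if leading_digit == 0xC:
        return "permanent, never intended for Unicode"
    if leading_digit == 0xD:
        return "existing user defined fount"
    return "other registered set"


for tag in (0x0000, 0x1005, 0xC001, 0xD157):
    print(f".{tag:04X}", tag_category(tag))
```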
Although phrased as a part to the right of the hexadecimal point, these
agreed system codes are really designations of trays of character designs;
however, the use of the hexadecimal point is convenient for expressing an
individual character in the form, for example, of A+E023.1005 so that its
meaning is uniquely defined within the system.
Also, because the part to the right of the hexadecimal point could be given
more places should four ever prove too few, the system has effectively
unlimited scope.
The matter of keeping the system up to date could be partly resolved, for
scripts that are under consideration for inclusion in Unicode, by having
the registration cease to be valid a year and a day after the publication
of some particular version of the Unicode specification. If necessary the
time out could be extended if no decision about whether to include the
characters in Unicode has been reached by the time that version is
published or, if the decision is not to include the characters in Unicode,
the validity could be made permanent.
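
The time out could be computed as in the following sketch. The function
names and the example publication date are hypothetical, and the treatment
of a publication date of 29 February (counted as 1 March of the following
year before the extra day is added) is an assumption, since the proposal
does not address it.

```python
from datetime import date, timedelta


def validity_expiry(publication):
    """Return the date a year and a day after the publication date."""
    try:
        one_year_on = publication.replace(year=publication.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year;
        # assume 1 March is meant (an assumption, not part of the proposal).
        one_year_on = date(publication.year + 1, 3, 1)
    return one_year_on + timedelta(days=1)


def is_valid(today, publication, permanent=False):
    """True while a registration may still be used.

    permanent=True models a decision not to include the characters in
    Unicode, after which the registration never times out.
    """
    return permanent or today <= validity_expiry(publication)


publication = date(2002, 3, 13)  # hypothetical publication date
print(validity_expiry(publication))
# prints: 2003-03-14
```

An extension of the time out would amount to calling validity_expiry with
the publication date of a later version.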
I feel that such an agreed system would be very helpful and potentially of
great use.
The next matter is what is meant by "agreed" in the phrase "agreed
system".
I feel that if the matter is discussed here in this discussion forum then
whatever consensus the discussion hopefully reaches could be taken as the
agreed system. Please note that although the phrase "private agreement" is
used in the specification in the section about the private use area, later
in that section the word "published" is used, so one does not, in fact,
need any agreement at all: it is quite permissible simply to publish one's
own suggested system. Naturally, the more agreement one can achieve amongst
those people who express an interest the better. Yet I feel that the best
way forward is to discuss a system, take on board such comments as can be
accommodated, publish the system and start to use it. Anyone who so wishes
may then participate in the use of that published system.
William Overington
13 March 2002