L2/00-034

Title: In-process Private Use Character

Source: Microsoft (expert contribution)

Date: February 2nd 2000

For a long time there has been a need, within the Unicode code space, for transient code positions that serve as anchors for higher-level protocols. Some of them have been officially encoded: the Object Replacement Character (U+FFFC) and the Interlinear Annotation Characters (U+FFF9, U+FFFA and U+FFFB). These characters are typically used only temporarily during internal processing and should not be exported in transmitted data. Although the submitters made their intent clear, the adoption of these characters generated considerable discussion. The concern was that their semantics could leak into interchanged documents and cause confusion in other formats by providing an alternative notation for those formats' own formatting information. A good example is the perceived overlap between the Interlinear Annotation Characters, which may be used for internal processing of Ruby annotations, and the RUBY element proposed for XHTML.

At the same time, some implementers need additional transient characters. Given the controversy already raised by the previously approved characters, it is probably wise to treat the new ones as private use characters, that is, without implied semantics. There are essentially two solutions:

·         Reserve a specific area of the current Private Use Area (PUA) for these in-process character codes. However, the PUA is already used for private agreements among end users, who may use it in its entirety. Restricting part of that space would therefore create a backward-compatibility issue: although the in-process codes belong to a different usage layer, they would collide with existing usage.

·         Create a new, small private use area reserved for these new in-process codes.

The author proposes to reserve the range U+FFF0-FFF8, together with an additional 32-character range, U+FE00-FE1F.

As with the characters U+FFF9-FFFC, these additional characters may be ignored or filtered out by higher-level protocols, since their use is always transient and restricted to in-process mode.
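
As an illustration only, the filtering step described above might look like the following minimal sketch. It assumes the ranges proposed in this document (U+FFF0-FFF8 and U+FE00-FE1F) alongside the already encoded U+FFF9-FFFC. These are not character properties defined by the Unicode Standard, and the names IN_PROCESS_RANGES, is_in_process and strip_in_process are hypothetical.

```python
# Hypothetical table: ranges as proposed in this document, plus the
# already encoded U+FFF9-FFFC. Not a property defined by the Unicode Standard.
IN_PROCESS_RANGES = (
    (0xFFF0, 0xFFF8),  # proposed in-process range
    (0xFFF9, 0xFFFC),  # interlinear annotation characters + object replacement character
    (0xFE00, 0xFE1F),  # proposed additional 32-character range
)


def is_in_process(ch: str) -> bool:
    """Return True if the single character ch falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IN_PROCESS_RANGES)


def strip_in_process(text: str) -> str:
    """Drop transient in-process code points before text is exported."""
    return "".join(ch for ch in text if not is_in_process(ch))


if __name__ == "__main__":
    print(strip_in_process("ab\uFFF0c\uFE05"))  # prints "abc"
```

This sketch removes only the code points themselves. A protocol that discards interlinear annotations might instead also drop the annotation text between U+FFFA and U+FFFB.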

Michel Suignard

Microsoft
