L2/19-040

 

General category of U+FFFC

Eric Muller, Amazon

January 10, 2019

 

 

The character U+FFFC OBJECT REPLACEMENT CHARACTER is described in Unicode 11.0 as:

The U+FFFC OBJECT REPLACEMENT CHARACTER is used as an insertion point for objects located within a stream of text. All other information about the object is kept outside the character data stream. Internally it is a dummy character that acts as an anchor point for the object’s formatting information. In addition to assuring correct placement of an object in a data stream, the object replacement character allows the use of general stream-based algorithms for any textual aspects of embedded objects.

While this description does not preclude the replaced object from being a character, the object is more likely to be an image or some arbitrary text.

The general category of U+FFFC is So. Thus it is a graphic character (D50) and a base character (D51), and can therefore be part of a combining character sequence (D56).
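As an illustration of the definitions cited above, the following minimal Python sketch derives "graphic character" (D50) and "base character" (D51) from the general category. The helper names is_graphic and is_base are hypothetical, and the result for U+FFFC depends on the version of the Unicode Character Database bundled with the Python interpreter.

```python
import unicodedata

# D50: graphic characters have general category L*, M*, N*, P*, S*, or Zs.
GRAPHIC_MAJOR_CLASSES = {"L", "M", "N", "P", "S"}


def is_graphic(ch):
    """Return True if ch is a graphic character in the sense of D50."""
    gc = unicodedata.category(ch)
    return gc[0] in GRAPHIC_MAJOR_CLASSES or gc == "Zs"


def is_base(ch):
    """Return True if ch is a base character in the sense of D51:
    a graphic character that is not a combining mark (gc M*)."""
    return is_graphic(ch) and not unicodedata.category(ch).startswith("M")


OBJECT_REPLACEMENT = "\uFFFC"

# Report what the bundled character database says about U+FFFC.
print(
    unicodedata.category(OBJECT_REPLACEMENT),
    is_graphic(OBJECT_REPLACEMENT),
    is_base(OBJECT_REPLACEMENT),
)
```

With a character database in which U+FFFC is So, both helpers return True for it; if the category were Cf, both would return False, and the character could no longer serve as the base of a combining character sequence.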

However, applying a combining mark to this character does not make much sense, given the wide range of “things” it can stand for.

The description of U+FFFC, in particular “acts as an anchor point”, makes this character very similar to the interlinear annotation characters (U+FFF9..U+FFFB), whose general category is Cf.

For those reasons, we recommend that the gc property of this character be changed from So to Cf.

We do recognize that this change can be destabilizing. However, applications that are sensitive to the general category, or that deal with grapheme clusters, probably already have to handle this character specially: in essence, they treat U+FFFC as a combining character sequence (or grapheme cluster) on its own, and any following combining marks as a defective combining character sequence (or a separate grapheme cluster). Such applications would not be affected, and applications that ought to handle the character specially but do not would benefit from the change.
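The special handling described above might look like the following Python sketch. It is a deliberate simplification, not an implementation of UAX #29 grapheme cluster segmentation: it treats every non-mark character as starting a new sequence and ignores ZWJ, ZWNJ, and all other segmentation rules. The function name split_sequences is hypothetical.

```python
import unicodedata

OBJECT_REPLACEMENT = "\uFFFC"


def is_mark(ch):
    """Return True if ch is a combining mark (general category M*)."""
    return unicodedata.category(ch).startswith("M")


def split_sequences(text):
    """Split text into simplified combining character sequences.

    U+FFFC always forms a sequence on its own. Combining marks that
    follow it are not attached to it; they start a separate, defective
    sequence (one with no base character).
    """
    sequences = []
    for ch in text:
        if is_mark(ch) and sequences and sequences[-1] != OBJECT_REPLACEMENT:
            # Attach the mark to the current sequence, which is either a
            # base with its marks or a defective run of marks.
            sequences[-1] += ch
        else:
            # Non-marks, a mark at the start of the text, and a mark
            # directly after U+FFFC all start a new sequence.
            sequences.append(ch)
    return sequences


sample = "a\u0301\uFFFC\u0301\u0308b"
print([s.encode("unicode_escape").decode("ascii") for s in split_sequences(sample)])
```

Because the sketch tests for U+FFFC explicitly rather than consulting its general category, it behaves the same whether the character is So or Cf, which is the sense in which such an application would be unaffected by the change.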