Richard Wordingham
Tue Dec 15 18:01:05 CST 2015


srivas sinnathurai wrote:

> Does the standard support the use of diacritics in plain text format,
> when used with all and any complex scripts?

Relatively few scalar value sequences are prohibited - possibly just
sequences containing unassigned characters that are not
non-characters; I can't think of any others.  (The
prohibition on unpaired surrogates applies to coded character
sequences, but surrogate characters aren't scalar values.) 
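
As a rough illustration of the categories distinguished above, here is a
minimal Python sketch.  The function names are my own, not anything from
the standard, and "unassigned" is relative to whichever version of the
Unicode Character Database the local Python's unicodedata module bundles,
so results for recently assigned code points can differ between
installations.

```python
import unicodedata

def is_scalar_value(cp: int) -> bool:
    """A scalar value is any code point except the surrogates D800..DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

def is_noncharacter(cp: int) -> bool:
    """The 66 noncharacters: FDD0..FDEF plus the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_unassigned_non_noncharacter(cp: int) -> bool:
    """Unassigned (General_Category Cn) scalar values that are not noncharacters.

    Assumption: unicodedata reports Cn for both reserved code points and
    noncharacters, so the noncharacters are excluded explicitly.
    """
    return (
        is_scalar_value(cp)
        and not is_noncharacter(cp)
        and unicodedata.category(chr(cp)) == "Cn"
    )
```

For instance, is_scalar_value(0xD800) is false, while U+FFFE is a scalar
value but also a noncharacter, so it falls outside the possibly prohibited
class described above.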

It would appear by Conformance Requirement C5, 'A process shall not
assume that it is required to interpret any particular coded character
sequence', that a process is at liberty to decline to interpret a
sequence of scalar values, even if it has just interpreted it.

I am not aware of any requirements in the standard to interpret
specific character sequences.

In general, the interpretation of character sequences is undefined.
For example, a request for advice on the interpretation of
the combination of U+0331 COMBINING MACRON BELOW and U+0E39 THAI
CHARACTER SARA UU was answered with the instruction to consult the
non-existent typographical tradition.  It's been left to rendering
engine writers to define the interpretation.

Indeed, I am not sure that every sequence of defined scalar values
has an interpretation.  Most pairs of regional indicators don't have an
interpretation, and the interpretation of each variation sequence may
change at least twice, once when the base character becomes defined
(or is defined not to be a possible base character), and again when
the variation sequence is assigned an interpretation as an ill-defined
(or grossly ill-defined) family of glyphs.
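
To make the regional indicator case concrete, here is a small Python
sketch that maps a pair of regional indicator symbols back to two Latin
letters.  Whether the resulting pair means anything is decided outside the
character encoding itself, so the lookup set below is a hypothetical
three-entry sample of my own, standing in for whatever list of region
codes a real implementation would consult; the function names are
likewise only illustrative.

```python
RI_FIRST = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RI_LAST = 0x1F1FF   # REGIONAL INDICATOR SYMBOL LETTER Z

def regional_pair_to_letters(pair: str):
    """Return the two letters A-Z for a regional indicator pair, else None."""
    if len(pair) != 2:
        return None
    cps = [ord(ch) for ch in pair]
    if not all(RI_FIRST <= cp <= RI_LAST for cp in cps):
        return None
    return "".join(chr(ord("A") + cp - RI_FIRST) for cp in cps)

# Hypothetical sample only; a real process would use an external region list.
SAMPLE_KNOWN_REGIONS = {"FR", "TH", "GB"}

def has_interpretation(pair: str, known=SAMPLE_KNOWN_REGIONS) -> bool:
    """True if the pair decodes to two letters that appear in the given list."""
    letters = regional_pair_to_letters(pair)
    return letters is not None and letters in known
```

All 676 pairs decode to two letters in this way, but only those found in
the external list receive an interpretation; the rest are well-formed
sequences with nothing defined for them.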

SOLIDUS OVERLAY have a defined interpretation when their base character
is to be represented by a mirrored glyph.  Note that in general, the
Unicode standard does not define when a character is to be represented
by a mirrored glyph.  This may be defined by a lower level protocol
(the font file).


