RE: StandardizedVariants.txt error?

From: Whistler, Ken <ken.whistler_at_sap.com>
Date: Mon, 26 Nov 2012 21:53:42 +0000

Actually, I think the omission here is the word "canonical". In other words, Section 16.4 should probably read:

"The base character in a variation sequence is never a combining character or a *canonical* decomposable character."

Note that with this addition, StandardizedVariants.txt poses no contradiction, because all of the decomposable character instances noted are compatibility decomposable characters.
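
To illustrate the distinction, here is a minimal sketch in Python. It assumes the standard `unicodedata` module, whose data reflects whichever Unicode version the interpreter ships with (not necessarily 6.2). The helper name `decomposition_kind` is made up for this example. In the Decomposition_Mapping field, a compatibility mapping carries a tag in angle brackets such as `<font>`, while a canonical mapping carries none. The sketch looks only at that database field and does not attempt the algorithmic Hangul decompositions of Section 3.12.

```python
import unicodedata

def decomposition_kind(ch: str) -> str:
    """Classify a character by its Decomposition_Mapping field (hypothetical helper)."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings start with a formatting tag such as "<font>".
    return "compatibility" if mapping.startswith("<") else "canonical"

# U+2139 has the mapping "<font> 0069"; U+00E9 has "0065 0301"; "i" has none.
for ch in ("\u2139", "\u00e9", "i"):
    print(f"U+{ord(ch):04X}", decomposition_kind(ch))
```

Under the proposed wording, only a base character that this check reports as "canonical" would be excluded from variation sequences.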

The main concern here with this restriction is to ensure that one doesn't end up with conundrums involving canonical decompositions into sequences followed by a variation selector.
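
A small sketch of the kind of conundrum meant here, again assuming Python's `unicodedata` module. The sequence below is hypothetical and deliberately NOT a standardized variation sequence: its base, U+00E9, has a canonical decomposition, which is exactly what the restriction rules out.

```python
import unicodedata

VS1 = "\ufe00"  # VARIATION SELECTOR-1

# Hypothetical sequence for illustration only: canonically decomposable base + selector.
seq = "\u00e9" + VS1
nfd = unicodedata.normalize("NFD", seq)

# After NFD the selector follows the combining mark U+0301, not the original base.
print([f"U+{ord(c):04X}" for c in nfd])
```

After canonical decomposition the variation selector sits directly after a combining character, so it is no longer clear what it is supposed to be selecting a variant of.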

In the case of compatibility decompositions, there is already no expectation that the appearance and the interpretation of the text will be preserved. With a decomposition mapping like "<font> 0069", the decomposition is already indicating a typically different appearance. If you decompose U+2139 to U+0069, you have already lost information about appearance and interpretation. So it isn't that much of a stretch to assume that any relevant variation sequences will also lose their interpretation.

But I think it might make sense, in addition to the above textual fix, to add a note to the standard to indicate that variation sequences preserve their validity across *canonical* normalization forms, but that there is no guarantee that variation sequences will remain valid for any compatibility normalization.
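
What such a note would mean in practice can be sketched as follows, assuming Python's `unicodedata` module. The helper name `unchanged_forms` is made up, and the sequence U+2139 U+FE00 is used purely as a stand-in for "compatibility decomposable base plus variation selector". No claim is made that it is listed in StandardizedVariants.txt.

```python
import unicodedata

def unchanged_forms(seq: str) -> dict:
    """Report, per normalization form, whether seq survives unchanged (hypothetical helper)."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, seq) == seq for form in forms}

# Stand-in sequence: compatibility decomposable base followed by VARIATION SELECTOR-1.
seq = "\u2139\ufe00"
result = unchanged_forms(seq)
print(result)
```

The canonical forms leave the sequence intact, while the compatibility forms replace the base with U+0069, so the selector ends up attached to a different character.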

--Ken

> 2012-11-24 8:12, Masatoshi Kimura wrote:
>
> > According to TUS v6.2 clause 16.4,
> > http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf#page=15
> > > The base character in a variation sequence is never a
> > > combining character or a decomposable character.
> > However, the following base characters appearing in
> > http://unicode.org/Public/6.2.0/ucd/StandardizedVariants.txt
> > have a decomposition mapping.
>
> There seems to be a contradiction here. “Decomposable character” is
> defined in clause 3.7 as follows:
>
> “A character that is equivalent to a sequence of one or more other
> characters, according to the decomposition mappings found in the Unicode
> Character Database, and those described in Section 3.12, Conjoining Jamo
> Behavior.”
>
> I suppose the intended meaning in clause 16.4, given its context, is to
> say that the base character is neither a combining character nor a
> character with a decomposition that contains a combining character.
>
> Yucca
>
>
Received on Mon Nov 26 2012 - 15:57:08 CST
