Re: Proposed Update UAXes for Unicode 6.1

From: Ken Whistler <kenw_at_sybase.com>
Date: Fri, 08 Jul 2011 11:33:34 -0700

On 7/8/2011 10:26 AM, Philippe Verdy wrote:
> This is not strictly related to this Unicode version update,
> but I have an interesting question about the Unicode Stability Policy.
>
> Summary: How does it apply to the exact value (or aliases) of the
> property "Decomposition Type" (dt), for compatibility decomposition
> mappings?

It doesn't. To date there is no stability policy which would prevent
changing existing decomposition types (other than Canonical or None,
obviously) from one of the compatibility types to another, or even
introducing new compatibility Decomposition_Type values.

That said, I consider it quite unlikely that the UTC at this point would
want to entertain changing those values or introducing new ones.
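
For readers who want to see which values are under discussion, here is a
minimal sketch in Python. It assumes the standard library's unicodedata
module, whose decomposition() function returns the raw mapping field with
any compatibility tag in angle brackets. The helper name and the labels
"none" and "canonical" are choices made for this sketch, not UCD names.

```python
import unicodedata


def decomposition_type(ch):
    """Return a Decomposition_Type label for a single character.

    'none' and 'canonical' are labels chosen for this sketch; compatibility
    types are returned as the tag found in the data (for example 'noBreak').
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        # No decomposition mapping is listed for this character.
        return "none"
    if mapping.startswith("<"):
        # Compatibility mapping: the tag sits between '<' and '>'.
        return mapping[1:mapping.index(">")]
    # An untagged mapping is a canonical decomposition.
    return "canonical"


# A few characters with different kinds of mappings.
samples = ["a", "\u00E9", "\u00A0", "\uFB01", "\u00B2", "\uFF21"]
types = {f"U+{ord(c):04X}": decomposition_type(c) for c in samples}
```

One caveat: as far as I know, decomposition() returns an empty string for
Hangul syllables, whose canonical decompositions are derived
algorithmically, so this helper would report "none" for them.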

> I've looked closely in the definition of other derived properties, and
> it does not seem that the "dt" property is used for anything else than
> implementing the normalizations (for example the word-breaking
> properties do not depend on "dt=nb").

Correct. But that does *not* mean that it isn't used by other standards.
In particular, Decomposition_Type is deeply baked into the default
tertiary weights given to collation elements by the Unicode Collation
Algorithm. See:

http://www.unicode.org/reports/tr10/#Tertiary_Weight_Table

Any attempt to start fiddling with compatibility Decomposition_Type values
in the UCD would have the potential to rather pervasively disturb the
construction of weights for the UCA DUCET. Keeping the DUCET stable in
such a case would require introducing an arbitrary set of counter-property
changes to, in a sense, back out any changes made to the Decomposition_Type
itself.
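
To make that dependency concrete, here is a toy sketch in Python. The
weights in TOY_TERTIARY are made up for illustration and are not the values
from the UTS #10 table linked above; the function names and the "overrides"
parameter are likewise hypothetical. It only shows the mechanism: if a
tertiary weight is looked up by decomposition type, then reclassifying one
character changes its weight and can change the resulting order.

```python
import unicodedata

# HYPOTHETICAL weights, for illustration only. The real values are in the
# UTS #10 tertiary weight table.
TOY_TERTIARY = {"none": 1, "canonical": 1, "wide": 2,
                "compat": 3, "circle": 4, "super": 5}
TOY_DEFAULT = 9  # any other compatibility type; also made up


def dt_of(ch, overrides=None):
    """Decomposition type label, optionally overridden per character."""
    if overrides and ch in overrides:
        return overrides[ch]
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    if mapping.startswith("<"):
        return mapping[1:mapping.index(">")]
    return "canonical"


def toy_tertiary(ch, overrides=None):
    """Look up the toy tertiary weight from the decomposition type."""
    return TOY_TERTIARY.get(dt_of(ch, overrides), TOY_DEFAULT)


# Digit two, fullwidth two, circled two, superscript two.
chars = ["\u00B2", "\u2461", "\uFF12", "2"]

# Order using the decomposition types as published.
before = sorted(chars, key=toy_tertiary)

# Order after "fixing" superscript two from <super> to <compat>.
fixed = {"\u00B2": "compat"}
after = sorted(chars, key=lambda c: toy_tertiary(c, fixed))
```

With these toy weights, superscript two sorts after circled two in "before"
and ahead of it in "after", which is the kind of knock-on change that would
have to be backed out elsewhere to keep the weights stable.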

I can confidently predict that rather than go there, any attempt to start
"fixing" Decomposition_Type values would most likely simply be met
with the introduction of a stability policy that would apply to the
compatibility Decomposition_Type values as well.

> And it may eventually be convenient to have some characters with
> compatibility decomposition mappings changed to exhibit better
> decomposition mapping types

Any such attempt to do a better job of classifying decomposition types needs
to be done essentially out-of-band, as a separate effort, unrelated to the
UCD Decomposition_Type values. In my opinion, however, such an effort would
be of limited value anyway, because the set of characters with compatibility
decompositions is such an arbitrary collection of legacy dreck inherited from
character encodings created for all different kinds of purposes. And the
historical process whereby some of those characters
got compatibility decompositions and some did not was itself rather
arbitrary, and done early on, before the design of the Unicode Normalization
Algorithm and of course without knowledge of many other things that have
happened to the standard during the past 20 years.

--Ken
Received on Fri Jul 08 2011 - 13:36:24 CDT
