Re: UCA tertiary weight assignment vs. decomposition type definition in Unicode character database

From: Ken Whistler <kenw_at_sybase.com>
Date: Fri, 27 Jan 2012 13:51:53 -0800

On 1/27/2012 1:16 PM, Matt Ma wrote:
> Hi,
>
> There are a few characters that have no decomposition type defined in
> UnicodeData.txt, but were assigned tertiary weights in allkeys.txt
> as if they had a decomposition type. Here are a few examples
> (version 6.0.0):
>
> ...

> U+A733, U+A732, U+1F1E6 were given tertiary weights as if they were
> <compat>, while U+31B4 was treated as if it were <final>.

Yep, that is all done deliberately, to make the default sorting a bit
more consistent.
The normative decompositions in UnicodeData.txt are only the starting point
for attempting to give more consistent default weights for collation.
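
The UnicodeData.txt side of this can be checked with Python's standard
unicodedata module. This is only a rough sketch: the helper name
decomposition_type is made up for illustration, and the results reflect
whatever Unicode version the local Python build ships, which is not
necessarily 6.0.0.

```python
import unicodedata


def decomposition_type(ch):
    """Classify the decomposition mapping of a single character.

    Returns the tag (e.g. '<compat>', '<final>') for a compatibility
    mapping, 'canonical' for an untagged mapping, and None when the
    character has no decomposition mapping at all.
    (Hypothetical helper, not part of any Unicode tooling.)
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return None
    first = mapping.split()[0]
    return first if first.startswith("<") else "canonical"


# The code points mentioned in the question.
for cp in (0xA733, 0xA732, 0x1F1E6, 0x31B4):
    print(f"U+{cp:04X}", decomposition_type(chr(cp)))
```

For the four code points in the question this should report no
decomposition mapping, so the tertiary weights they carry in allkeys.txt
cannot be derived from UnicodeData.txt alone.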

>
> Is this something documented outside of UCA?

No, because it is only relevant *to* UCA. At least as far as documentation
written by the UTC is concerned.

Well, I suppose it is also relevant to CLDR, because CLDR bases its
collation tables on a tailoring of allkeys.txt from UCA. I don't know
what documentation there may or may not be about the default treatment
for tertiary weights in CLDR. Somebody involved in the details of CLDR
collation will have to answer that one.

--Ken
Received on Fri Jan 27 2012 - 15:55:37 CST
