collation tailoring using before

Richard Wordingham via CLDR-Users cldr-users at
Thu Aug 10 10:00:14 CDT 2017

On Wed, 9 Aug 2017 16:23:44 +0700
Martin Hosken via CLDR-Users <cldr-users at> wrote:

> I am trying to tailor (for the sake of argument) \u0300 to be primary
> ignorable and have a secondary collation key less than that of a
> primary character (a).
> I tried:
> &[before 2][first primary ignorable] << \u0300
> But then I get CEs of this form:
> a	[2900.0500.0500]
> \u0300	[0000.8000.0500]
> I'm wondering how I can get \u0300 [0000.0400.0500].

What your declared goal would result in is

a << á < áb << ab
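
To see why, here is a minimal Python sketch of how the levels of a sort key compare. Only the CE for a and the desired CE for \u0300 come from the message above; the CE for b and the sort_key helper are assumptions made up for illustration, not ICU's API or real CLDR data.

```python
# Illustrative CEs as (primary, secondary, tertiary). The entry for "b" is
# an assumed value, not real CLDR data.
CES = {
    "a": (0x2900, 0x0500, 0x0500),
    "b": (0x2A00, 0x0500, 0x0500),
    "\u0300": (0x0000, 0x0400, 0x0500),  # the CE being asked for
}

def sort_key(text):
    """Simplified key: all non-zero primaries, then secondaries, then tertiaries."""
    ces = [CES[ch] for ch in text]
    return tuple([ce[level] for ce in ces if ce[level]] for level in range(3))

words = ["ab", "a\u0300b", "a\u0300", "a"]
ordered = sorted(words, key=sort_key)
print(ordered)  # plain a first, then a + U+0300, then a + U+0300 + b, then ab
```

The secondary weights of a + U+0300 are 0500 0400, which extend those of plain a, so plain a sorts first. In a + U+0300 + b, that 0400 is compared against the 0500 of b in ab, so there the accented form sorts first.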

The assumption is that no one would want this, which is why such a
collation is classed as ill-formed.  (DUCET itself is now ill-formed,
though that's not why ICU doesn't support it.)

If what you want is

á << a < áb << ab

then the Pinyin collation provides an example:

                &[before 2]a<<ā<<<Ā<<á<<<Á<<ǎ<<<Ǎ<<à<<<À
                &[before 2]e<<ē<<<Ē<<é<<<É<<ě<<<Ě<<è<<<È
                &[before 2]i<<ī<<<Ī<<í<<<Í<<ǐ<<<Ǐ<<ì<<<Ì
                &[before 2]m<<m̄<<<M̄<<ḿ<<<Ḿ<<m̌<<<M̌<<m̀<<<M̀
                &[before 2]n<<n̄<<<N̄<<ń<<<Ń<<ň<<<Ň<<ǹ<<<Ǹ
                &[before 2]o<<ō<<<Ō<<ó<<<Ó<<ǒ<<<Ǒ<<ò<<<Ò
                &[before 2]u<<ū<<<Ū<<ú<<<Ú<<ǔ<<<Ǔ<<ù<<<Ù

This gives us

ā << a < āp << ap
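
A rough Python sketch of the same comparison, assuming the tailored letter ā gets a single CE with a's primary weight and a secondary weight below the common one. All weights here and the sort_key helper are hypothetical, chosen only to show the ordering; they are not the values ICU actually assigns.

```python
# Hypothetical CEs as (primary, secondary, tertiary); the precomposed letter
# U+0101 is treated as one unit with a lower secondary weight than "a".
CES = {
    "a": (0x2900, 0x0500, 0x0500),
    "p": (0x3000, 0x0500, 0x0500),
    "\u0101": (0x2900, 0x0400, 0x0500),
}

def sort_key(text):
    """Simplified key: all non-zero primaries, then secondaries, then tertiaries."""
    ces = [CES[ch] for ch in text]
    return tuple([ce[level] for ce in ces if ce[level]] for level in range(3))

words = ["ap", "\u0101p", "a", "\u0101"]
ordered = sorted(words, key=sort_key)
print(ordered)  # U+0101 first, then a, then U+0101 + p, then ap
```

Because the lower secondary weight sits in the same position as a's, the accented form sorts first both on its own and as a prefix of a longer string.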

