replace variableTop with maxVariable (collation attributes)

For UCA/LDML collation, the variableTop attribute specifies which characters (via their primary weights) are “variable” when alternate handling is set to something other than non-ignorable (for example, shifted).

Problem: The value for variableTop is a code point sequence. Setting it for common use cases requires knowing which character is the last one in the Punctuation, Symbols, or similar range in DUCET order, and that changes with every UCA version.

  1. Please add a new attribute “maxVariable” for easy, reliable setting of which characters are “variable”. Its set of values should be the special below-letter reorder_code values: space, punct, symbol, currency, digit. maxVariable=punct would be the CLDR/LDML default, while maxVariable=symbol would be the UCA DUCET default.
  2. I propose “kv” as the language tag key for “maxVariable”.
  3. Please deprecate the variableTop attribute, in favor of maxVariable.
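
The intended semantics of the proposed values can be sketched as follows. This is only an illustration, assuming the groups sort in the order listed above (space, punct, symbol, currency, digit); the names `MaxVariable`, `is_variable` and `variable_groups` are hypothetical and not part of any ICU or CLDR API.

```python
from enum import IntEnum


class MaxVariable(IntEnum):
    """Below-letter reordering groups, in assumed ascending collation order."""
    SPACE = 0
    PUNCT = 1
    SYMBOL = 2
    CURRENCY = 3
    DIGIT = 4


# Defaults as stated in the proposal.
CLDR_DEFAULT = MaxVariable.PUNCT
DUCET_DEFAULT = MaxVariable.SYMBOL


def is_variable(group: MaxVariable, max_variable: MaxVariable) -> bool:
    """A group is variable if it sorts at or below the maxVariable group."""
    return group <= max_variable


def variable_groups(max_variable: MaxVariable) -> list:
    """List the names of all groups that are variable for this setting."""
    return [g.name.lower() for g in MaxVariable if is_variable(g, max_variable)]


if __name__ == "__main__":
    print(variable_groups(CLDR_DEFAULT))
    print(variable_groups(DUCET_DEFAULT))
```

The point of the sketch is that the setting names a group rather than a character, so it does not need to change when a UCA version adds characters to a group.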

(For more details see IcuBug:8032, especially comment #7)


For BCP 47 language tags, we intend to use the key "kv" and relevant values copied from kr.

I think the addition to source:trunk/common/bcp47/collation.xml would be something like

<key name="kv" description="Collation parameter key for maxVariable, the last reordering group to be affected by ka-shifted" since="23">
    <type name="space" description="Only spaces are affected by ka-shifted"/>
    <type name="punct" description="Spaces and punctuation are affected by ka-shifted (CLDR default)"/>
    <type name="symbol" description="Spaces, punctuation and symbols except for currency symbols are affected by ka-shifted"/>
    <type name="currency" description="Spaces, punctuation and all symbols are affected by ka-shifted"/>
    <type name="digit" description="Spaces, punctuation, symbols, currency symbols and digits are affected by ka-shifted"/>
</key>

I omitted aliases; if they are not required, I would prefer not to introduce them.

For old-style CLDR/ICU locale IDs, I think we should use the same attribute and values, rather than introducing separate ones for that. For example, "de@kv=symbol".
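
As a rough illustration of how such an ID could be read, here is a minimal sketch that splits an old-style locale ID into its base and keywords and validates the "kv" value. It assumes the usual `@key=value;key=value` keyword syntax; the helper names `parse_locale_keywords` and `max_variable_of` are hypothetical, and falling back to "punct" reflects the proposed CLDR default rather than any existing implementation.

```python
VALID_KV = ("space", "punct", "symbol", "currency", "digit")


def parse_locale_keywords(locale_id: str):
    """Split 'de@kv=symbol' into ('de', {'kv': 'symbol'})."""
    base, _, keywords = locale_id.partition("@")
    result = {}
    for item in keywords.split(";"):
        if not item:
            continue  # no keywords at all, or a stray ';'
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("keyword without '=': %r" % item)
        result[key.strip().lower()] = value.strip()
    return base, result


def max_variable_of(locale_id: str, default: str = "punct") -> str:
    """Return the kv value of a locale ID, or the assumed default."""
    _, keywords = parse_locale_keywords(locale_id)
    value = keywords.get("kv", default)
    if value not in VALID_KV:
        raise ValueError("unsupported kv value: %r" % value)
    return value


if __name__ == "__main__":
    print(parse_locale_keywords("de@kv=symbol"))
    print(max_variable_of("de"))
```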

We should also deprecate the tailorability of [last variable]/<last_variable/>; see the LDML table "Specifying Last-Variable" in section 5.14.10 "Logical Reset Positions".

comment:4 Changed 3 years ago by Richard Wordingham <richard.wordingham@…>

Why do we need a *new* attribute? Can we not allow the value of variableTop to be a character sequence or special block name? With two attributes, we have the complexity of specifying which takes priority. With one attribute, it becomes part of the checking of syntax (or similar), which is already necessary.

Some second thoughts: I think no one would use maxVariable=currency or digit; I think we should initially limit the set of values to only three: space, punct, symbol.

The latest proposal to icu-design (2012-dec-10) was to have four values: space, punct, symbol, currency.

ICU will pin a variable top value to the next-higher maxVariable boundary, and no higher than the highest maxVariable value (currency). Consider adding this to the LDML spec.
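
The pinning behavior described above could look roughly like the following sketch. The primary weights in `GROUP_LAST_PRIMARY` are made-up placeholders, not real DUCET or CLDR root values, and `pin_variable_top` is a hypothetical helper, not an ICU function. It maps an arbitrary variable top primary weight to the lowest group whose last primary is at or above it, capped at currency.

```python
from bisect import bisect_left

# Hypothetical last primary weight of each group, in ascending order.
# Real values depend on the UCA/CLDR root collation version.
GROUP_LAST_PRIMARY = [
    ("space", 0x0400),
    ("punct", 0x0800),
    ("symbol", 0x0C00),
    ("currency", 0x0E00),
]


def pin_variable_top(primary: int) -> str:
    """Pin a variable top primary to the next-higher maxVariable group."""
    limits = [limit for _, limit in GROUP_LAST_PRIMARY]
    # First group whose last primary is >= the requested variable top.
    index = bisect_left(limits, primary)
    # Never go higher than the highest supported group (currency).
    index = min(index, len(GROUP_LAST_PRIMARY) - 1)
    return GROUP_LAST_PRIMARY[index][0]


if __name__ == "__main__":
    for weight in (0x0400, 0x0401, 0x0900, 0x2000):
        print(hex(weight), pin_variable_top(weight))
```

With this approach a variable top that falls in the middle of a group is rounded up to the end of that group, so the effective setting is always one of the named values.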

