L2/00-157
From: Karlsson Kent - keka [keka@im.se]
Sent: Tuesday, April 25, 2000 12:54 PM
To: 'mark.davis@us.ibm.com'
Cc: 'Kenneth Whistler'
Subject: RE: proposed changes in UTR#10: Collation
I see no reason for UTR 10 to define its comparisons very differently from those in 14651.
Suggested modified text:
The first weight
is called the Level 1 weight (or primary weight), the second is
called the Level 2 weight (secondary weight), and the third is
called the Level 3 weight (tertiary weight). For a collation
element X, these can be abbreviated as X1, X2, and X3.
Given two collation elements X and Y, we will use the following notation:
Notation | Reading                    | Meaning            |
X =0 Y   |                            | true               |
X =1 Y   | X is primary equal to Y    | X =0 Y and X1 = Y1 | i.e. X1 = Y1
X =2 Y   | X is secondary equal to Y  | X =1 Y and X2 = Y2 |
X =3 Y   | X is tertiary equal to Y   | X =2 Y and X3 = Y3 |
X =4 Y   | X is quaternary equal to Y | X =3 Y and X4 = Y4 |
Notation | Reading                     | Meaning                        |
X <0 Y   |                             | false                          |
X <1 Y   | X is primary less than Y    | X <0 Y or (X =0 Y and X1 < Y1) | i.e. X1 < Y1
X <2 Y   | X is secondary less than Y  | X <1 Y or (X =1 Y and X2 < Y2) |
X <3 Y   | X is tertiary less than Y   | X <2 Y or (X =2 Y and X3 < Y3) |
X <4 Y   | X is quaternary less than Y | X <3 Y or (X =3 Y and X4 < Y4) |
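The recursive definitions in the two tables above amount to a lexicographic comparison of the first n weights. A minimal Python sketch follows, assuming a collation element is modelled as a tuple of weights (X1, X2, X3, X4); the function names equal_at and less_at are made up for illustration and come from neither UTR 10 nor 14651.

```python
from typing import Sequence

# Assumption: a collation element is a tuple of weights (X1, X2, X3, X4).
CollationElement = Sequence[int]


def equal_at(x: CollationElement, y: CollationElement, level: int) -> bool:
    """X =n Y: X and Y agree on every weight from level 1 through `level`."""
    # Level 0 compares nothing, so X =0 Y is always true.
    return all(x[i] == y[i] for i in range(level))


def less_at(x: CollationElement, y: CollationElement, level: int) -> bool:
    """X <n Y: the first weight that differs within `level` levels is smaller in X."""
    # Level 0 compares nothing, so X <0 Y is always false.
    for i in range(level):
        if x[i] != y[i]:
            return x[i] < y[i]
    return False


# Two hypothetical elements that differ only in their secondary weight.
x = (0x0100, 0x20, 0x02, 0x01)
y = (0x0100, 0x21, 0x02, 0x01)
print(equal_at(x, y, 1), less_at(x, y, 1), less_at(x, y, 2))
```

Under these definitions, for any level n exactly one of X <n Y, X =n Y, Y <n X holds.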
The collation algorithm results in a similar ordering among characters and strings, so that for two strings A and B we can write A <2 B, meaning that A is less than B and there is a secondary or primary difference between them. If A <2 B, but A =1 B, we say that there is only a secondary difference between them (which, however, implies that there is also a tertiary difference between them). If two strings have no primary, secondary or tertiary difference according to a given Collation Table, then we write A =3 B. If two strings are equivalent (equal at all levels) according to a given Collation Table, we write A = B. If they are bit-for-bit identical, we write A == B.
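To make the string-level notation concrete, here is a rough Python sketch. It assumes each string has already been mapped to a list of collation elements (tuples of weights), that a zero weight means "ignorable at that level", and that strings are compared level by level (all primary weights first, then all secondary weights, and so on). The helper names and the sample weights are invented for illustration only.

```python
def level_key(elements, level):
    """Weights of one level, in string order, leaving out zero (ignorable) weights."""
    return [ce[level - 1] for ce in elements if ce[level - 1] != 0]


def sort_key(elements, max_level):
    """One list of weights per level, from level 1 up to max_level."""
    return [level_key(elements, n) for n in range(1, max_level + 1)]


def str_equal_at(a, b, level):
    """A =n B: no difference at levels 1..n."""
    return sort_key(a, level) == sort_key(b, level)


def str_less_at(a, b, level):
    """A <n B: A sorts before B when only levels 1..n are considered."""
    # Python compares lists of lists lexicographically, level by level.
    return sort_key(a, level) < sort_key(b, level)


# Hypothetical three-level weights: B differs from A only at the secondary level.
A = [(10, 5, 2), (20, 5, 2)]
B = [(10, 5, 2), (20, 6, 2)]
print(str_equal_at(A, B, 1), str_less_at(A, B, 2), str_less_at(A, B, 3))
```

Here A =1 B and A <2 B, so there is only a secondary difference; A <3 B then follows automatically, which is the implication noted in the paragraph above.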
This makes all the orders defined total, and avoids the
(incomplete) partial orders you defined before. This way one defines the orders
that users are likely to be interested in, and the orders given by (e.g.) the
Java collation API.
Kind regards
/kent k
=============================================================
Second message from Kent:
> Old
> <version> := <major>.<minor>.<variant> <eol>
> New
> @<version> := <major>.<minor>.<variant> <eol>
Do you mean:
<version> := @<major>.<minor>.<variant> <eol>
> 2. To allow for POSIX-style positions:
> · Change the term Shifted to ShiftLow throughout the document
> · Add ShiftHigh definition and examples.
> · ShiftHigh: The same as ShiftLow, except that all non-variable
>   collation elements get a fourth-level weight equal to 0001.
That, however, is not how the POSIX “,position” option works. (But it seems that the major proponents of “,position” don’t know how it works either...)
The following text, from 14651, does describe how “,position” works, based on the informal descriptions supplied by the proponents of “,position”:
:Subkeys, at the last level, formed with the “forward,position” level
:processing parameter are formed by forming a subkey as with the “forward”
:parameter, but for collating elements that are not "IGNORE"d at all levels
:but the last one, their last level weighting (list of weights) is replaced
:by a single weight (call it <PLAIN> here) that is larger than all other
:weights at the last level in the given tailored table. Collating elements
:that are "IGNORE"d at all levels but the last one, retain their weighting
:according to the given tailored table. Finally, any trailing sequence of
:the maximal weight (<PLAIN>) is removed from the subkey, effectively
:replacing each trailing maximal weight with a zero weight.
Note that <PLAIN> is FFFF in UTR10. So in essence, and from a UTR10 perspective, ",position" is the same as "Shifted", but with the added twist of removing any trailing sequence of FFFF weights.
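As a rough illustration of the quoted 14651 text, the last-level subkey under “forward,position” could be sketched in Python as below. The assumptions: each collating element is a tuple of weights whose final entry is the last-level weight, a zero weight stands for "IGNORE", and elements ignored at every level including the last contribute nothing to the subkey. The function name position_subkey and the sample weights are hypothetical.

```python
PLAIN = 0xFFFF  # per the note above, <PLAIN> is FFFF in UTR10


def position_subkey(elements):
    """Last-level subkey for "forward,position", following the quoted description."""
    subkey = []
    for ce in elements:
        *earlier, last = ce
        if all(w == 0 for w in ce):
            # Assumption: ignored at every level, so nothing is added.
            continue
        if all(w == 0 for w in earlier):
            # Ignored at all levels but the last: keep the table weight.
            subkey.append(last)
        else:
            # Any other element: its last-level weight is replaced by <PLAIN>.
            subkey.append(PLAIN)
    # Finally, remove any trailing run of <PLAIN>.
    while subkey and subkey[-1] == PLAIN:
        subkey.pop()
    return subkey


# Hypothetical weights: two letters and a hyphen that is ignored at levels 1-3.
a = (0x0A, 0x20, 0x02, 0x0A)
b = (0x0B, 0x20, 0x02, 0x0B)
hyphen = (0, 0, 0, 0x0209)
print(position_subkey([a, hyphen, b]))
```

With these sample weights, a string containing no such "ignored but for the last level" elements ends up with an empty last-level subkey, while the position of the hyphen is recorded by the number of <PLAIN> weights that precede it.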
Rather than 1) make a false statement about “,position” operation (like your suggestion), or 2) make a correct statement about “,position” (like what 14651 says), I’d prefer to 3) forget about “,position”, since it does not bring any tangible advantages, and is frequently misinterpreted. Support for it is NOT required by 14651, and when it is not supported but asked for (if that is possible in the syntax used) it is to be interpreted in the same way as asking for "forward".
Kind regards
/kent k