Unicode Collation Algorithm

From: Daniel Ehrenberg (microdan@gmail.com)
Date: Thu May 15 2008 - 15:41:27 CDT


Hi,

I'm trying to implement the Unicode Collation Algorithm, and I'm a
little confused by line 36099 of CollationTest_SHIFTED.txt. It is:

006C 00B7 0021; # (l·) LATIN SMALL LETTER L, MIDDLE DOT [1262 | 0020 01AF | 0002 0002 | FFFF FFFF 0258]

Here are the collation elements for the characters that it uses:

006C ; [.1262.0020.0002.006C] # LATIN SMALL LETTER L
00B7 ; [*0279.0020.0002.00B7] # MIDDLE DOT
0021 ; [*0258.0020.0002.0021] # EXCLAMATION MARK

All three characters have combining class 0, and the string is already
in NFD.

The asterisks mark variable-weighted elements. Why, then, does the
key given treat U+00B7 as if it were not variable-weighted? I'm
handling variable-weighted elements as shifted, not non-ignorable, and
as far as I can tell a variable-weighted element always gets shifted,
whatever its context. So, by my calculation, the sort key should be
[1262 | 0020 | 0002 | FFFF 0279 0258]. That key would make this string
sort before the previous line of the test file. Could somebody help me
figure this out?
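
The calculation described above can be sketched in Python as follows.
This is only an illustration under stated assumptions, not a UCA
implementation: it hard-codes the three table entries quoted earlier,
assumes every character maps independently to exactly one collation
element (no contractions or expansions), and the names TABLE,
shifted_sort_key and format_key are made up for the example.

```python
# Entries quoted in the message:
# code point -> (is_variable, primary, secondary, tertiary)
TABLE = {
    0x006C: (False, 0x1262, 0x0020, 0x0002),  # LATIN SMALL LETTER L
    0x00B7: (True, 0x0279, 0x0020, 0x0002),   # MIDDLE DOT
    0x0021: (True, 0x0258, 0x0020, 0x0002),   # EXCLAMATION MARK
}


def shifted_sort_key(code_points):
    """Build a four-level key, shifting every variable element.

    Assumption: one collation element per code point, looked up
    independently of its neighbours.
    """
    levels = [[], [], [], []]
    for cp in code_points:
        variable, primary, secondary, tertiary = TABLE[cp]
        if variable:
            # Shifted: levels 1-3 are zeroed and the old primary
            # weight moves to level 4.
            weights = (0, 0, 0, primary)
        else:
            # Non-variable elements get FFFF at level 4.
            weights = (primary, secondary, tertiary, 0xFFFF)
        for level, weight in zip(levels, weights):
            if weight != 0:  # zero weights are dropped from the key
                level.append(weight)
    return levels


def format_key(levels):
    """Render the key in the bracketed style used by the test file."""
    parts = (" ".join(f"{w:04X}" for w in level) for level in levels)
    return "[" + " | ".join(parts) + "]"


print(format_key(shifted_sort_key([0x006C, 0x00B7, 0x0021])))
# prints [1262 | 0020 | 0002 | FFFF 0279 0258]
```

Under those assumptions the sketch reproduces the key computed by hand,
not the key listed in the test file.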

Dan Ehrenberg
