From: Roozbeh Pournader (roozbeh@sharif.edu)
Date: Wed Mar 12 2003 - 10:37:57 EST
On Tue, 11 Mar 2003, Markus Scherer wrote:
> The Unicode Collation Algorithm (UCA) for which allkeys.txt is the
> default weight table does treat ZWNJ and a number of other characters as
> special. For these, they are completely ignored by the UCA - same as if
> you stripped them from the text.
Well, anything that is completely ignored in collation creates problems
for deterministic sorting. There are certain words in Persian, with
completely different meanings, that differ only in a ZWNJ[1]. With ZWNJ
ignored by default, such words compare as equal, so they may appear in
either order, possibly depending on the original order of the input. I
guess this is not what we want for deterministic collation.
The desired behavior for ZWNJ is to be treated like punctuation:
ignored at the first levels, but considered at the end. (Personal note:
write something for the UTC on this.)
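To make the difference concrete, here is a minimal Python sketch. It is
not the UCA and does not read allkeys.txt: it compares raw code points,
and the helper names ignoring_key and tiebreak_key are made up for this
illustration. It only shows why a fully ignored ZWNJ leaves the result
dependent on input order, while a final tie-breaking level does not.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def ignoring_key(word):
    # Hypothetical key: ZWNJ is stripped, i.e. completely ignored.
    return word.replace(ZWNJ, "")

def tiebreak_key(word):
    # Hypothetical key: ZWNJ is ignored at the first level, and the
    # original string (ZWNJ included) is used as a last-level tie-breaker.
    return (word.replace(ZWNJ, ""), word)

# The two words from footnote [1], written with explicit escapes so the
# ZWNJ position is visible in the source.
names_of = "\u0646\u0627\u0645" + ZWNJ + "\u0647\u0627\u06cc"
a_letter = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06cc"

# With ZWNJ ignored, the keys are equal, and Python's stable sort simply
# preserves whatever order the input happened to have.
ignored_1 = sorted([names_of, a_letter], key=ignoring_key)
ignored_2 = sorted([a_letter, names_of], key=ignoring_key)

# With the tie-breaker, the result is the same for both input orders.
stable_1 = sorted([names_of, a_letter], key=tiebreak_key)
stable_2 = sorted([a_letter, names_of], key=tiebreak_key)

print(ignored_1 == ignored_2)  # False: order depends on the input
print(stable_1 == stable_2)    # True: order is deterministic
```

Which of the two words should come first at the last level is a separate
question; the sketch just uses code point order for the tie-break.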
roozbeh
[1] A good example, is نامهای or نامهای (names of) vs
نامهای (a letter). Their only difference in encoding is
the presence or absence of a ZWNJ, or its position within the word.