From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Mar 11 2003 - 16:43:05 EST
Magda Danish (Unicode) wrote:
> > -----Original Message-----
> > From: Vladimir Ivanov [mailto:iranorus@online.ru]
> > It is clearly seen that there are letters on both sides of ZWNJ within the
> > word boundaries. Placing ZWNJ on an edge of the word doesn’t make sense in
> > Persian. From this point of view ZWNJ should be treated as a special
> > character rather than a delimiter.
The Unicode Collation Algorithm (UCA), for which allkeys.txt is the default weight table, does treat
ZWNJ and a number of other characters as special: they are completely ignored by the UCA, the same
as if you had stripped them from the text.
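To illustrate "ignored, same as stripped", here is a minimal Python sketch. It is not the UCA: toy_weight and toy_sort_key are hypothetical names, only one level of weights is modelled, and the code point stands in for a real primary weight. The only property taken from the discussion above is that ZWNJ has zero weight and therefore contributes nothing to the sort key.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def toy_weight(ch):
    """Hypothetical primary weight: 0 for ZWNJ, the code point otherwise."""
    return 0 if ch == ZWNJ else ord(ch)

def toy_sort_key(text):
    """Build a one-level key; zero weights are left out of the key."""
    return tuple(w for w in map(toy_weight, text) if w != 0)

with_zwnj = "ab" + ZWNJ + "cd"
stripped = with_zwnj.replace(ZWNJ, "")

# Both strings produce the same key, so neither sorts before the other.
same_key = toy_sort_key(with_zwnj) == toy_sort_key(stripped)
```

With a key like this, ZWNJ is neither "before" nor "after" anything; it simply does not take part in the comparison.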
> > But in Allkeys Table it is placed on line #68 well before other popular
> > delimiters: HORIZONTAL TABULATION line #192,
The order of entries in allkeys is irrelevant; what is relevant is the assignment of weights, and
ZWNJ gets all-zero weights. You need to implement the algorithm, not just the relative order of
entries in the file. (allkeys does sort its entries by shifted, multi-level weights, but order for
same-weight characters does not matter.)
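As a rough sketch of why the position of a line in the file carries no meaning: the table is a mapping from characters to weights, so reading the same entries in any order yields the same mapping. The sample lines below only imitate the general shape of allkeys.txt entries; the weight values are invented for illustration, except that ZWNJ is all zeros as stated above. ENTRY and parse_table are hypothetical names, and this parser handles only single-code-point lines.

```python
import re

# Illustrative lines in the general shape of allkeys.txt entries.
# Weight values are invented, except the all-zero weights for ZWNJ.
SAMPLE = """\
200C ; [.0000.0000.0000.0000] # ZERO WIDTH NON-JOINER
0009 ; [*0201.0020.0002.0009] # HORIZONTAL TABULATION
0061 ; [.0A15.0020.0002.0061] # LATIN SMALL LETTER A
"""

# One code point, then one collation element: marker ('.' or '*') plus weights.
ENTRY = re.compile(r"^([0-9A-F]{4,6})\s*;\s*\[([.*])([0-9A-F.]+)\]")

def parse_table(text):
    """Map each character to its tuple of weights; line order is not recorded."""
    table = {}
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            ch = chr(int(m.group(1), 16))
            table[ch] = tuple(int(w, 16) for w in m.group(3).split("."))
    return table

forward = parse_table(SAMPLE)
backward = parse_table("\n".join(reversed(SAMPLE.splitlines())))

# Same mapping either way; ZWNJ has all-zero weights in both.
order_is_irrelevant = forward == backward
```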
> > I’ve solved this problem for myself by placing ZWNJ somewhere after
> > delimiters, but what are the theoretical reasons for putting
> > it before them?
> > In order to get what? In what languages?
"Before" is wrong, see above. Think of ZWNJ as "not there" for UCA.
> > By the way, the sorting algorithm built into MS Windows puts compound words
> > with ZWNJ AFTER their simple components. So in this respect it acts on the
> > principles different from Allkeys Table.
Windows does not implement the Unicode Collation Algorithm, as far as I know.
Best regards,
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.