From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Mar 12 2003 - 11:48:54 EST
Roozbeh Pournader wrote:
> Well, anything that is completely ignored in collation creates problems
> with deterministic sorting.
I don't think you mean "deterministic". UCA is deterministic; it just sorts many strings as equal.
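To illustrate the distinction, here is a toy sketch in Python. It is not the UCA: the key function `ignoring_zwnj` is hypothetical and simply drops U+200C before comparing code points, and the strings are Latin placeholders rather than real Persian words. The comparison is fully deterministic, yet two strings that differ only in ZWNJ get the same key, so a stable sort leaves them in whatever order they arrived.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def ignoring_zwnj(s: str) -> str:
    """Hypothetical sort key that treats ZWNJ as completely ignorable."""
    return s.replace(ZWNJ, "")

a = "ab"
b = "a" + ZWNJ + "b"

# Same key every time: deterministic, but the two strings compare as equal.
same_key = ignoring_zwnj(a) == ignoring_zwnj(b)

# Python's sort is stable, so tied items keep their input order.
order_1 = sorted([a, b], key=ignoring_zwnj)
order_2 = sorted([b, a], key=ignoring_zwnj)
```

With this key, the relative order of `a` and `b` in the output follows the input order, which is the effect described in the quoted text that follows.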
> There are certain words in Persian, with
> completely different meanings, that differ only in a ZWNJ[1]. Having ZWNJ
> ignored by default means they may appear in either order, possibly
> based on the original order of input. I guess this is not what we want
> for deterministic collation.
>
> The desired behavior for ZWNJ is to be treated like punctuation:
> ignored in the first levels, but considered at the end. (Personal note:
> write something for UTC on this.)
Possible. I assume that ZWNJ is ignored in UCA because that is the expected behavior for many other
languages. Not ignoring ZWNJ is possible with a tailoring that gives it some non-zero weights.
Note that many languages require tailorings for at least a couple of characters to follow national
standards.
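As a rough sketch of what such a tailoring achieves, the Python below models a two-level key. This is an assumption-laden toy, not UCA weights or any real tailoring syntax: the function `tailored_key` is hypothetical, the first level compares code points with ZWNJ removed, and the last level falls back to the full string so that ZWNJ breaks ties.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def tailored_key(s: str) -> tuple[str, str]:
    """Hypothetical two-level key.

    Level 1 ignores ZWNJ; level 2 (the tie-breaker) compares the full
    string, so ZWNJ is "ignored in the first levels, but considered at
    the end".
    """
    return (s.replace(ZWNJ, ""), s)

a = "ab"
b = "a" + ZWNJ + "b"
c = "ac"

# The result no longer depends on the input order of a and b.
result_1 = sorted([a, b, c], key=tailored_key)
result_2 = sorted([c, b, a], key=tailored_key)
```

Both calls produce the same order, with the two ZWNJ variants adjacent and the ZWNJ-free form first. Which variant should come first is a choice for the actual tailoring; here it just falls out of code point order.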
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.