From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Tue Jul 29 2003 - 09:11:25 EDT
On 28/07/2003 19:05, Kenneth Whistler wrote:
> ...
>
>This is, of course, precisely the desired result -- the CGJ is
>ignored for weighting, but its presence prevents the reordering
>of the vowels into the undesired sequence by normalization.
>And the resultant weighted key weights the vowels in the correct
>order.
>
>Tailoring of the collation table could modify any of this, but
>the above example is what you get just using the default table.
>
>But it is important that people implementing searching and sorting
>for Hebrew understand why and how the CGJ is "ignored" in this
>context, in order to get correct results. For example, if you
>strip the CGJ and *then* hand the string to the collation weighting
>algorithm, normalization will again rearrange the points into
>the wrong order for weighting.
>
>--Ken
>
Thank you, Ken. In this particular case we might want to tailor the
collation table so that this CGJ is effectively ignored. But I don't
understand this aspect of Unicode well enough to know exactly what can
be done.
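
To illustrate the normalization half of Ken's point (why stripping the CGJ *before* normalizing loses the intended order), here is a minimal Python sketch. The sequence lamed + patah + hiriq + final mem is my own choice for illustration, not necessarily the example elided above, and the sketch covers only canonical reordering under NFD, not the collation weighting step itself.

```python
import unicodedata

# Illustrative code points (chosen for this sketch, not taken from the thread).
LAMED, FINAL_MEM = "\u05DC", "\u05DD"
PATAH = "\u05B7"   # canonical combining class 17
HIRIQ = "\u05B4"   # canonical combining class 14
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0

def codepoints(s):
    """Return the string as a list of U+XXXX labels for easy comparison."""
    return [f"U+{ord(c):04X}" for c in s]

# Intended order: patah first, then hiriq, with a CGJ between them.
with_cgj = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM

# Normalizing with the CGJ in place: class 0 blocks the reordering.
kept = unicodedata.normalize("NFD", with_cgj)

# Stripping the CGJ first: NFD sorts hiriq (14) ahead of patah (17).
stripped_first = unicodedata.normalize("NFD", with_cgj.replace(CGJ, ""))

# Normalizing first and stripping afterwards keeps the intended order.
stripped_after = unicodedata.normalize("NFD", with_cgj).replace(CGJ, "")

print(codepoints(kept))            # patah, CGJ, hiriq: order unchanged
print(codepoints(stripped_first))  # hiriq now precedes patah
print(codepoints(stripped_after))  # patah still precedes hiriq
```

If this is right, any code that removes the CGJ must do so only after normalization, or must leave it in place and rely on the collation table to ignore it.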
--
Peter Kirk
peter.r.kirk@ntlworld.com
http://web.onetel.net.uk/~peterkirk/