From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Dec 04 2003 - 22:04:00 EST
mjabbar@bangla.net writes:
> Please also inform me about what will be the sorting for Bangla.
> Thanks and regards
> Mustafa Jabbar
Same response: you don't sort on code points; you use the Unicode Collation
Algorithm (UCA) with the Default Unicode Collation Element Table (DUCET).
The DUCET is published with the Unicode charts, and is also available in
compiled form, for example as a text file containing collation rules (see
UCARules.txt in ICU) or as a complete conversion table from code points to
collation weights.
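
To make the difference concrete, here is a minimal sketch of sorting on
multi-level collation weights instead of code points. The weight table is
invented for illustration (it is NOT taken from the DUCET, and it uses Latin
letters only to keep it short); the names TOY_TABLE and sort_key are
hypothetical. A real implementation would load the DUCET or use ICU.

```python
# Toy collation element table: character -> (primary, secondary, tertiary).
# ASSUMPTION: these weights are made up for this example, not DUCET values.
TOY_TABLE = {
    "a": (1, 1, 1),
    "A": (1, 1, 2),       # same letter as "a", differs only at tertiary level
    "\u00e1": (1, 2, 1),  # a with acute: differs from "a" at secondary level
    "b": (2, 1, 1),
    "B": (2, 1, 2),
}

def sort_key(text):
    """Build a sort key: all primary weights, then secondary, then tertiary,
    with the code points as a final tie-breaker."""
    elements = [TOY_TABLE[ch] for ch in text]
    primary = tuple(e[0] for e in elements)
    secondary = tuple(e[1] for e in elements)
    tertiary = tuple(e[2] for e in elements)
    codepoints = tuple(ord(ch) for ch in text)
    return (primary, secondary, tertiary, codepoints)

words = ["b", "B", "a", "\u00e1", "A", "ab", "Ab"]

by_codepoint = sorted(words)               # plain code point order
by_weights = sorted(words, key=sort_key)   # multi-level weight order

print(by_codepoint)
print(by_weights)
```

With code point order, all uppercase letters come before all lowercase ones
and the accented letter lands at the end; with the weight-based key, case and
accents only break ties between otherwise identical letters.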
For Bengali, the DUCET will certainly not be enough to match all your needs,
and you'll probably need to tailor the collation order using expansion rules
and swaps, with more collation levels than what is shown in the DUCET (which
documents just 3 levels before the code point order: primary, secondary,
tertiary).
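
As a rough sketch of what tailoring with expansions and swaps means, the
example below overrides a few entries of a default table. Everything in it is
an assumption made for illustration: the weights, the names DEFAULT_TABLE,
tailor and sort_key, and the Latin letters used in place of real Bengali
rules, which I am not attempting to write here.

```python
# Hypothetical default table: character -> list of collation elements,
# each element being (primary, secondary, tertiary).
# ASSUMPTION: invented weights, not DUCET values.
DEFAULT_TABLE = {
    "a": [(1, 1, 1)],
    "b": [(2, 1, 1)],
    "c": [(3, 1, 1)],
    "e": [(5, 1, 1)],
    "\u00e6": [(9, 1, 1)],  # ae ligature: sorts last in this toy default
}

def tailor(table, overrides):
    """Return a copy of the table with some entries replaced."""
    tailored = dict(table)
    tailored.update(overrides)
    return tailored

def sort_key(text, table):
    """Concatenate the collation elements of each character, then compare
    all primary weights first, then secondary, then tertiary."""
    elements = [ce for ch in text for ce in table[ch]]
    return tuple(tuple(ce[level] for ce in elements) for level in range(3))

TAILORED = tailor(DEFAULT_TABLE, {
    # Expansion: the ligature now sorts like "a" followed by "e", and is
    # distinguished from the plain sequence "ae" only at the secondary level.
    "\u00e6": [(1, 1, 1), (5, 2, 1)],
    # Swap: exchange the primary weights of "b" and "c".
    "b": [(3, 1, 1)],
    "c": [(2, 1, 1)],
})

words = ["b", "c", "\u00e6", "ae", "a"]
default_order = sorted(words, key=lambda w: sort_key(w, DEFAULT_TABLE))
tailored_order = sorted(words, key=lambda w: sort_key(w, TAILORED))

print(default_order)
print(tailored_order)
```

The default table is left untouched, so several tailorings can share it; in
ICU the same idea is expressed as textual rules rather than a dictionary.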
It will however be simpler than sorting Thai in logical (phonetic) order,
which requires preprocessing to find grapheme clusters and syllables with a
dictionary, unless you prefer to sort simply on the visual order. I confess
that I have not attempted to do any sorting of Thai data. If I had to do
that, I would need to use a complete implementation such as the one found in
ICU (but ICU is quite large for some projects).
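
To illustrate the difference between visual and logical order for Thai, here
is a deliberately simplified sketch. It is not the dictionary-based syllable
analysis described above: it only swaps a leading vowel (U+0E40..U+0E44,
which is stored before the consonant it follows phonetically) with the next
consonant, and then compares code points instead of real collation weights.
The function name logical_order is hypothetical.

```python
# Thai vowels that are written (and stored) before their consonant.
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}

def is_thai_consonant(ch):
    # Thai consonants occupy U+0E01..U+0E2E.
    return 0x0E01 <= ord(ch) <= 0x0E2E

def logical_order(text):
    """Swap each preposed vowel with the consonant that follows it.
    ASSUMPTION: a crude stand-in for real syllable analysis."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return "".join(chars)

word_e_ko = "\u0e40\u0e01"   # SARA E + KO KAI (vowel stored first)
word_kho_aa = "\u0e02\u0e32" # KHO KHAI + SARA AA

words = [word_kho_aa, word_e_ko]

visual = sorted(words)                     # sorts on the stored order
logical = sorted(words, key=logical_order) # sorts on the consonant first

print(visual == [word_kho_aa, word_e_ko])
print(logical == [word_e_ko, word_kho_aa])
```

In visual order the word starting with the vowel sign sorts after every word
starting with a consonant; after the swap it sorts under its consonant KO KAI,
ahead of the word starting with KHO KHAI.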
This archive was generated by hypermail 2.1.5 : Thu Dec 04 2003 - 22:57:21 EST