On Mon, 21 Oct 2013 00:33:58 +0530
Pravin Jain <pravin_at_zensoftech.co.in> wrote:
I've taken the liberty of replying to the list.
> One observation for Indic scripts.
> +U0933 normally comes after +U0939, in dictionary, except for this all
> other code points are properly ordered.
> similarly in the Gujarati block
> +U0AB3 comes after +U0AB9.
This is different to any issue of 'logical order'; this point relates to
the code values used, rather than the order in which the codepoints
of a string are stored.
Sorting for human consumption normally uses lookup tables for the
comparison of characters, and these should handle this issue. However,
for the range U+0933 to U+0939, the order simply follows the codepoints
in the Default Unicode Collation Element Table (DUCET), which is
controlled by the Unicode Technical Committee, and in the CLDR default
and Hindi collation tables, which are controlled by the CLDR technical
committee.
I am surprised that this has not been corrected: in the Buddhist Indic
scripts, the corresponding codepoint, when it exists, comes in the
alphabetical position you describe. Assuming the current collations are
wrong, please raise a ticket at http://unicode.org/cldr/trac/newticket
and point to some evidence, e.g. an image of entries in a printed
dictionary. It may be worth reporting the issue against DUCET at
http://www.unicode.org/reporting.html ; however, it may be argued that
this is not a sufficiently egregious error for it to be corrected. If
you do report it for DUCET, please reference the CLDR ticket number.
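
To illustrate the difference between codepoint order and a table-driven
order, here is a minimal Python sketch. It is not the DUCET or CLDR
algorithm: the names TAILORED and dictionary_key are hypothetical, and
the key ignores vowel signs, conjuncts and multi-level weights. It only
moves U+0933 (and U+0AB3) to just after U+0939 (and U+0AB9), which is
the order described in the quoted observation.

```python
# Hypothetical tailoring table: each listed codepoint is weighted
# immediately after the codepoint it maps to.
TAILORED = {
    0x0933: 0x0939,  # DEVANAGARI LETTER LLA sorts after DEVANAGARI LETTER HA
    0x0AB3: 0x0AB9,  # GUJARATI LETTER LLA sorts after GUJARATI LETTER HA
}


def dictionary_key(word):
    """Return a sort key that applies the tailoring above.

    Each character becomes a (primary, tie_breaker) pair. Untailored
    characters keep their codepoint as the primary weight; tailored
    ones borrow the weight of their anchor and sort just after it.
    """
    key = []
    for ch in word:
        cp = ord(ch)
        if cp in TAILORED:
            key.append((TAILORED[cp], 1))
        else:
            key.append((cp, 0))
    return key


words = ["\u0939", "\u0933", "\u0932"]  # HA, LLA, LA

by_codepoint = sorted(words)
by_dictionary = sorted(words, key=dictionary_key)

print([f"U+{ord(w):04X}" for w in by_codepoint])   # ['U+0932', 'U+0933', 'U+0939']
print([f"U+{ord(w):04X}" for w in by_dictionary])  # ['U+0932', 'U+0939', 'U+0933']
```

A real implementation would use a collation library with a tailored
rule set rather than a hand-written key like this one.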
Richard.
Received on Sun Oct 20 2013 - 18:49:17 CDT