On 2/20/2016 9:56 AM, Eli Zaretskii wrote:
From: Philippe Verdy <verdy_p@wanadoo.fr>
Date: Sat, 20 Feb 2016 18:27:41 +0100
Cc: unicode Unicode Discussion <unicode@unicode.org>
Unless we have case folding tailored by language, you cannot do that based on the Unicode database alone.
What about language-independent character-folding: where in the
Unicode database is the data for that?
Unicode, even CLDR, doesn't nearly have enough
data for the purpose.
(and as a corollary of what Elias points out, it's likely to annoy
users of every language, in that it would fold essential and
non-essential distinctions indiscriminately).
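To make the "indiscriminately" point concrete, here is a minimal sketch of a naive language-independent fold, assuming Python's standard unicodedata module. The function name strip_marks and the example words are my own illustrations, not anything defined by Unicode or CLDR: it simply decomposes to NFD and drops every combining mark, with no notion of which marks matter in which language.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Naive fold (hypothetical helper): decompose to NFD, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns the canonical combining class;
    # it is 0 for base characters and non-zero for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Illustrative inputs chosen for this sketch.
print(strip_marks("résumé"))                          # accents many English readers treat as optional
print(strip_marks("schön") == strip_marks("schon"))   # two distinct German words collapse
```

The same operation that looks harmless for the first word erases a distinction that German readers regard as essential in the second, which is the kind of result a fold without per-language data cannot avoid.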
I've been working on this problem in the context of international
top-level domain names, where the aim of the project is to
identify labels that are seen as "the same" by users of a given
script (but, in cases of identical appearance, we also include
those seen as identical by users across scripts).
None of the working groups in this project has felt like turning
to CLDR for this purpose, and so far, each has approached the
issue in a way that is not linked to sorting.
Finally, none has seen folding of diacritics as useful. In the
case of Arabic, however, optional combining marks are simply not
supported (so as to avoid having to define a folding).
(see
https://www.icann.org/sites/default/files/lgr/lgr-1-arabic-script-01dec15-en.html)
A./
Received on Sat Feb 20 2016 - 16:11:15 CST