Re: Character folding in text editors from Asmus Freytag (t) on 2016-02-20 (Unicode Mail List Archive)

From: Asmus Freytag (t) <asmus-inc_at_ix.netcom.com>
Date: Sat, 20 Feb 2016 14:10:04 -0800

On 2/20/2016 9:56 AM, Eli Zaretskii wrote:

From: Philippe Verdy <verdy_p@wanadoo.fr>
Date: Sat, 20 Feb 2016 18:27:41 +0100
Cc: unicode Unicode Discussion <unicode@unicode.org>

Unless we have case folding tailored by language, you cannot do that based on the Unicode database alone.

What about language-independent character-folding: where in the
Unicode database is the data for that?

Unicode, even with CLDR, doesn't have nearly enough data for the purpose
(and, as a corollary of what Elias points out, such a folding is likely to annoy users of every language, in that it would fold essential and non-essential distinctions indiscriminately).
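
To make the "indiscriminate" point concrete, here is a minimal sketch of the kind of language-independent folding one can build from the Unicode database alone: decompose, drop combining marks, case-fold. The function name naive_fold is made up for illustration; it is not something Unicode or CLDR defines, and nobody in this thread proposed exactly this.

```python
import unicodedata


def naive_fold(text: str) -> str:
    """Hypothetical language-independent folding (illustration only).

    Decomposes to NFD, drops every character with a non-zero canonical
    combining class, then applies default (untailored) case folding.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# A distinction many English-language users would call non-essential:
print(naive_fold("Résumé"))   # resume

# Distinctions that are essential to users of the language, folded just the same:
print(naive_fold("año"))      # ano   (Spanish: a different word)
print(naive_fold("schön"))    # schon (German: a different word)
```

The same three lines of logic treat all of these cases alike, which is the problem: nothing in the data says which marks a given user community considers part of the letter.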

I've been working on this problem in the context of international top-level domain names, where the aim of the project is to identify labels that are seen as "the same" by users of a given script (but, in cases of identical appearance, we also include those seen as identical by users across scripts).

None of the working groups in this project has felt like turning to CLDR for this purpose, and so far, each has approached the issue in a way that is not linked to sorting.

Finally, none has seen folding of diacritics as useful; in the case of Arabic, however, optional combining marks are simply not supported (so as to avoid having to define a folding).

(see https://www.icann.org/sites/default/files/lgr/lgr-1-arabic-script-01dec15-en.html)
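
As a rough illustration of "not supported" as opposed to "folded", the sketch below rejects any label that contains a combining mark instead of mapping it to an unmarked form. This is a deliberate simplification based on the Unicode general category; the actual repertoire and rules are those in the document linked above, and the name label_is_allowed is hypothetical.

```python
import unicodedata


def label_is_allowed(label: str) -> bool:
    """Hypothetical check (illustration only): allow a label only if it
    contains no combining marks (general categories Mn, Mc, Me).

    Marked labels are refused outright, so no folding between marked and
    unmarked forms ever has to be defined.
    """
    return not any(unicodedata.category(ch).startswith("M") for ch in label)


unmarked = "\u0643\u062A\u0628"                    # kaf, teh, beh
marked = "\u0643\u064E\u062A\u064E\u0628\u064E"    # the same letters, each followed by fatha

print(label_is_allowed(unmarked))  # True
print(label_is_allowed(marked))    # False
```

With this approach the marked and unmarked spellings never coexist as labels, so there is no question of which one is "the same" as the other.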
A./
Received on Sat Feb 20 2016 - 16:11:15 CST
