Character folding in text editors
eliz at gnu.org
Sun Feb 21 10:21:24 CST 2016
> From: "Doug Ewell" <doug at ewellic.org>
> Date: Sat, 20 Feb 2016 14:43:15 -0700
> > What about language-independent character-folding: where in the
> > Unicode database is the data for that?
> The OP kind of alluded to that: there is no such thing really as
> language-independent character folding.
Emacs is currently looking for a useful approximation, given that the
language of the text is in general unknown. The folding can be
toggled off (either as a global default, or for the current search),
for those use cases where it is undesirable or gets in the way.
> About the closest approximation you can get using Unicode data alone
> (not CLDR) is to normalize to NFD, then ignore the combining diacritics.
IIUC, this is what Emacs currently does. The NFD
normalization uses the decomposition data included with
UnicodeData.txt. Is this what you mean?
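For illustration, here is a minimal Python sketch of the approach described above. It uses the standard `unicodedata` module, whose decomposition data comes from UnicodeData.txt, and is not Emacs's actual implementation. The names `fold` and `matches` are hypothetical. "Combining diacritic" is approximated as any character whose canonical combining class is nonzero. The `fold_chars` flag stands in for turning folding off for a single search.

```python
import unicodedata


def fold(s: str) -> str:
    """Canonically decompose s (NFD), then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", s)
    # unicodedata.combining() returns the canonical combining class;
    # base letters have class 0, marks such as U+0301 are nonzero.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def matches(text: str, pattern: str, fold_chars: bool = True) -> bool:
    """Substring search; fold_chars=False gives a literal search."""
    if fold_chars:
        return fold(pattern) in fold(text)
    return pattern in text


print(fold("crème brûlée"))                     # creme brulee
print(fold("smørrebrød"))                       # smørrebrød (ø has no decomposition)
print(matches("Noël", "Noel"))                  # True
print(matches("Noël", "Noel", fold_chars=False))  # False
```

This sketch folds both the text and the pattern, so "Noël" also matches a search for "Noel" and vice versa. An editor might prefer an asymmetric rule, where only unaccented pattern characters match accented text. Note also that NFD applies only canonical decompositions, so compatibility characters such as ligatures are left alone.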
> But that still doesn't work for a character like ø, which doesn't
> decompose to o + anything
Why doesn't it, btw? Same question about ł.
I've heard the opinion that UnicodeData.txt only includes
decompositions when the combining mark's glyph doesn't overlap that of
the base character. Is that correct?
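You can inspect the decomposition mappings in question with Python's `unicodedata.decomposition()`. It returns the decomposition field from UnicodeData.txt as a string of hex code points, or an empty string when there is none. This only shows *which* characters lack a mapping; it does not say why.

```python
import unicodedata

# Print the UnicodeData.txt decomposition field for each character.
for ch in "ñöøł":
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")

# U+00F1 LATIN SMALL LETTER N WITH TILDE: 006E 0303
# U+00F6 LATIN SMALL LETTER O WITH DIAERESIS: 006F 0308
# U+00F8 LATIN SMALL LETTER O WITH STROKE: (none)
# U+0142 LATIN SMALL LETTER L WITH STROKE: (none)
```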
> and more importantly, it still won't meet expectations because of
> the n/ñ and o/ö/ø language-dependency problems.
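To make the language-dependency problem concrete, here is a sketch of NFD-based folding with a per-language list of letters that must stay distinct. The `NO_FOLD` table, its language codes, and the `fold_for` function are all hypothetical. The sketch also assumes the text's language is known, which, as noted above, is generally not the case in an editor.

```python
import unicodedata

# Hypothetical exceptions: letters that readers of a language treat as
# separate letters, so they must not fold to their base letter.
NO_FOLD = {
    "es": set("ñÑ"),
    "sv": set("åäöÅÄÖ"),
}


def fold_for(s: str, lang: str = "") -> str:
    """NFD-based folding that keeps the letters listed for lang intact."""
    keep = NO_FOLD.get(lang, set())
    out = []
    # Compose first, so that "n" + U+0303 is seen as the single letter "ñ".
    for ch in unicodedata.normalize("NFC", s):
        if ch in keep:
            out.append(ch)
            continue
        # Otherwise decompose this character and drop its combining marks.
        for part in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(part) == 0:
                out.append(part)
    return "".join(out)


print(fold_for("año"))        # ano   (language unknown: ñ folds to n)
print(fold_for("año", "es"))  # año   (Spanish: ñ stays distinct)
```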
Given that the feature can be turned off easily, do you think that it
will nonetheless be useful, even though language-dependent parts are