From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat Jul 17 2004 - 18:46:00 CDT
Thank you for reviewing this.
DiacriticFolding (unlike AccentFolding) is selective about which combining
marks it removes for which base character. I wonder whether that's truly
intended, or whether it could be replaced by a combination of
    AccentFolding
    OtherDiacriticFolding
where AccentFolding removes *all* nonspacing marks following Latin, Greek
or Cyrillic letters and we would remove from DiacriticFolding all cases
that are already handled by AccentFolding.
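
For illustration only, here is a minimal Python sketch of an AccentFolding that removes *all* nonspacing marks after Latin, Greek or Cyrillic letters. It is not the TR30 definition. The function names are hypothetical, and the script test is an assumption: Python's standard library has no Script property, so the sketch approximates it from the character name.

```python
import unicodedata

# Assumption: approximate the script of a letter from its Unicode name,
# because the standard library does not expose the Script property.
SCRIPT_PREFIXES = ("LATIN", "GREEK", "CYRILLIC")


def is_lgc_letter(ch):
    """True if ch is a letter whose name starts with LATIN, GREEK or CYRILLIC."""
    if not unicodedata.category(ch).startswith("L"):
        return False
    return unicodedata.name(ch, "").startswith(SCRIPT_PREFIXES)


def accent_fold(text):
    """Drop every nonspacing mark (Mn) that follows a Latin/Greek/Cyrillic letter."""
    out = []
    strip = False  # True while we are inside a sequence started by such a letter
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            if strip:
                continue  # mark follows a qualifying base: remove it
        else:
            strip = is_lgc_letter(ch)  # a new base character resets the state
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

Under these assumptions, accent_fold("café") gives "cafe", while Hebrew bet with dagesh (U+05D1 U+05BC) and o with stroke (U+00F8, which has no decomposition) pass through unchanged.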
That still doesn't take care of Hebrew, so we would need to decide how to
handle that. Perhaps you would like to put forth a proposal as to what
accents or diacritics should be folded for Hebrew, and in what context. Is
it just Dagesh?
The other alternative would be to limit the nonspacing marks to those that
actually occur with Latin / Greek / Cyrillic letters as ordinary diacritics
(i.e. all the diacritics that show up in DiacriticFolding.txt), but then
remove them if they follow *any* base character from that set, not just in
certain fixed combinations.
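
A rough Python sketch of this alternative follows. The set of marks is a small illustrative placeholder, not the actual contents of DiacriticFolding.txt. The function names and the name-based script test are likewise assumptions made for the example.

```python
import unicodedata

# Hypothetical sample of "ordinary" diacritics; the real list would be
# derived from DiacriticFolding.txt, which is not reproduced here.
FOLDABLE_MARKS = {
    "\u0300",  # COMBINING GRAVE ACCENT
    "\u0301",  # COMBINING ACUTE ACCENT
    "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "\u0303",  # COMBINING TILDE
    "\u0308",  # COMBINING DIAERESIS
    "\u0327",  # COMBINING CEDILLA
}

# Assumption: approximate the script from the Unicode character name.
SCRIPT_PREFIXES = ("LATIN", "GREEK", "CYRILLIC")


def is_lgc_letter(ch):
    """True if ch is a letter whose name starts with LATIN, GREEK or CYRILLIC."""
    if not unicodedata.category(ch).startswith("L"):
        return False
    return unicodedata.name(ch, "").startswith(SCRIPT_PREFIXES)


def diacritic_fold(text, marks=FOLDABLE_MARKS):
    """Remove listed marks after any qualifying base, not only fixed pairs."""
    out = []
    after_lgc = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch).startswith("M"):
            if after_lgc and ch in marks:
                continue  # listed mark after a Latin/Greek/Cyrillic base
        else:
            after_lgc = is_lgc_letter(ch)
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

With this placeholder set, "x" followed by a combining acute folds to "x" even though that pair has no precomposed form, while a with ring above is left alone because the ring is not in the sample list.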
Rather than list the mappings in a file, we would simply list the
conditions, similar to AccentFolding (see
http://www.unicode.org/reports/tr30/Foldings.txt) and reduce the data file
to those cases where there are no mappings (o with stroke -> o, combining
stroke overlay, etc.).
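
As a sketch of what such a reduced data file might hold, the Python fragment below reads "no mappings" as "no canonical decomposition", which is an assumption. The table and function names are hypothetical and the entries are only examples.

```python
import unicodedata

# Hypothetical residual table: letters that carry a diacritic visually
# but have no canonical decomposition, so a rule on combining marks
# alone can never reach them.
RESIDUAL_FOLDS = {
    "\u00f8": "o",  # LATIN SMALL LETTER O WITH STROKE
    "\u00d8": "O",  # LATIN CAPITAL LETTER O WITH STROKE
    "\u0142": "l",  # LATIN SMALL LETTER L WITH STROKE
    "\u0141": "L",  # LATIN CAPITAL LETTER L WITH STROKE
}


def needs_explicit_entry(ch):
    """True if ch has no decomposition and so needs a table entry to fold."""
    return unicodedata.decomposition(ch) == ""


def fold_residual(text):
    """Apply only the explicit table; mark removal would be a separate step."""
    return "".join(RESIDUAL_FOLDS.get(ch, ch) for ch in text)
```

Here every key of the table satisfies needs_explicit_entry, whereas a character such as e with acute does not, since it decomposes and would already be covered by the condition-based rule.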
John, you proposed the initial set. Do you have any suggestion here?
A./