Re: Folding algorithm and canonical equivalence

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat Jul 17 2004 - 18:46:00 CDT


    Thank you for reviewing this.

    DiacriticFolding (unlike AccentFolding) is selective about which combining
    marks it removes for which base character. I wonder whether that's truly
    intended, or whether it could be replaced by a combination of

    AccentFolding
    OtherDiacriticFolding

    where AccentFolding removes *all* nonspacing marks following Latin, Greek,
    or Cyrillic letters, and we would remove from DiacriticFolding all cases
    that are already handled by AccentFolding.
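
    To make the suggestion concrete, here is a rough sketch of what such an
    AccentFolding could look like. It is an illustration only, not the TR30
    definition: the names accent_fold and is_folded_base are made up, "nonspacing
    mark" is taken to mean General_Category Mn, and script membership is guessed
    from the character name rather than from a real Script property lookup.

```python
import unicodedata

# Assumption: a letter counts as Latin/Greek/Cyrillic if its Unicode name
# starts with the script name. This is a rough stand-in for the Script property.
FOLDED_SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def is_folded_base(ch):
    """True for letters whose following nonspacing marks should be removed."""
    if not unicodedata.category(ch).startswith("L"):
        return False
    return unicodedata.name(ch, "").startswith(FOLDED_SCRIPTS)


def accent_fold(text):
    """Drop every nonspacing mark (Mn) that follows a Latin, Greek, or
    Cyrillic letter; marks after any other base are left alone."""
    out = []
    strip = False
    # Decompose first so precomposed letters expose their marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            if not strip:
                out.append(ch)
        else:
            strip = is_folded_base(ch)
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

    Because the input is decomposed before the marks are examined, canonically
    equivalent inputs fold to the same result. Characters without a canonical
    decomposition, such as o with stroke, pass through unchanged, and Hebrew
    points are untouched.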

    That still doesn't take care of Hebrew, so we would need to decide how to
    handle that. Perhaps you would like to put forth a proposal as to what
    accents or diacritics should be folded for Hebrew, and in what context. Is
    it just Dagesh?

    The other alternative would be to limit the nonspacing marks to those that
    actually occur with Latin / Greek / Cyrillic letters as ordinary diacritics
    (i.e. all the diacritics that show up in DiacriticFolding.txt), but then
    remove them if they follow *any* base character from that set, not just in
    certain fixed combinations.
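
    A sketch of this second alternative follows. The mark set shown is a
    placeholder and is NOT the actual contents of DiacriticFolding.txt; the
    function names and the name-based script test are likewise assumptions made
    for illustration.

```python
import unicodedata

# Placeholder only: a handful of common combining marks standing in for
# "all the diacritics that show up in DiacriticFolding.txt".
ORDINARY_DIACRITICS = frozenset("\u0300\u0301\u0302\u0303\u0308\u0327")

# Assumption: script membership approximated from the character name.
FOLDED_SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def is_folded_base(ch):
    """True for letters named as Latin, Greek, or Cyrillic."""
    if not unicodedata.category(ch).startswith("L"):
        return False
    return unicodedata.name(ch, "").startswith(FOLDED_SCRIPTS)


def selective_fold(text, marks=ORDINARY_DIACRITICS):
    """Remove only the listed marks, but after *any* base from the set,
    not just in fixed base+mark combinations."""
    out = []
    strip = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            if strip and ch in marks:
                continue  # listed diacritic after a folded base: drop it
            out.append(ch)
        else:
            strip = is_folded_base(ch)
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

    With this variant the condition is "listed mark after a letter from the
    set", so a mark outside the list (here, dot below) survives even on a Latin
    base.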

    Rather than list the mappings in a file, we would simply list the
    conditions, similar to AccentFolding (see
    http://www.unicode.org/reports/tr30/Foldings.txt) and reduce the data file
    to those cases where there are no mappings (o with stroke -> o, combining
    stroke overlay, etc.).

    John, you proposed the initial set. Do you have any suggestion here?

    A./


