From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri May 14 2004 - 12:10:38 CDT
From: "Michael Everson" <everson@evertype.com>
> At 06:35 -0700 2004-05-14, Peter Kirk wrote:
>
> >But there is an exceptional issue within the family of north-west
> >Semitic scripts, which may apply also to others e.g. Greek, Coptic
> >and archaic Greek - possibly also the Indic scripts.
>
> I don't think so.
>
> >Within these sets of scripts there is NO ambiguity about which
> >characters correspond to which, as they have identical repertoires,
> >with possibly additional letters in some of the scripts for which no
> >equivalent can be defined in the other scripts.
>
> That doesn't mean that an ordered list with them interfiled is in any
> way legible.
I agree. UCA is designed first of all to produce a legible and consistent ordering
for all kinds of readers, experts as well as ordinary users who can read only one
language or one script. We can interleave some variants that have an obvious
relation to other well-known characters (accented letters are a good example,
even if some may wonder why the thorn letters sit between T and U; because these
letters are rare even in the languages that use them, this interleaving of
variants does not make the ordering completely unreadable).
For search purposes, what some people want is not really a collation order but a
set of equivalence relations. This is the same kind of need as case folding or
case-insensitive search.
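
To illustrate the distinction, here is a minimal Python sketch. It assumes
Python's built-in str.casefold() as the folding; the helper name equivalent()
and the sample words are only illustrative. Folding answers "do these two
strings match?", whereas a collation key answers "which one comes first?".

```python
def equivalent(a: str, b: str) -> bool:
    """Equivalence relation for searching: True when both strings fold alike."""
    return a.casefold() == b.casefold()


words = ["thorn", "Tau", "THORN"]

# Searching: every entry that folds to the same string as the query matches.
matches = [w for w in words if equivalent(w, "Thorn")]

# Ordering: a separate question. Here the folded form is merely reused as a
# crude sort key; a real collation (such as UCA) uses multi-level weights.
ordered = sorted(words, key=str.casefold)
```

Two strings can be equivalent for searching and still need a defined relative
order in a sorted list, which is why the two mechanisms are kept apart.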
I see no objection to adding new types of string folding for those who would
like to "fold" (in fact transliterate) Phoenician to Hebrew (the reverse being
hard to implement consistently because of the various sets of Hebrew diacritics),
or to Greek. There could even be a standard guideline for implementing such folding
or transliteration (for the same reason that there are standard folding
rules for case in Latin/Greek/Cyrillic or for Katakana-to-Hiragana in Japanese).
Such folding belongs to the same area, with the same caveats (in terms of text
interpretation), as custom normalizations or compatibility normalizations
performed on unknown input text: some linguistic semantics are lost.
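
A rough sketch of what such a fold could look like for searching, in Python.
Everything here is an assumption made for illustration: it uses the code points
U+10900..U+10915 that Unicode assigned to the 22 Phoenician letters in a later
version, pairs them in alphabetical order with the non-final Hebrew letters,
and the names PHOENICIAN_TO_HEBREW, HEBREW_FINALS and fold_for_search() are
hypothetical. It is not a standard folding.

```python
import unicodedata

# Non-final Hebrew letters, alef through tav, in alphabetical order.
HEBREW_BASE = [
    0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,
    0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,
    0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA,
]

# Assumed one-to-one pairing: the 22 Phoenician letters start at U+10900.
PHOENICIAN_TO_HEBREW = {0x10900 + i: cp for i, cp in enumerate(HEBREW_BASE)}

# Hebrew final forms folded to their regular forms.
HEBREW_FINALS = {
    0x05DA: 0x05DB,  # final kaf   -> kaf
    0x05DD: 0x05DE,  # final mem   -> mem
    0x05DF: 0x05E0,  # final nun   -> nun
    0x05E3: 0x05E4,  # final pe    -> pe
    0x05E5: 0x05E6,  # final tsadi -> tsadi
}


def fold_for_search(text: str) -> str:
    """Return a search key: unpointed, non-final Hebrew letters."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (Hebrew points and accents among them).
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return bare.translate(PHOENICIAN_TO_HEBREW).translate(HEBREW_FINALS)
```

Comparing fold_for_search() of two strings treats a Phoenician word and its
pointed Hebrew counterpart as equivalent. The fold only goes one way: the
points and final forms it discards cannot be recovered from the key, which is
exactly the loss described above.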