From: Jim Allan (jallan@smrtytrek.com)
Date: Sun Nov 10 2002 - 14:06:51 EST
Carl W. Brown posted:
> There already is a Unicode solution for the problem. Check UAX #21.
> If search engines use case insensitive compares then it should be no
> problem.
Yes, if only Google and other search engines would implement at least
the minimum recommended foldings in
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt. Currently Google
does not even equate /ß/ with /ss/.
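As a small illustration of what those foldings would give a search
engine, here is a sketch in Python, whose built-in str.casefold()
applies Unicode full case folding. The helper name matches_caseless is
made up for this example, and a real search engine would of course
fold at indexing time rather than on every comparison.

```python
def matches_caseless(query: str, text: str) -> bool:
    """Return True if query occurs in text once both are case-folded."""
    # str.casefold() folds sharp s to "ss" and final sigma to sigma.
    return query.casefold() in text.casefold()


print(matches_caseless("strasse", "Hauptstraße"))  # True
print(matches_caseless("ΟΔΟΣ", "οδος"))            # True
```

Note that folding deliberately throws away the final/non-final sigma
distinction; it is meant for comparison, not for display.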
> There are a lot of exceptions to the rule so that you need separate
> characters for the forms but you also need an algorithm that works
> reasonable well for most cases.
>
> "Character (final sigma) is preceded by a sequence consisting of a
> cased letter and a case-ignorable sequence, and character is not
> followed by a sequence consisting of an ignorable sequence and then a
> cased letter."
I totally agree with this.
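The quoted condition can be sketched roughly as follows in Python. This
is only an approximation under stated assumptions: the helpers is_cased
and is_case_ignorable are hypothetical stand-ins, not the real Unicode
Cased and Case_Ignorable properties (which Python's unicodedata module
does not expose directly), and lower_with_final_sigma is a name invented
for the example.

```python
import unicodedata

CAPITAL_SIGMA = "\u03a3"  # Σ
SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς


def is_cased(ch: str) -> bool:
    # Assumption: a character is "cased" if case mapping changes it.
    return ch.lower() != ch or ch.upper() != ch


def is_case_ignorable(ch: str) -> bool:
    # Assumption: marks, format characters, modifier letters/symbols,
    # plus a few mid-word punctuation marks, approximate Case_Ignorable.
    return (unicodedata.category(ch) in {"Mn", "Me", "Cf", "Lm", "Sk"}
            or ch in "'\u2019\u00b7")


def is_final_sigma(text: str, i: int) -> bool:
    # Preceded by a cased letter, skipping any case-ignorable sequence.
    j = i - 1
    while j >= 0 and is_case_ignorable(text[j]):
        j -= 1
    if j < 0 or not is_cased(text[j]):
        return False
    # Not followed by a case-ignorable sequence and then a cased letter.
    k = i + 1
    while k < len(text) and is_case_ignorable(text[k]):
        k += 1
    return not (k < len(text) and is_cased(text[k]))


def lower_with_final_sigma(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch == CAPITAL_SIGMA:
            out.append(FINAL_SIGMA if is_final_sigma(text, i) else SMALL_SIGMA)
        else:
            out.append(ch.lower())
    return "".join(out)


print(lower_with_final_sigma("ΟΔΟΣ ΣΟΦΟΣ"))
```

A lone capital sigma has no preceding cased letter, so the sketch maps
it to the non-final form; that is one of the cases where an explicit
final sigma character in the text is still needed.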
My original post was directed against the argument that final sigma and
non-final sigma should have been merged as a single character to be
displayed as either non-final sigma, final sigma, or lunate sigma
according to a higher protocol (e.g. an intelligent font).
If this route had been taken, then one would still require some method
to indicate exceptions: either proprietary triggers in a font or other
higher-level software, or an overriding variation selector character at
the plain-text level. It is not clear that any greater ease would have been
gained by moving the necessary variation selection from one level of
representation to another.
Jim Allan