Randy Hughes asked about a Unicode searching capability.
My first suggestion would be to get familiar with the
Unicode Collation Algorithm:
http://www.unicode.org/unicode/reports/tr10/
The language-specific and cultural convention issues you
run into in searching are very closely related to the
problems of comparing strings for sorting. Typically
an implementation of one can be reused for the other.
The issue of what matches what (and at what level of
fuzziness) for Unicode is rather complex, and it is
unlikely that people will want to reinvent
this wheel too many times, except for rather special-purpose
applications.
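
To give a feel for what "level of fuzziness" means, here is a rough
Python sketch. It is NOT the Unicode Collation Algorithm: it only
imitates the idea of match levels (1 = ignore accents and case,
2 = ignore case only, 3 = distinguish both) using the standard
unicodedata module, with no language-specific tailoring. The names
strip_marks, search_key and find are made up for this illustration.

```python
import unicodedata

def strip_marks(s: str) -> str:
    """Decompose, then drop combining marks (accents and the like)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search_key(s: str, level: int) -> str:
    """Build a comparison key for a crude match level.

    Assumption for this sketch (loosely modeled on collation levels):
      level 1: ignore accents and case
      level 2: ignore case, keep accents
      level 3: keep both; only canonical equivalents match
    """
    s = unicodedata.normalize("NFD", s)
    if level == 1:
        s = strip_marks(s)
    if level <= 2:
        s = s.casefold()
    # Recompose so that equivalent inputs yield identical keys.
    return unicodedata.normalize("NFC", s)

def find(haystack: str, needle: str, level: int) -> bool:
    """Substring test on the keys; a real matcher would also
    respect character boundaries and locale rules."""
    return search_key(needle, level) in search_key(haystack, level)

if __name__ == "__main__":
    text = "R\u00e9sum\u00e9 writing"
    print(find(text, "resume", 1))
    print(find(text, "resume", 2))
    print(find(text, "r\u00e9sum\u00e9", 2))
    print(find(text, "r\u00e9sum\u00e9", 3))
```

Even this toy version shows why the same key-building machinery
serves both searching and sorting. Anything beyond it (contractions,
ignorable characters, per-language rules) is what the Collation
Algorithm report actually covers.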
Also take a look at the technical report on Unicode Normalization
Forms:
http://www.unicode.org/unicode/reports/tr15/
Normalization addresses the issue of collapsing different
canonical (or compatibility) equivalent sequences into standard
forms that can be compared for equality reliably with a binary
comparison. This is also relevant to the design of Unicode searching.
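
As a small illustration of the point about binary comparison, here is
a sketch in Python using the standard unicodedata module. The helper
names nfc_equal and nfkc_equal are just for this example; NFC handles
canonical equivalents, NFKC additionally folds compatibility
equivalents.

```python
import unicodedata

def nfc_equal(a: str, b: str) -> bool:
    """Binary comparison after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def nfkc_equal(a: str, b: str) -> bool:
    """Binary comparison after compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

precomposed = "\u00e9"    # e with acute as a single code point
decomposed = "e\u0301"    # e followed by combining acute accent
ligature = "\ufb01"       # LATIN SMALL LIGATURE FI

if __name__ == "__main__":
    print(precomposed == decomposed)          # raw binary comparison
    print(nfc_equal(precomposed, decomposed)) # canonical equivalents
    print(nfc_equal(ligature, "fi"))          # not canonically equivalent
    print(nfkc_equal(ligature, "fi"))         # compatibility equivalents
```

Which form to use depends on the application: the compatibility
forms match more but also erase distinctions that some users may
want to search for.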
--Ken Whistler