Re: Unicode Searching

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Apr 28 1999 - 14:53:14 EDT

Next message: Addison Phillips: "RE: Unicode Searching"
Previous message: Randy Hughes: "Unicode Searching"
Maybe in reply to: Randy Hughes: "Unicode Searching"
Next in thread: Addison Phillips: "RE: Unicode Searching"

Randy Hughes asked about a Unicode searching capability.

My first suggestion would be to get familiar with the
Unicode Collation Algorithm:

http://www.unicode.org/unicode/reports/tr10/

The language-specific and cultural convention issues you
run into in searching are very closely related to the
problems for comparison of strings for sorting. Typically
the implementation for one can be used for the other.

The question of what matches what (and at what level of
fuzziness) in Unicode is rather complex, and few people
are going to want to reinvent this wheel, except for rather
special-purpose applications.
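
To give a rough feel for what "level of fuzziness" means, here is
a minimal sketch in Python. It is not the Unicode Collation
Algorithm, only a crude stand-in built on the standard unicodedata
module, and the names fold() and matches() and the 1/2/3 level
numbering are assumptions made up for this illustration. Level 3
distinguishes everything, level 2 ignores case, and level 1 ignores
both case and accents.

```python
import unicodedata


def fold(text, level):
    """Reduce text to a comparison key; a lower level gives a fuzzier match.

    Hypothetical helper for illustration only. It approximates the idea of
    comparison levels and does not implement the collation algorithm.
    """
    # Decompose first so accents become separate combining marks.
    text = unicodedata.normalize("NFD", text)
    if level <= 2:
        # Ignore case differences; case folding can undo normalization,
        # so normalize again afterwards.
        text = unicodedata.normalize("NFD", text.casefold())
    if level <= 1:
        # Ignore accents by dropping combining marks.
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return text


def matches(a, b, level):
    """Return True if a and b are equal at the given level of fuzziness."""
    return fold(a, level) == fold(b, level)
```

With this sketch, matches("résumé", "RESUME", 1) is true, while the
same pair fails to match at level 2 because the accents differ. A
real implementation would take its weights from collation data
rather than from this kind of character stripping, which gets
language-specific cases wrong.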

Also take a look at the technical report on Unicode Normalization
Forms:

http://www.unicode.org/unicode/reports/tr15/

Normalization collapses different canonical (or compatibility)
equivalents into standard forms that can be compared reliably
for equality with a binary comparison. This is also relevant
to the design of Unicode searching.
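
As a small illustration of the point about binary comparison, the
following sketch uses Python's standard unicodedata module. The
helper names canonical_equal() and compat_equal() are assumptions
made up for this example; only unicodedata.normalize() comes from
the library.

```python
import unicodedata


def canonical_equal(a, b):
    """Compare two strings after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def compat_equal(a, b):
    """Compare two strings after compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI

# The raw strings differ code point by code point, even though the
# first two are canonically equivalent.
raw_equal = precomposed == decomposed
```

Here raw_equal is false, but canonical_equal(precomposed, decomposed)
is true. The ligature matches "fi" only under compat_equal(), since
that equivalence is a compatibility one rather than a canonical one.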

--Ken Whistler

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT