Re: Case insensitive comparisions

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Mar 28 2000 - 15:29:42 EST

Next message: Jungshik Shin: "RE: DEC multilingual code page, ISO 8859-1, etc."
Previous message: Shigeki Moro: "Re: IME of Devanagari"
Maybe in reply to: david@oz.com: "Case insensitive comparisions"
Next in thread: Christopher John Fynn: "Re: Case insensitive comparisions"

Keld responded to David,

>
> On Tue, Mar 28, 2000 at 08:24:57AM -0800, david@oz.com wrote:
> > I was stepping through some code that did case insensitive comparison for
> > unicode, and noticed that the heart of the function converted each
> > character to lowercase before comparing them. If they matched, they were
> > considered equal, otherwise not.
>
> If you do case insensitive comparison, the right thing is to
> do it at the case insensitive level of the ISO/IEC ordering
> standard 14651. Don't do uppercase to lowercase mapping first,
> just compare directly, case insensitive.
>
> Keld
>

This is, indeed, one way to do case insensitive comparison. If you
have access to an implementation of string ordering according to
the forthcoming standard ISO/IEC 14651, or according to the corresponding
Unicode Standard: UTR #10 Unicode Collation Algorithm, and if that
implementation provides a good API that allows efficient, case insensitive
comparison of two strings according to a particular collation definition,
then this can be a good choice.

However, case folding does not necessarily depend on a collation
algorithm. See also the Unicode Technical Report #21, Case Mappings, for
discussion of case folding and a suggested data file for doing locale-independent
case folding.
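
As a rough illustration of how lowercasing and case folding can differ,
here is a minimal Python sketch. It assumes Python 3's built-in
str.casefold(), which applies Unicode full case folding without consulting
the locale. The helper names are made up for this example and are not
taken from UTR #21 or from David's code.

```python
def equal_by_lowercasing(a: str, b: str) -> bool:
    # The approach described in the original question:
    # lowercase both strings, then compare them.
    return a.lower() == b.lower()


def equal_by_case_folding(a: str, b: str) -> bool:
    # Locale-independent alternative: case-fold both strings, then compare.
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Hypothetical sample pairs chosen for illustration.
    for left, right in [("Unicode", "UNICODE"), ("Straße", "STRASSE")]:
        print(left, right,
              equal_by_lowercasing(left, right),
              equal_by_case_folding(left, right))
```

The two helpers agree on the first pair. On the second pair they differ,
because lowercasing leaves the sharp s unchanged while full case folding
maps it to "ss". Neither helper normalizes its input, so canonically
equivalent strings in different normalization forms may still compare
unequal.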

There are circumstances under which one definitely does *not* want to
have case folding depend on particular collation tables or on locale
differences in comparison. The explanation and examples are provided
in the Case Mappings technical report.
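
One commonly cited case of this kind is matching fixed keywords or
identifiers under Turkish casing rules. The sketch below is only an
assumption-laden illustration, not an example taken from the technical
report: turkish_lower() is a hypothetical stand-in for a locale-tailored
lowercasing function, and matches_keyword() is likewise invented here.

```python
def turkish_lower(s: str) -> str:
    # Hypothetical Turkish-tailored lowercasing, for illustration only:
    # capital I maps to dotless i (U+0131), and capital I with dot above
    # (U+0130) maps to plain i. Everything else uses the default mapping.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


def matches_keyword(token: str, keyword: str, fold) -> bool:
    # Compare a token against a fixed keyword using the supplied folding.
    return fold(token) == fold(keyword)


if __name__ == "__main__":
    # With the tailored mapping, "FILE" becomes "f\u0131le", not "file".
    print(matches_keyword("FILE", "file", turkish_lower))
    # Locale-independent case folding treats the two as the same keyword.
    print(matches_keyword("FILE", "file", str.casefold))
```

If the result of a comparison like this decides whether two identifiers
or protocol keywords are the same, tying it to the user's locale means
the same data can match on one system and fail to match on another.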

--Ken

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:00 EDT