Re: Comparison algorithms in UNICODE

From: David Goldsmith (david_goldsmith@taligent.com)
Date: Sun Aug 13 1995 - 00:07:53 EDT


At 12:35 PM 8/12/95, unicode@Unicode.ORG wrote:
>Within the development of the distributed directory service
>software DIGGER which uses Whois++ technology we will now
>start the development of public domain software libraries
>written in C which take care of fundamental string functions
>such as:
>
>- Optimization of strings
>
> Some characters in the UNICODE table can be written by
> using a different base character which is then followed by
> one or more composition characters. An example is the
> character 00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
> which can be written as 0041+030A. The idea behind this
> function is to minimize the number of bytes in the
> UNICODE string by converting all occurrences of 0041+030A
> into 00C5.

I haven't heard of any public libraries to do this.
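
For what it's worth, the pairwise replacement described above can be sketched in a few lines of C. This is only a rough illustration, not an existing library: the table kPairs and the names compose_pair and compose_in_place are made up here, the table holds just four hand-picked pairs, and a real version would build the table from the decomposition data and would also need to deal with the ordering of multiple combining marks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hand-written table: (base, combining mark) -> precomposed. */
typedef struct {
    uint16_t base;
    uint16_t mark;
    uint16_t composed;
} ComposePair;

static const ComposePair kPairs[] = {
    { 0x0041, 0x030A, 0x00C5 }, /* A + COMBINING RING ABOVE */
    { 0x0061, 0x030A, 0x00E5 }, /* a + COMBINING RING ABOVE */
    { 0x0041, 0x0308, 0x00C4 }, /* A + COMBINING DIAERESIS  */
    { 0x004F, 0x0308, 0x00D6 }, /* O + COMBINING DIAERESIS  */
};

/* Returns the precomposed character for (base, mark), or 0 if none is listed. */
static uint16_t compose_pair(uint16_t base, uint16_t mark)
{
    size_t n = sizeof kPairs / sizeof kPairs[0];
    for (size_t i = 0; i < n; i++) {
        if (kPairs[i].base == base && kPairs[i].mark == mark)
            return kPairs[i].composed;
    }
    return 0;
}

/* Rewrites s in place, merging each listed pair; returns the new length. */
size_t compose_in_place(uint16_t *s, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t c = (out > 0) ? compose_pair(s[out - 1], s[i]) : 0;
        if (c != 0)
            s[out - 1] = c;   /* fold the mark into the previous character */
        else
            s[out++] = s[i];  /* keep the character as it is */
    }
    return out;
}
```

Calling compose_in_place on the three units 0041 030A 0042 leaves 00C5 0042 and returns 2.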
>
>- Uppercase/Lowercase conversions
>

I don't know of any software, but this should be easy to build from the
character properties table:
ftp://unicode.org/pub/MappingTables/UnicodeDataCurrent.txt.Z

or

ftp://unicode.org/pub/MappingTables/UnicodeDataCurrent.txt
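
As a rough sketch of what "easy" might look like, the C below pulls the case mappings out of one line of the table. It assumes the semicolon-separated layout used by later releases of the data file, where field 0 is the code value and fields 12 and 13 are the uppercase and lowercase mappings; please check this against the file you actually download. CaseRecord and parse_case_record are names invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a code value and its simple case mappings. */
typedef struct {
    unsigned long code;
    unsigned long upper;
    unsigned long lower;
} CaseRecord;

/* Returns a pointer to the start of the n-th (0-based) ';'-separated
   field, or NULL if the line has fewer fields than that. */
static const char *field_start(const char *line, int n)
{
    while (n > 0) {
        line = strchr(line, ';');
        if (line == NULL)
            return NULL;
        line++;
        n--;
    }
    return line;
}

/* Parses the n-th field as hexadecimal; returns 0 if it is empty or missing. */
static unsigned long hex_field(const char *line, int n)
{
    const char *p = field_start(line, n);
    char *end;
    unsigned long v;

    if (p == NULL)
        return 0;
    v = strtoul(p, &end, 16);
    return (end == p) ? 0 : v;
}

/* Fills rec from one table line. Returns 1 on success, 0 if the line is
   too short. A character with no mapping maps to itself.
   Assumed layout: field 0 = code, field 12 = uppercase, field 13 = lowercase. */
int parse_case_record(const char *line, CaseRecord *rec)
{
    if (field_start(line, 13) == NULL)
        return 0;
    rec->code  = hex_field(line, 0);
    rec->upper = hex_field(line, 12);
    rec->lower = hex_field(line, 13);
    if (rec->upper == 0)
        rec->upper = rec->code;
    if (rec->lower == 0)
        rec->lower = rec->code;
    return 1;
}
```

Running every line of the file through parse_case_record and storing the results in an array indexed by code value would give simple one-to-one conversions; mappings that depend on language or context are outside what this sketch covers.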

>- Comparison routines
>

Language-sensitive or not?
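
If the answer is "not", a plain comparison of the 16-bit values may be all that is needed. Here is a minimal sketch, with compare_code_units being a name made up for this example. It orders strings by numeric value only, so it is fine for equality tests and lookup keys but does not produce an alphabetical order that users of any particular language would expect.

```c
#include <stddef.h>
#include <stdint.h>

/* Compares two strings of 16-bit values by numeric value.
   Returns -1, 0 or 1, in the manner of memcmp. */
int compare_code_units(const uint16_t *a, size_t alen,
                       const uint16_t *b, size_t blen)
{
    size_t n = (alen < blen) ? alen : blen;

    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return (a[i] < b[i]) ? -1 : 1;
    }
    if (alen == blen)
        return 0;
    return (alen < blen) ? -1 : 1;  /* the shorter string sorts first */
}
```

With this ordering 00C5 sorts after 005A (Z), which is one reason a language-sensitive routine needs separate collation data.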

>- Conversion to/from FSS-UTF and UNICODE
>

Already available in source form:
ftp://unicode.org/pub/Programs/ConvertUTF/
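
To give an idea of what such a conversion involves, here is a small sketch of the encoding direction for a single 16-bit value. This is not the code from the directory above; encode_fss_utf is a name invented for this example, and it covers only the one-, two- and three-byte forms with no special handling of any reserved ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one 16-bit value into out (1 to 3 bytes); returns the byte count. */
size_t encode_fss_utf(uint16_t c, unsigned char out[3])
{
    if (c < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

For example, 00C5 comes out as the two bytes C3 85, and 4E2D as the three bytes E4 B8 AD.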

Hope this helps.

David Goldsmith
Senior Scientist
Taligent, Inc.
10201 N. DeAnza Blvd.
Cupertino, CA 95014
david_goldsmith@taligent.com


