RE: collating sequence

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Thu Jun 28 2001 - 16:23:21 EDT


Vibha,

If you are looking for a cross-platform tool for C/C++, then the best is ICU.
With the latest release (ICU 1.8) they have totally redone the logic for
strict Unicode conformance. I am currently developing an easy-to-implement
interface (xIUA) that is also free, open source code.

The collation APIs come in three flavors:

U_CAPIX int32_t U_EXPORTX /* 1 = TRUE, 0 = FALSE, -1 = LOGIC ERROR */
 xiua_Collate(char *str1,    /* string 1 */
              char * option, /* option string contains both the comparison test */
                             /* and optional collation strength parameters. */
                             /* "==" "<=" ">=" "!=" "<" ">" are the */
                             /* comparison test values and "?" ":" "#" are */
                             /* the valid strength codes. "==?" is a test */
                             /* for equal primary strength. */
                             /* ? = Primary: letters match, ignoring case and */
                             /*     accents, e.g. "Black-bird" ==? "blackbird", */
                             /*     but what constitutes separate letters may */
                             /*     differ by locale, e.g. Spanish ch, ll */
                             /* Secondary: case insensitive, normalized, with accents */
                             /* : = Tertiary: above plus case sensitive */
                             /* # = Strict match */
                             /* Spaces are ignored. Non-standard conditions are */
                             /* supported: "!<>" or "=" are the same as "==". */
                             /* "" or "!", however, are illogical and are errors. */
              char * str2);  /* string 2 */
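
As a rough illustration of how an option string like the one described above could be decoded, here is a minimal sketch. It is not taken from the xIUA source: the names `ParsedOption`, `parse_option` and `evaluate_option` are hypothetical, and it assumes each of `<`, `=`, `>` adds one accepted outcome while `!` inverts the whole set. Under that assumption "!<>" reduces to "==", and "" or "!" are rejected because they accept nothing or everything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not taken from the xIUA source. */
enum { REL_LT = 1, REL_EQ = 2, REL_GT = 4, REL_ALL = 7 };

typedef struct {
    int  relations; /* bitmask of outcomes for which the test is TRUE */
    char strength;  /* '?', ':' or '#'; 0 when no strength code was given */
} ParsedOption;

/* Decode an option string such as "==?" or "! < >".
 * Returns 0 on success, -1 for an illogical or malformed option. */
static int parse_option(const char *option, ParsedOption *out)
{
    int mask = 0, negate = 0;
    char strength = 0;

    assert(option != NULL && out != NULL);

    for (const char *p = option; *p != '\0'; ++p) {
        switch (*p) {
        case ' ': break;                         /* spaces are ignored */
        case '<': mask |= REL_LT; break;
        case '=': mask |= REL_EQ; break;         /* "=" and "==" coincide */
        case '>': mask |= REL_GT; break;
        case '!': negate = 1; break;
        case '?': case ':': case '#': strength = *p; break;
        default:  return -1;                     /* unknown character */
        }
    }
    if (negate)
        mask = ~mask & REL_ALL;                  /* "!<>" becomes "==" */
    if (mask == 0 || mask == REL_ALL)
        return -1;                               /* "" and "!" are illogical */

    out->relations = mask;
    out->strength = strength;
    return 0;
}

/* Apply an option to a three-way comparison result (negative, zero,
 * positive). Returns 1 = TRUE, 0 = FALSE, -1 = LOGIC ERROR. */
static int evaluate_option(const char *option, int cmp)
{
    ParsedOption opt;
    if (parse_option(option, &opt) != 0)
        return -1;
    int bit = (cmp < 0) ? REL_LT : (cmp > 0) ? REL_GT : REL_EQ;
    return (opt.relations & bit) != 0;
}
```

In this sketch the three-way result `cmp` would come from whatever collation call is made at the requested strength. Treating "<=>" as an error and letting the last strength code win are choices of the sketch, not documented xIUA behavior.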

U_CAPIX int32_t U_EXPORTX /* UTF-16 version */
 xiu2_Collate(UChar *str1, char * option, UChar * str2);

U_CAPIX int32_t U_EXPORTX /* UTF-32 version */
 xiu4_Collate(UChar32 *str1, char * option, UChar32 * str2);

U_CAPIX int32_t U_EXPORTX /* UTF-8 version */
 xiu8_Collate(char *str1, char * option, char * str2);

U_CAPIX int32_t U_EXPORTX /* code page version */
 xicp_Collate(char *str1, char * option, char * str2);
____________________________________________________________________________
U_CAPIX int32_t U_EXPORTX /* 1 = s1>s2, 0 = s1=s2, -1 = s1<s2 */
 xiua_strcoll(char *str1,   /* string 1 */
              char * str2); /* string 2 */

U_CAPIX int32_t U_EXPORTX /* 1 = s1>s2, 0 = s1=s2, -1 = s1<s2 */
 xiua_stricoll(char *str1,   /* string 1 */
               char * str2); /* string 2 */

#define xiua_strcasecoll(a,b) xiua_stricoll(a,b)
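
To illustrate the 1 / 0 / -1 return convention used by these functions, here is a small sketch built on the standard C `strcoll()` rather than on ICU or xIUA. The names `collate3` and `collate3_qsort` are hypothetical, and `strcoll()` follows the C library locale (set with `setlocale(LC_COLLATE, ...)`), so in the default "C" locale it orders strings the same way `strcmp()` does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: clamp strcoll() to the -1 / 0 / 1 convention. */
static int collate3(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    int r = strcoll(s1, s2);      /* only the sign of r is meaningful */
    return (r > 0) - (r < 0);     /* 1 = s1>s2, 0 = s1=s2, -1 = s1<s2 */
}

/* qsort() comparator over an array of C strings. */
static int collate3_qsort(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return collate3(*sa, *sb);
}
```

A three-way function of this shape drops straight into a sort, for example `qsort(words, n, sizeof words[0], collate3_qsort);`.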
____________________________________________________________________________
U_CAPIX int32_t U_EXPORTX /* 1 = s1>s2, 0 = s1=s2, -1 = s1<s2 */
 xiua_strcollEx(char *str1, /* string 1 */
                char * str2, /* string 2 */
                XIUA_CollStrength strength); /* collate strength */

This last version also comes in UTF-32, UTF-16, UTF-8 and code page
versions. It gives you some control over strength and normalization.

Because it is distributed as source, you can tailor it to call ICU with
whatever parameters are appropriate.

You can either use xIUA or extract the code and use it in your own code.

Carl

> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> Behalf Of Magda Danish (Unicode)
> Sent: Thursday, June 28, 2001 9:54 AM
> To: unicode@unicode.org
> Subject: FW: collating sequence
>
>
>
>
> -----Original Message-----
> From: Vibha R [mailto:vibhar@india.hp.com]
> Sent: Wednesday, June 27, 2001 11:27 PM
> To: info@unicode.org
> Subject: collating sequence
>
>
> hello
> is there any tool which will generate the collating sequence for a
> particular language
>
>


