RE: Linguistic precedence [was: (TC304.2313) AND/OR: antediluvian

From: Michael Kaplan (Trigeminal Inc.) (v-michka@microsoft.com)
Date: Thu Jun 15 2000 - 12:48:21 EDT


> > >(Has somebody written a comprehensive collection of all these collation
> > >problems?)
>
Ok, here is the full list of ones I know about, and the VB code that would
demonstrate them, as needed:

(Note: All of this comes from the book I am working on, which discusses
i18n for Visual Basic; hopefully other platform folks will not ignore me on
that basis! StrComp is a VB intrinsic function with a little-known feature:
its Compare argument will accept an LCID that specifies the locale to use
from the Windows NLS database.)

Czech: "ch" is considered a single character for sorting purposes, which
sorts after "h".
(example --- StrComp("ch", "h", 1029) will return 1 instead of -1 as it will
on most other locales)
Other Czech issues can mostly be ignored here, since U+010D and U+0161 would
appropriately sort after "c" and "s" and few people would argue with it.
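
For folks not on VB, here is a rough platform-neutral sketch of the Czech
"ch" rule in Python. It is not how Windows NLS does it; it just treats "ch"
as one collation element that ranks right after "h". The names CZECH, PLAIN,
make_sort_key and str_comp are made up for illustration, and the alphabet is
deliberately simplified (only the letters mentioned above are included).

```python
def make_sort_key(alphabet):
    """Build a sort-key function from an ordered list of collation elements."""
    rank = {element: i for i, element in enumerate(alphabet)}
    longest = max(len(element) for element in alphabet)

    def sort_key(text):
        text = text.lower()
        key, i = [], 0
        while i < len(text):
            # Try the longest element first, so "ch" wins over plain "c".
            for size in range(longest, 0, -1):
                chunk = text[i:i + size]
                if len(chunk) == size and chunk in rank:
                    key.append(rank[chunk])
                    i += size
                    break
            else:
                # Characters outside the alphabet sort last, by code point.
                key.append(len(alphabet) + ord(text[i]))
                i += 1
        return key

    return sort_key


def str_comp(a, b, sort_key):
    """Return -1, 0 or 1, like StrComp, using the given sort key."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


# Simplified Czech order (assumption): U+010D after "c", "ch" after "h",
# U+0161 after "s".
CZECH = (list("abc") + ["\u010d"] + list("defgh") + ["ch"]
         + list("ijklmnopqrs") + ["\u0161"] + list("tuvwxyz"))
PLAIN = list("abcdefghijklmnopqrstuvwxyz")

czech_key = make_sort_key(CZECH)
plain_key = make_sort_key(PLAIN)

print(str_comp("ch", "h", czech_key))  # "ch" ranks after "h"
print(str_comp("ch", "h", plain_key))  # "c" ranks before "h"
```

The same key function can be handed to sorted(), for example
sorted(words, key=czech_key).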

Danish/Norwegian/Finnish/Swedish: U+00E4 and U+00F6 sort after "z"
(example -- StrComp("z", ChrW(228), 1044) will return -1 instead of 1 as it
will on most locales)

Danish/Norwegian/Finnish/Swedish: U+00FC sorts after "y"
(example -- StrComp("y", ChrW(252), 1030) will return -1 instead of 1 as it
will on most locales)
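
The two Nordic rules above can be modelled with an explicit letter order.
This is a Python sketch under stated assumptions, not the NLS tables:
NORDIC_ORDER only places the three letters discussed here (U+00FC right
after "y", U+00E4 and U+00F6 after "z"), and base_letter_key is a stand-in
for "most locales" that compares letters with their accents stripped. All
names are hypothetical.

```python
import unicodedata

# Assumed order: a..y, U+00FC, z, U+00E4, U+00F6 (other letters omitted).
NORDIC_ORDER = "abcdefghijklmnopqrstuvwxy\u00fcz\u00e4\u00f6"
NORDIC_RANK = {ch: i for i, ch in enumerate(NORDIC_ORDER)}


def nordic_key(text):
    """Rank each character by its position in NORDIC_ORDER."""
    return [NORDIC_RANK.get(ch, len(NORDIC_RANK) + ord(ch))
            for ch in text.lower()]


def base_letter_key(text):
    """Stand-in for most locales: compare base letters, then the raw text."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, text)


def compare(a, b, key):
    """Return -1, 0 or 1, like StrComp, using the given sort key."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


print(compare("z", "\u00e4", nordic_key))       # "z" before U+00E4
print(compare("z", "\u00e4", base_letter_key))  # U+00E4 treated like "a"
print(compare("y", "\u00fc", nordic_key))       # "y" before U+00FC
print(compare("y", "\u00fc", base_letter_key))  # U+00FC treated like "u"
```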

Finnish/Swedish: "w" and "v" sort the same
(example -- StrComp("wa", "vo", 1053) will return -1 instead of 1 as it will
on most locales)
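
A sketch of the "w" and "v" rule in Python, assuming a two-level key: the
first level folds "w" into "v", and the second level is the original text so
that otherwise identical strings still get a stable order. The names
FOLD_W_TO_V, vw_equal_key, plain_key and compare are made up for this
example; the real tables are more involved.

```python
# Translation table that folds "w" into "v" for the first comparison level.
FOLD_W_TO_V = str.maketrans("wW", "vV")


def vw_equal_key(text):
    """Key in which "w" and "v" are equal unless nothing else differs."""
    primary = text.translate(FOLD_W_TO_V).lower()
    return (primary, text)


def plain_key(text):
    """Key for a locale that keeps "v" and "w" as distinct letters."""
    return text.lower()


def compare(a, b, key):
    """Return -1, 0 or 1, like StrComp, using the given sort key."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


print(compare("wa", "vo", vw_equal_key))  # decided by "a" versus "o"
print(compare("wa", "vo", plain_key))     # decided by "w" versus "v"
print(sorted(["wo", "va", "wa", "vo"], key=vw_equal_key))
```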

Lithuanian: "y" is equivalent to "i"
(example -- StrComp("j", "y", 1063) will return 1 instead of -1 as it will
on most locales)
Other Lithuanian issues can mostly be ignored here, since U+010D, U+0161, and
U+017E would appropriately sort after "c", "s", and "z" and few people would
argue about it.

Polish: U+015B sorts between "s" and "t", which can be ignored since most
locales would want to do that anyway.

Spanish Traditional: "ch" is a unique char for sorting purposes between "c"
and "d"
(example -- StrComp("cz", "ch", 1034) will return -1 instead of 1 as it will
on most locales, including Spanish Modern (3082))

Spanish Traditional: "ll" is a unique char for sorting purposes between "l"
and "m"
(example -- StrComp("lz", "ll", 1034) will return -1 instead of 1 as it will
on most locales, including Spanish Modern (3082))
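
The difference between the two Spanish sorts can be sketched in Python with
a key-rewriting trick: replace each digraph with its first letter plus a
placeholder that sorts above every ordinary letter, so "ch" lands after "cz"
but before "d", and "ll" lands after "lz" but before "m". This is only an
illustration with hypothetical names (traditional_key, modern_key, compare);
it ignores U+00F1 and accents, and it is not the NLS implementation.

```python
# Placeholder that compares greater than any ordinary letter.
AFTER_ALL_LETTERS = "\uffff"


def traditional_key(text):
    """Treat "ch" and "ll" as single letters after "c" and "l"."""
    text = text.lower()
    text = text.replace("ch", "c" + AFTER_ALL_LETTERS)
    text = text.replace("ll", "l" + AFTER_ALL_LETTERS)
    return text


def modern_key(text):
    """Compare letter by letter, with no digraph handling."""
    return text.lower()


def compare(a, b, key):
    """Return -1, 0 or 1, like StrComp, using the given sort key."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


words = ["chico", "cuna", "dama", "llama", "luna", "loro"]

print(compare("cz", "ch", traditional_key), compare("cz", "ch", modern_key))
print(compare("lz", "ll", traditional_key), compare("lz", "ll", modern_key))
print(sorted(words, key=traditional_key))
print(sorted(words, key=modern_key))
```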

Michael


