Re: ISO/IEC 14651 (International character String Ordering) documentation

From: Alain LaBonté SCT (alb@sct.gouv.qc.ca)
Date: Mon Aug 25 1997 - 15:08:19 EDT


At 11:29 97-08-25 -0800, Ken Krugler wrote:
>Alain,
>
>In a recent email to the Unicode mailing list, you said:

[Alain] :
>>In this vein, ISO/IEC 14651 defines a notion of equivalence that is
>>systematic, the data being structured a bit like floating-point numbers
>>are structured in computers, from the most significant elements to the
>>least (if you truncate a floating-point number from the right, at the
>>limit leaving only the sign bit, you can still do comparisons: the sign
>>bit tells you whether a number is negative or positive, the exponent gives
>>you its order of magnitude, and the mantissa gives more precision, bit
>>after bit):
>>
>>level 1: base letter, whatever it is for a given language
>>level 2: diacritical marks applying to level 1
>>level 3: case or shape variant applying to level 1
>>level 4: special characters
>>
>>And we define an API that allows you to ask for a comparison result whose
>>precision will be based on these levels... For details see the latest draft
>>or drafts to come.
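
To make the four levels above concrete, here is a minimal Python sketch of a
comparison whose precision is chosen by level. It is only an illustration:
the names level_keys and compare are hypothetical, the weights are derived
from Unicode decomposition rather than from the ISO/IEC 14651 tables, and
language-specific rules are not modelled.

```python
import unicodedata


def level_keys(text):
    """Split text into four keys, most significant level first.

    Simplified stand-in for real collation tables (an assumption of this
    sketch): level 1 = base letters, level 2 = diacritics, level 3 = case,
    level 4 = special characters with their positions.
    """
    base, marks, case, specials = [], [], [], []
    after_letter = False  # True while a combining mark can attach to a letter
    for pos, ch in enumerate(unicodedata.normalize("NFD", text)):
        if unicodedata.combining(ch):
            if after_letter:
                marks[-1] += ch          # level 2: accent on previous letter
            # marks that follow a special character are dropped in this sketch
        elif ch.isalnum():
            base.append(ch.lower())      # level 1: letter without case/accent
            marks.append("")             # level 2: no accent seen yet
            case.append(ch.isupper())    # level 3: lowercase before uppercase
            after_letter = True
        else:
            specials.append((pos, ch))   # level 4: spaces, hyphens, etc.
            after_letter = False
    return tuple(base), tuple(marks), tuple(case), tuple(specials)


def compare(a, b, precision=4):
    """Return -1, 0 or 1, looking only at the first `precision` levels."""
    if not 1 <= precision <= 4:
        raise ValueError("precision must be between 1 and 4")
    ka = level_keys(a)[:precision]
    kb = level_keys(b)[:precision]
    # Tuples compare level by level, so a level 1 difference always wins.
    return (ka > kb) - (ka < kb)
```

With this sketch, compare("cote", "côte", precision=1) reports equality,
while precision=2 tells the two words apart. Truncating the key tuple plays
the same role as truncating the floating-point number in the analogy.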

[Ken] :
>What's the URL for the latest draft(s)? I scanned through older Unicode
>emails, and the only reference took me to Keld's site, which didn't seem to
>have the actual text/tables, just notes about the project. A quick scan of
>the Internet with Infoseek also gave me nothing useful, so I thought I'd go
>to the source. Sorry about the hassle.
>
>By the way, I recently was looking through my old documentation on sorting,
>and came across a paper written in 1991 by René Haentjens (sp?), which
>referenced your work on sorting algorithms...which had served as the basis
>for my own implementation that I needed for an on-line dictionary product
>(Shasta for Mac/Windows). Though I was mostly dealing with Far East
>languages such as Japanese and Chinese. Anyway, the finite state
>machine/table mechanism I used for a one-pass conversion worked OK, except
>in the case of handling multiple space/hyphens correctly. I basically
>converted the text into sequences of 16-bit major & 8-bit minor sorting
>values, one 'sorting value' at a time, and compared until an appropriate
>level of precision didn't match. I also used these sorting values as my
>'keys' for blind-case comparison, using just the 16-bit values. Handy for
>creating keys to blocks of dictionary data.
>
>-- Ken
>
>Ken Krugler
>TransPac Software, Inc.
>+1 408-261-7550
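
The major/minor scheme Ken describes can be sketched as follows. This is one
possible reading of his description, not his implementation: the weights
(code point of the lowercased character as the 16-bit major value, a case
flag as the 8-bit minor value) and the names sort_values, compare and
blind_key are assumptions made for illustration.

```python
import struct
from itertools import zip_longest


def sort_values(text):
    """Yield (major, minor) pairs, one 'sorting value' per character.

    Hypothetical weights: a real table would assign these per language.
    """
    for ch in text:
        # lower() may return several characters; keep the first one only.
        major = ord(ch.lower()[0]) & 0xFFFF   # 16-bit major (BMP only here)
        minor = 1 if ch.isupper() else 0      # 8-bit minor: case flag
        yield major, minor


def compare(a, b, use_minor=True):
    """One-pass comparison: majors dominate, minors only break ties."""
    tie = 0
    pairs = zip_longest(sort_values(a), sort_values(b), fillvalue=(0, 0))
    for (maj_a, min_a), (maj_b, min_b) in pairs:
        if maj_a != maj_b:
            return -1 if maj_a < maj_b else 1
        if use_minor and tie == 0 and min_a != min_b:
            tie = -1 if min_a < min_b else 1  # remember first minor mismatch
    return tie


def blind_key(text):
    """Case-blind key built from the 16-bit major values only."""
    majors = [major for major, _ in sort_values(text)]
    # Big-endian packing keeps byte order consistent with numeric order.
    return struct.pack(f">{len(majors)}H", *majors)
```

Because blind_key drops the minor values, "Abc" and "aBC" map to the same
bytes, which is what makes such keys usable for indexing blocks of data.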

[Alain] :
The current draft is at http://www.dkuug.dk/JTC1/SC22/WG20/prot/
but it is outdated and I mention it only FYI (technically it is still valid,
but the tables will have to be corrected and the text will have to be revised
editorially). I have to produce a considerably revised version (the draft was
voted upon in ISO and we got thirty-something pages of comments, for which
the official disposition of comments runs to 35 pages).

The new version should come out for FCD ballot at the end of October
(current plans), to synchronize it with ISO/IEC 14652 (which describes the
syntax used to build the tables). I will produce it internally in
September for comments by ISO/IEC JTC1/SC22/WG20 members before sending it out.

Btw I don't know what you do with spaces and hyphens, but in my original
work these are ignored at level 1, just to avoid problems (in agreement
with CEN/TC304 we have an option to process spaces at level 1 or not,
which seems necessary)... I suspect that you didn't do that, although
this might be presumptuous on my part.
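
As a rough illustration of that choice, the Python sketch below builds a
level 1 key that always skips hyphens and skips spaces unless an option says
otherwise. The function name level1_key and the flag spaces_significant are
hypothetical, and lowercasing stands in for a real level 1 weight table.

```python
def level1_key(text, spaces_significant=False):
    """Build a simplified level 1 key for sorting.

    Hyphens are always ignored at this level; spaces are ignored unless
    spaces_significant is True (the option is an assumption of this sketch).
    """
    kept = []
    for ch in text.lower():
        if ch == "-":
            continue                      # hyphens never count at level 1
        if ch == " " and not spaces_significant:
            continue                      # spaces count only when requested
        kept.append(ch)
    return "".join(kept)


words = ["cop", "co-op", "coop", "co op"]

# Spaces and hyphens ignored: the three spellings of "coop" sort together.
ignoring = sorted(words, key=level1_key)

# Spaces processed at level 1: "co op" now sorts ahead of the others.
processing = sorted(words, key=lambda w: level1_key(w, spaces_significant=True))
```

Strings that tie at level 1 this way would still be told apart at a later
level in a full implementation; the sketch leaves them in input order.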

Part of my original work on this [it began in 1985-1986 but was fully mature
in 1988] is available in French on several sites in North America and in
Europe. Let me cite one series of URLs showing the main material, which was
followed by Canadian standard CAN/CSA Z243.4.1 and later by the complementary
standard CAN/CSA Z243.230 (this was published on paper as 3 articles):
http://www.crim.ca/APIIQ/interfac/face9603.html
http://www.crim.ca/APIIQ/interfac/face0596.html#regles
http://www.crim.ca/APIIQ/interfac/face0796.html

Alain LaBonté
Québec

bcc UNICODE, TC304, SC22WG20


