Date: Fri, 13 May 94 08:08:02 EDT
From: Edwin Hart
Subject: Letter to the editor
To: Andy Feibus

Dear Andy,

Thank you for your most gracious response. Anytime someone starts talking about Unicode or 10646, it quickly gets my attention. I don't mind making it a letter to the editor, but I should add some information before doing so (e.g., a phone number for Unicode Inc. and ANSI, and the fact that Unicode and the ISO working group are continuing to cooperate).

I am not sure that I emphasized the "merger" idea enough. What you said was correct, but I want to emphasize that there was some give and take on both sides to produce the merger. With the merger, 10646 looks more like Unicode 1.0 than like version 1 of the ISO/IEC draft international standard (DIS-1). I hear talk that some of the participants outside the US are unhappy about this. Politically, I need to ensure that readers recognize that the DIS-1 features ISO found important were also incorporated into 10646 (and Unicode V1.1). Otherwise, it looks to the world outside the US as if the big, bad US once again had its way at the expense of everyone else.

The e-mail takes on a new purpose with a wider audience, so I want to take some more time to be sure that it is "right". Do you have any particular suggestions that would help clarify what I intended to say? Is something missing? How would you like to receive the letter: in ASCII, WordPerfect, or Word? I can use Microsoft Mail to send you the document in WordPerfect or Word format as an attachment.

As another issue, I have a colleague in Que/bec City, Canada, who has done some significant work in culturally correct sorting. His name is Alain La Bonte/ ("e/" means "e" with an acute accent). He edited the Canadian standard that describes an algorithm to correctly sort English and French words together. (The algorithm will work for other languages if the weights assigned to each letter are changed.)
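The idea of a sort driven by per-letter weights can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Canadian standard's algorithm: the tables PRIMARY and ACCENT, their values, and the helper sort_key are all hypothetical. Each word gets a three-level key: base letters first (ignoring accents and case), then accents, then case. Replacing the tables is how the same code would be adapted to another language. The optional backward_accents flag compares accents starting from the end of the word, the convention usually associated with French dictionary order; treat it as an illustration rather than the standard's exact rule.

```python
import unicodedata

# HYPOTHETICAL primary weights: one weight per base letter a..z.
# Letters not in the table sort after z, by code point.
PRIMARY = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}

# HYPOTHETICAL secondary weights for a few combining accents;
# any other combining mark gets weight 9.
ACCENT = {
    "\u0301": 1,  # acute
    "\u0300": 2,  # grave
    "\u0302": 3,  # circumflex
    "\u0308": 4,  # diaeresis
    "\u0327": 5,  # cedilla
}


def sort_key(word, backward_accents=False):
    """Three-level key: base letters, then accents, then case."""
    primary, secondary, tertiary = [], [], []
    # NFC first, so each accented letter is a single character.
    for ch in unicodedata.normalize("NFC", word):
        # NFD splits e.g. "é" into "e" followed by a combining acute accent.
        base, *marks = unicodedata.normalize("NFD", ch)
        low = base.lower()
        primary.append(PRIMARY.get(low, 100 + ord(low)))
        secondary.append(tuple(ACCENT.get(m, 9) for m in marks))
        tertiary.append(1 if base == low else 0)  # capital before lower case
    if backward_accents:
        # Compare accents starting from the last letter of the word.
        secondary.reverse()
    return (primary, secondary, tertiary)


words = ["côté", "Cote", "coté", "côte", "Côte", "cote"]
print(sorted(words, key=sort_key))
# ['Cote', 'cote', 'coté', 'Côte', 'côte', 'côté']
print(sorted(words, key=lambda w: sort_key(w, backward_accents=True)))
# ['Cote', 'cote', 'Côte', 'côte', 'coté', 'côté']
```

Because the accents only break ties between words whose base letters are identical, "coté" and "côte" can never be separated from "cote" by an unrelated word; the code positions of the accented letters play no part.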
Right now, Alain is trying to define a default sorting order for the characters in 10646. This is a formidable task. Alain may be reached via e-mail at ALB@SHE.ORG.UK.

If you look at most (almost all?) sorting, it is done on the basis of the code positions (the binary value associated with each character). I remember that's what I did in college. Thus, with ASCII-7 encoding, you sort the digits before the capital letters, and the capital letters before the lower-case letters. In US EBCDIC, you sort the lower-case letters before the capital letters, and the capital letters before the digits. Now, what were you and I taught in the first grade? We were taught to sort AaBb...Zz, and using the code positions in ASCII-7 or US EBCDIC does not match what we were taught. In summary, to sort correctly even in English, we should be sorting on something other than the code positions. (By the way, I said "ASCII-7" because the US has adopted ISO 8859-1:1987 as the US standard for 8-bit ASCII.)

Thus far, we have only discussed English sorting with the 26 letters of the English alphabet. When we extend the character set from 7-bit ASCII (95 characters) to the 191 characters of the Latin-1 character set of 8-bit ASCII (ANSI/ISO 8859-1), sorting by the code positions of the right half of the code table gives incorrect results in all of the languages supported by Latin-1. Culturally, many languages that use the Latin-1 character set have some wrinkles in their sorting that surprise English-speaking people. For example, Danish has three additional letters in its alphabet: the letter "AE", A with a circle (the Angstrom symbol), and O with a slash ("/"). If I were sorting these letters, I'd place A with a circle and AE after "A", and O with a slash after "O". However, in Danish, all of these letters sort after "Z".

Best regards,
Ed
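The ordering problems described above can be reproduced with a short Python sketch. It sorts the same items several ways: by their bytes in ASCII and in US EBCDIC (Python's cp037 codec, IBM code page 037, is assumed here as the representative US EBCDIC table), in the AaBb...Zz order taught in first grade, and, for a few Danish words, by Latin-1 code position versus a Danish alphabet table. The helper names and sample words are illustrative only, and the Danish table covers just the 29 letters of the Danish alphabet.

```python
def code_position_order(items, encoding):
    """Pure code-position sort: compare the bytes each item has in `encoding`."""
    return sorted(items, key=lambda s: s.encode(encoding))


def first_grade_order(items):
    """AaBb...Zz: compare letters ignoring case; on a tie the capital comes first."""
    return sorted(items, key=lambda s: [(c.lower(), c.islower()) for c in s])


# Danish alphabet order: the three extra letters follow Z as Æ, Ø, Å.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


def danish_order(words):
    """Sort by position in DANISH_ALPHABET; other characters raise ValueError."""
    return sorted(words, key=lambda w: [DANISH_ALPHABET.index(c) for c in w.lower()])


letters = list("zZ1bBaA")
print(code_position_order(letters, "ascii"))  # ASCII-7: ['1', 'A', 'B', 'Z', 'a', 'b', 'z']
print(code_position_order(letters, "cp037"))  # US EBCDIC: ['a', 'b', 'z', 'A', 'B', 'Z', '1']
print(first_grade_order(letters))             # ['1', 'A', 'a', 'B', 'b', 'Z', 'z']

danish_words = ["ål", "øl", "zebra", "æble", "ost", "abe"]
print(code_position_order(danish_words, "latin-1"))
# ISO 8859-1: ['abe', 'ost', 'zebra', 'ål', 'æble', 'øl']
print(danish_order(danish_words))
# Danish:     ['abe', 'ost', 'zebra', 'æble', 'øl', 'ål']
```

Latin-1 happens to place all three Danish letters after "z", but in the wrong relative order (Å, Æ, Ø instead of Æ, Ø, Å), so even a "lucky" code table still needs a language-specific weight table.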