This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Tue Nov 1 04:03:10 CDT 2016
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject: Incorrectness in the text of UTS #10, Appendix B
In UTS #10, under the heading “Appendix B: Synchronization with ISO/IEC 14651”, the text reads as follows:

. . . For each version of the UCA, the Default Unicode Collation Element Table (DUCET) [Allkeys] is constructed based on the repertoire of the corresponding version of the Unicode Standard. The synchronized version of ISO/IEC 14651 has a Common Tailorable Template (CTT) table built for the same repertoire and ordering. The two tables are constructed with a common tool, to guarantee identical default (or tailorable) weight assignments. The CTT table for ISO/IEC 14651 is constructed using only symbols, rather than explicit integral weights, and with the Shift-Trimmed option for variable weighting.

Specifically,
. . . has a Common Tailorable Template (CTT) table built for . . .
… should instead read:
. . . has a Common Template Table (CTT) built for . . .

And,
The CTT table for . . .
… should read:
The CTT for . . .
Date/Time: Tue Nov 1 04:17:20 CDT 2016
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject:
UTS #10 states explicitly that if a fourth weight is used for a code point that maps to a pair of collation elements, then these weights “are set to a non-zero value in the first collation element and zero in the second”. We would have, for instance:

[.FB40.0020.0004][.CE00.0000.0000] # without fourth level
[.FB40.0020.0004.FFFF][.CE00.0000.0000.0000] # if a fourth level is used

The strange thing is that this UCA requirement is not fulfilled in the files providing conformance tests for the Unicode Collation Algorithm. Here are two examples, from CollationTest_SHIFTED.txt, line 166009 and line 208019:

2F00 0062; # . . . [FB40 CE00 1C60 | 0020 0020 | 0004 0002 | FFFF FFFF FFFF |]
10FFFE 0021; # . . . [FBE1 FFFE | 0020 | 0002 | FFFF FFFF 0260 |]

Rather than:

2F00 0062; # . . . [FB40 CE00 1C60 | 0020 0020 | 0004 0002 | FFFF FFFF |]
10FFFE 0021; # . . . [FBE1 FFFE | 0020 | 0002 | FFFF 0260 |]

The test file lines make it clear that both fourth-level weights of the pair are set to a non-zero value. Fortunately, this does not alter the ordering, because the same pattern has been found to apply to every affected sequence of Unicode code points.
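The discrepancy described above can be checked mechanically by counting the weights at each level of a sort key. The following is only a minimal sketch: the helper names `parse_sort_key` and `level_counts` are hypothetical, and the bracketed `[ ... | ... |]` format is assumed to be exactly as it appears in the quoted test-file lines.

```python
def parse_sort_key(text):
    """Split a '[w w | w | w | w |]' sort key into a list of levels,
    each level being a list of integer weights."""
    inner = text.strip().strip("[]")
    levels = [part.split() for part in inner.split("|")]
    # The trailing '|' leaves one empty final field; drop it.
    if levels and not levels[-1]:
        levels.pop()
    return [[int(weight, 16) for weight in level] for level in levels]


def level_counts(text):
    """Return the number of weights found at each level of the sort key."""
    return [len(level) for level in parse_sort_key(text)]


# Sort keys copied from the report (test-file form, then the form the reporter expects).
observed = "[FB40 CE00 1C60 | 0020 0020 | 0004 0002 | FFFF FFFF FFFF |]"
expected = "[FB40 CE00 1C60 | 0020 0020 | 0004 0002 | FFFF FFFF |]"

print(level_counts(observed))  # [3, 2, 2, 3]
print(level_counts(expected))  # [3, 2, 2, 2]
```

In the observed key the fourth level carries as many weights as the primary level, which is what one would see if the second collation element of the pair also received a non-zero fourth weight.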
Feedback above this line was reviewed in UTC #149.
Date/Time: Wed Mar 15 14:35:53 CDT 2017
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Misuse of 'Logical Order'
UTS #10, Unicode Collation Algorithm, Revision 34, Section 3.5 contains an untrue (as well as offensive) statement about logical order. TUS 9.00 Section 2.2, "Unicode Design Principles", defines logical order by the statement, "The order in which Unicode text is stored in the memory representation is called logical order." It is therefore simply untrue for UTS #10 to claim that, "Certain characters, such as the Thai vowels เ through ไ (and related vowels in the Lao and Tai Viet scripts of Southeast Asia), are not represented in strings in logical order." It then goes on to say, "For collation, they are rearranged by swapping them with the following character before further processing, because logically they belong afterward."

In the currently proposed update of UTS #10, this text is moved to Section 6.1.1, with slight, irrelevant improvements. The comments are now also applicable to the New Tai Lue script.

I recommend that the relevant clauses be corrected to "are not represented in phonetic order" and "because for collation they belong afterward". (This skates round the fact that in some systems all the vowels should follow the final consonants for collation.)
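For readers unfamiliar with the rearrangement step quoted from UTS #10 above, here is a minimal sketch of "swapping them with the following character". It assumes that "the Thai vowels เ through ไ" means the range U+0E40..U+0E44, it leaves out the Lao, Tai Viet, and New Tai Lue vowels, and the names `THAI_PREVOWELS` and `rearrange` are hypothetical. It is not how any particular UCA implementation does it.

```python
# Assumption: "เ through ไ" is the contiguous range U+0E40..U+0E44.
THAI_PREVOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def rearrange(text):
    """Swap each Thai prevowel with the character that follows it.

    A prevowel at the very end of the string has nothing to swap with
    and is left in place.
    """
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in THAI_PREVOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


# Stored order: U+0E40 (vowel) then U+0E01 (consonant).
stored = "\u0E40\u0E01"
print([hex(ord(c)) for c in rearrange(stored)])  # ['0xe01', '0xe40']
```

The stored string is already in logical order in the TUS sense; the swap only produces the order used for weighting.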
Date/Time: Tue Mar 28 11:17:58 CDT 2017
Name: Michael Bobeck
Report Type: Error Report
Opt Subject: TURNED GREEK SMALL LETTER IOTA collation in allkeys.txt
I noticed that TURNED GREEK SMALL LETTER IOTA, from Letterlike Symbols, collates in DUCET much earlier than all forms of Greek alpha, whereas it should collate directly after all forms of Greek iota and directly before all forms of Greek yot. It should be corrected, since the analogous KELVIN SIGN from Letterlike Symbols collates in DUCET after LATIN CAPITAL LETTER K and before FULLWIDTH LATIN CAPITAL LETTER K. Please correct this Greek-related miscollation in unicode.org/Public/UCA/latest/allkeys.txt as soon as possible.