Public Review Issues

Accumulated Feedback on PRI #332

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Tue Nov 1 04:03:10 CDT 2016
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject: Incorrectness in the text of UTS #10, Appendix B

In UTS #10, under the heading of “Appendix B: Synchronization with ISO/IEC 
14651”, the text reads as follows:

.  .  .
For each version of the UCA, the Default Unicode Collation Element Table
(DUCET) [Allkeys] is constructed based on the repertoire of the corresponding
version of the Unicode Standard. The synchronized version of ISO/IEC 14651 has
a Common Tailorable Template (CTT) table built for the same repertoire and
ordering. The two tables are constructed with a common tool, to guarantee
identical default (or tailorable) weight assignments. The CTT table for
ISO/IEC 14651 is constructed using only symbols, rather than explicit integral
weights, and with the Shift-Trimmed option for variable weighting.

Specifically,
    . . .  has a Common Tailorable Template (CTT) table built for . . .

… should instead read:
    . . .  has a Common Template Table (CTT) built for . . .

And,
    The CTT table for . . .

… should read:
    The CTT for . . .

Date/Time: Tue Nov 1 04:17:20 CDT 2016
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject:

UTS #10 sets out explicitly that if a fourth weight is used in a code point
that maps to a pair of collation elements, then these weights “are set to a
non-zero value in the first collation element and zero in the second”. We
would have, for instance:

[.FB40.0020.0004][.CE00.0000.0000]           # without fourth level
[.FB40.0020.0004.FFFF][.CE00.0000.0000.0000] # if a fourth level is used
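
The weights in this example match what the implicit-weight derivation in UTS #10 produces for U+4E00, the compatibility decomposition of U+2F00. The following Python is a minimal sketch of that derivation combined with the quoted fourth-level rule. The function names, the parameters, and the choice to pass the base and tertiary weight in by hand are illustrative assumptions, not part of the specification:

```python
def implicit_elements(cp, base, tertiary=0x0002, fourth=None):
    """Return the pair of collation elements derived for code point cp.

    base is the lead primary supplied by the caller (0xFB40 for the core
    Han blocks, 0xFBC0 for unassigned code points). fourth is None when
    no fourth level is used.
    """
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    first = [aaaa, 0x0020, tertiary]
    second = [bbbb, 0x0000, 0x0000]
    if fourth is not None:
        # Quoted rule: non-zero in the first element, zero in the second.
        first.append(fourth)
        second.append(0x0000)
    return first, second


def show(elements):
    """Format collation elements in the [.pppp.ssss.tttt] notation."""
    return "".join(
        "[" + "".join(".%04X" % w for w in ce) + "]" for ce in elements
    )


print(show(implicit_elements(0x4E00, 0xFB40, tertiary=0x0004)))
# [.FB40.0020.0004][.CE00.0000.0000]
print(show(implicit_elements(0x4E00, 0xFB40, tertiary=0x0004, fourth=0xFFFF)))
# [.FB40.0020.0004.FFFF][.CE00.0000.0000.0000]
```

The same arithmetic applied to U+10FFFE with base 0xFBC0 gives the primaries FBE1 and FFFE.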

The strange thing is that this UCA requirement is not fulfilled in the files
providing conformance tests for the Unicode Collation Algorithm. Here are two
examples:

In CollationTest_SHIFTED.txt, line 166009 and line 208019:

2F00 0062;   # . . . [FB40 CE00 1C60 | 0020 0020 | 0004 0002 | FFFF FFFF FFFF |]
10FFFE 0021; # . . . [FBE1 FFFE | 0020 | 0002 | FFFF FFFF 0260 |]

Rather than:

2F00 0062;   # . . . [FB40 CE00 1C60 | 0020 0020 | 0004 0002 | FFFF FFFF |]
10FFFE 0021; # . . . [FBE1 FFFE | 0020 | 0002 | FFFF 0260 |]

The test files thus show both fourth-level weights set to a non-zero value.
Fortunately, this does not alter the ordering, since the same treatment is
applied to every affected sequence of Unicode code points.
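
The difference between the two forms can be reproduced with a small sketch of sort-key formation, in which each level collects the non-zero weights of every collation element in turn. The Python below is an illustration under stated assumptions: the function name is hypothetical, the three collation elements for <2F00, 0062> are read back from the sort keys quoted above, and the bracketed output imitates the comment format of the test file.

```python
def sort_key(elements):
    """Build a bracketed sort key from four-level collation elements.

    Each level lists the non-zero weights of all elements in order,
    and the levels are separated by " | ".
    """
    levels = []
    for level in range(4):
        weights = [ce[level] for ce in elements if ce[level] != 0]
        levels.append(" ".join("%04X" % w for w in weights))
    return "[" + " | ".join(levels) + " |]"


# Second element carries a zero fourth weight, as the quoted rule requires.
as_specified = [
    (0xFB40, 0x0020, 0x0004, 0xFFFF),
    (0xCE00, 0x0000, 0x0000, 0x0000),
    (0x1C60, 0x0020, 0x0002, 0xFFFF),
]
# Second element carries FFFF instead, as the test file appears to assume.
as_tested = [
    (0xFB40, 0x0020, 0x0004, 0xFFFF),
    (0xCE00, 0x0000, 0x0000, 0xFFFF),
    (0x1C60, 0x0020, 0x0002, 0xFFFF),
]

print(sort_key(as_specified))
# [FB40 CE00 1C60 | 0020 0020 | 0004 0002 | FFFF FFFF |]
print(sort_key(as_tested))
# [FB40 CE00 1C60 | 0020 0020 | 0004 0002 | FFFF FFFF FFFF |]
```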

Feedback above this line was reviewed in UTC #149.

Date/Time: Wed Mar 15 14:35:53 CDT 2017
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Misuse of 'Logical Order'

UTS#10 Unicode Collation Algorithm Version Revision 34 Section 3.5 contains an
untrue (as well as offensive) statement about logical order.  TUS 9.00
Section 2.2 "Unicode Design Principles" defines logical order by the
statement, "The order in which Unicode text is stored in the memory
representation is called logical order."  It is therefore
simply untrue for UTS#10 to claim that, "Certain characters, such as the Thai
vowels เ through ไ (and related vowels in the Lao and Tai Viet scripts of
Southeast Asia), are not represented in strings in logical order." It then
goes on to say, "For collation, they are rearranged by swapping them with the
following character before further processing, because logically they belong
afterward."
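
The swap described in the quoted sentence can be pictured with a short Python sketch. It is an illustration only: the function name is hypothetical, and the character set is an assumption limited to the Thai vowels U+0E40..U+0E44 and their Lao counterparts U+0EC0..U+0EC4, leaving out Tai Viet and New Tai Lue.

```python
# Assumption: only the Thai (U+0E40..U+0E44) and Lao (U+0EC0..U+0EC4)
# prefixed vowels are handled; Tai Viet and New Tai Lue are omitted.
PREFIXED_VOWELS = set(range(0x0E40, 0x0E45)) | set(range(0x0EC0, 0x0EC5))


def rearrange(text):
    """Swap each prefixed vowel with the character that follows it."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if ord(chars[i]) in PREFIXED_VOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # step past the swapped pair
        else:
            i += 1
    return "".join(chars)


# U+0E44 U+0E17 U+0E22 becomes U+0E17 U+0E44 U+0E22.
print(rearrange("\u0E44\u0E17\u0E22") == "\u0E17\u0E44\u0E22")  # True
```

The stored string is left untouched; only the copy used for comparison is reordered.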

In the currently proposed update of UTS#10, this text is moved to Section
6.1.1, with slight, irrelevant improvements.

The comments are now also applicable to the New Tai Lue script.

I recommend that the relevant clauses be corrected to "are not represented in
phonetic order" and "because for collation they belong afterward".  (This
skates round the fact that in some systems all the vowels should follow the
final consonants for collation.)

Date/Time: Tue Mar 28 11:17:58 CDT 2017
Name: Michael Bobeck
Report Type: Error Report
Opt Subject: TURNED GREEK SMALL LETTER IOTA collation in allkeys.txt

I noticed that TURNED GREEK SMALL LETTER IOTA from Letterlike Symbols in DUCET
collates much earlier than all forms of Greek alpha, while it should collate
directly after all forms of Greek iota and directly before all forms of Greek
yot. It should be corrected, since the analogous KELVIN SIGN from Letterlike
Symbols in DUCET collates after LATIN CAPITAL LETTER K and before FULLWIDTH
LATIN CAPITAL LETTER K. Please correct this Greek-related miscollation in
unicode.org/Public/UCA/latest/allkeys.txt as soon as possible.