The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of July 24, 2012, since the previous cumulative document was issued prior to UTC #131 (May 2012). This document does not include feedback on moderated Public Review Issues from the forum that have been digested by the forum moderators; those are in separate documents for each of the PRIs. Gray items in the Table of Contents do not have feedback here.
The links below go directly to open PRIs and to feedback documents for them, as of October 31, 2012.
The links below go to locations in this document for feedback.
Feedback on Encoding Proposals
Closed Public Review Issues
Error Reports
Other Reports
Date/Time: Mon Oct 29 17:20:24 CDT 2012
Contact: cowan@ccil.org
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/12-333 Request to UTC to Propose 226 Characters for Inclusion in CJK Extension F
Several of the characters listed in this proposal are annotated "Variant form of ..." These would seem to be candidates for encoding with a variation selector. Indeed, some justification should be provided for not using variation selectors in all these cases.
Date/Time: Mon Oct 29 17:25:51 CDT 2012
Contact: cowan@ccil.org
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/12-309 Revised Proposal to add the Ahom Script in the SMP of the UCS
AHOM DIGITs 1-9 should be spelled AHOM DIGITs ONE-NINE, as is conventional in Unicode (digits are used in Unicode names only to represent shapes). Furthermore, AHOM DIGIT 10 and AHOM DIGIT 20 should be AHOM NUMBER TEN and AHOM NUMBER TWENTY. It is not clear that Nd is appropriate for the digits of this system.
Date/Time: Mon Aug 13 22:24:39 CDT 2012
Contact: pedberg@apple.com
Name: Peter Edberg
Report Type: Error Report
Opt Subject: Incorrect kMandarin value for U+7565
Currently for 略 U+7565, Unihan has the following:
kHanyuPinyin 42541.110:lüè
kMandarin è
The kMandarin value is incorrect; it should be lüè (lüe4), per Lee Collins and several others.
NOTE: Comments below from Richard Cook:
Peter's report reveals related bugs in kHanyuPinlu data, making four errata total:
WRONG:
略 [U+7565] kHanyuPinlu ⇒ e4(445) kMandarin ⇒ è
掠 [U+63A0] kHanyuPinlu ⇒ e4(62) kMandarin ⇒ è
RIGHT:
略 [U+7565] kHanyuPinlu ⇒ lüe4(445) kMandarin ⇒ lüè
掠 [U+63A0] kHanyuPinlu ⇒ lüe4(62) kMandarin ⇒ lüè
Maybe add this to errata/feedback pile? The two kHanyuPinlu errors go back to 2003; fixes should be made all around when possible.
-Richard
Date/Time: Tue Aug 21 14:15:53 CDT 2012
Contact: cdutro@twitter.com
Name: Cameron Dutro
Report Type: Other Question, Problem, or Feedback
Opt Subject: Clarifying French Backwards Accent Sorting in TR-10
Note: This comment was passed along to the editorial committee after close of beta.
The TR-10 document is written as though French backwards accent sorting applies to all French dialects, when in reality it only applies to Canadian French. Can the document be updated to mention this fact? Relevant tickets: http://unicode.org/cldr/trac/ticket/2905 and http://unicode.org/cldr/trac/ticket/2984. Thanks!
Date/Time: Wed Aug 22 18:51:58 CDT 2012
Contact: pedberg@apple.com
Name: Peter Edberg
Report Type: Error Report
Opt Subject: kMandarin error reports from CLDR
We have a couple of CLDR bug reports about pinyin errors for various characters that are actually the result of errors in the Unihan kMandarin field for these characters. Here are the CLDR tickets with further details:
* http://unicode.org/cldr/trac/ticket/3866, Fix pinyin without tones.
* http://unicode.org/cldr/trac/ticket/5205, Pinyin errors noted by Åke Persson.
Date/Time: Thu Aug 23 18:19:58 CDT 2012
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Error Report
Opt Subject: UTS #18 code for collation grapheme clusters vs. discontiguous contractions
In L2/12-250 "observation 94" Richard Wordingham points out that "The code in UTS#18 Annex B does not appear to be able to handle interleaving discontiguous grapheme clusters." In UCA 6.2 we are making a fix to the algorithm in UCA section 6.9. Either UTS #18 should be updated to match, or it should say that it's incomplete and refer back to UCA. UCA section 6.9 refers to UTS #18.
Date/Time: Tue Sep 4 16:21:16 CDT 2012
Contact: daniel.buenzli@erratique.ch
Name: Daniel Bünzli
Report Type: Error Report
Opt Subject: UAX 15 Wrong information about Quick_check and stable code points
Hello,
In section 9.1, "Stable Code Points", of UAX 15, it is said that "characters with the Quick_Check=YES property value satisfy conditions 1-3". Unless I'm completely mistaken, this is wrong. For every normalization form there is at least one character with Quick_Check=YES and a canonical combining class *different* from 0. Here are examples:
U+030D ccc=230 && nfc_quick_check=YES
U+0301 ccc=230 && nfd_quick_check=YES
U+030D ccc=230 && nfkc_quick_check=YES
U+0301 ccc=230 && nfkd_quick_check=YES
Best,
Daniel
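The combining classes cited in this report can be checked with a short sketch. It assumes Python's standard `unicodedata` module, which does not expose the Quick_Check properties directly. As a weaker stand-in, the sketch tests whether each character is left unchanged when normalized in isolation, which any Quick_Check=YES character must be (the converse does not hold). The `EXAMPLES` table and the `check` helper are names introduced here for illustration only.

```python
import unicodedata

# Code points taken from the report above, keyed by normalization form.
EXAMPLES = {"NFC": "\u030D", "NFD": "\u0301", "NFKC": "\u030D", "NFKD": "\u0301"}

def check(form, ch):
    """Return (canonical combining class, unchanged-in-isolation flag)."""
    ccc = unicodedata.combining(ch)
    # Proxy only: this is not the Quick_Check property itself.
    unchanged = unicodedata.normalize(form, ch) == ch
    return ccc, unchanged

for form, ch in EXAMPLES.items():
    ccc, unchanged = check(form, ch)
    print(f"U+{ord(ch):04X} {form}: ccc={ccc} unchanged={unchanged}")
```

Each listed character reports a nonzero combining class while staying unchanged under its normalization form, which is consistent with the point the report makes.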
Date/Time: Fri Sep 14 12:27:07 CDT 2012
Contact: greg@chown.ath.cx
Name: Grigori Goronzy
Report Type: Error Report
Opt Subject: Error in description of Hangul decomposition
NOTE: This was handed to the editorial committee for action, but the "PS" was added and sent to UTC.
In chapter 3.12, on pages 109-110 of the 6.1.0 core specification, it says for the algorithmic decomposition:
> > If the precomposed Hangul syllable s with the index SIndex (defined above) has the
> > Hangul_Syllable_Type value LVT, then it has a canonical decomposition mapping into a
> > sequence of an LV_Syllable and a T jamo:
> > LVIndex = (SIndex div NCount) * NCount
But "LVIndex = (SIndex div TCount) * TCount" is correct (the LV precomposed Hangul forms are spaced TCount code points apart).
----------
Thanks. I forgot to include this; here's an example:
Consider the code point U+AC23. The full LVT decomposition is 1100 1162 11AE. But we actually want to decompose into an LV part and a T part. So we can simply recompose the first two code points, and we get the decomposition pair AC1C 11AE. However, the simplified algorithm as documented results in AC00 for the first character of the decomposition pair.
Best regards
Grigori Goronzy
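The arithmetic in this report can be reproduced with a minimal sketch, assuming the Hangul constants from section 3.12 of the core specification (SBase, TBase, VCount, TCount, NCount). The function name `decompose_lvt` and its `step` parameter are hypothetical, introduced only to compare the two formulas side by side.

```python
# Hangul constants as defined in section 3.12 of the core specification.
S_BASE, T_BASE = 0xAC00, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588
S_COUNT = 19 * N_COUNT       # 11172

def decompose_lvt(cp, step=T_COUNT):
    """Split an LVT syllable into an (LV syllable, T jamo) pair.

    `step` is a hypothetical knob: T_COUNT is the value the report
    proposes, N_COUNT reproduces the formula as printed.
    """
    s_index = cp - S_BASE
    t_index = s_index % T_COUNT
    if not (0 <= s_index < S_COUNT) or t_index == 0:
        raise ValueError("not an LVT syllable")
    lv_index = (s_index // step) * step
    return S_BASE + lv_index, T_BASE + t_index

print([hex(c) for c in decompose_lvt(0xAC23)])           # ['0xac1c', '0x11ae']
print([hex(c) for c in decompose_lvt(0xAC23, N_COUNT)])  # ['0xac00', '0x11ae']
```

With TCount the result is the pair AC1C 11AE given in the report; with NCount the first element collapses to AC00, the discrepancy the report describes.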
Date/Time: Wed Oct 3 18:54:22 CDT 2012
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Error Report
Opt Subject: UAX #44 6.2 status of Script_Extensions
http://www.unicode.org/reports/tr44/
section 5.7.6 "Similarly, the provisional Script_Extensions property has values which ..." (Please just remove "provisional".)
section 5.8 "The provisional property Script_Extensions consists of ..." (Please change to "The Script_Extensions property consists of ...")
See the Changes section: "The status of the Script_Extensions property was changed from provisional to informative."
2012/10/02, from Ken Whistler
Rick, This contribution to the unicode list back in June makes a point which was not addressed in the 6.2 versions of UAX #14 and UAX #29. So that this doesn't get lost completely, I suggest that you add it to the other feedback section for consideration at the November UTC meeting. --Ken
Subject: A question about the default grapheme cluster boundaries with U+0020 as the grapheme base
Date: Sat, 2 Jun 2012 07:22:01 +0300
From: Konstantin Ritt ritt.ks@gmail.com
To: unicode@unicode.org
It seems like there is an inconsistency between what the default grapheme cluster specification says and what the test results are expected to be.
UAX#29 says:
> Another key feature (of default Unicode grapheme clusters) is that
> default Unicode grapheme clusters are atomic units with respect to the
> process of determining the Unicode default line, word, and sentence
> boundaries.
This is also mentioned in UAX#14:
> Example 6. Some implementations may wish to tailor the line breaking
> algorithm to resolve grapheme clusters according to Unicode Standard Annex
> #29, “Unicode Text Segmentation” [UAX29], as a first stage. Generally,
> the line breaking algorithm does not create line break opportunities within
> default grapheme clusters; therefore such a tailoring would be expected
> to produce results that are close to those defined by the default algorithm.
> However, if such a tailoring is chosen, characters that are members of line
> break class CM but not part of the definition of default grapheme clusters
> must still be handled by rules LB9 and LB10, or by some additional
> tailoring.
However, <U+0020 (SP), U+0308 (CM)> in the line breaking algorithm is handled by rules LB10+LB18 and produces a break opportunity, while GB9 prohibits a break between <U+0020 (Other), U+0308 (Extend)>.
Section 9.2 "Legacy Support for Space Character as Base for Combining Marks" in UAX#29 clarifies why a line break occurs there, but the fact remains that the statements above are false and introduce some ambiguity. If the space character is no longer a grapheme base, the grapheme cluster breaking rules need to be updated.
Kind regards,
Konstantin
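The disagreement described in this message can be illustrated with a toy sketch. It is not an implementation of UAX #14 or UAX #29: it models only rule GB9 on the grapheme side and rules LB9, LB10 and LB18 on the line breaking side. It also assumes that General_Category Mn or Me is an acceptable stand-in for both Grapheme_Extend and line break class CM, which holds for U+0308 but not for every character. All function names are hypothetical.

```python
import unicodedata

def is_mark(ch):
    # Assumption: Mn/Me approximates both Grapheme_Extend and lb=CM.
    return unicodedata.category(ch) in ("Mn", "Me")

def grapheme_break_between(prev, cur):
    # GB9 only: never break before an Extend character.
    return not is_mark(cur)

def line_break_between(prev, cur):
    # Covers only a mark following SP or following a letter.
    if not is_mark(cur):
        raise ValueError("sketch only handles a combining mark as `cur`")
    if prev == " ":
        # LB9 does not attach the mark to SP, LB10 then treats it as AL,
        # and LB18 allows a break after the space.
        return True
    if prev.isalpha():
        # LB9: the mark is absorbed by its base, so no break.
        return False
    raise ValueError("sketch does not model this base character")

pair = (" ", "\u0308")
print(grapheme_break_between(*pair))  # False
print(line_break_between(*pair))      # True
```

For <U+0020, U+0308> the sketch reports no grapheme cluster boundary but does report a line break opportunity, which is the mismatch raised above.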