The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of May 1, 2013, since the previous cumulative document was issued prior to UTC #135 (May 2013). This document does not include feedback on moderated Public Review Issues from the forum that have been digested by the forum moderators; those are in separate documents for each of the PRIs. Grayed-out items in the Table of Contents do not have feedback here.
The links below go directly to open PRIs and to feedback documents for them, as of November 1, 2013.
The links below go to locations in this document for feedback.
Date/Time: Tue Sep 24 17:42:39 CDT 2013
Contact: roozbeh@google.com
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Dandas need more scripts in ScriptExtensions.txt
Currently, the common Indic dandas are listed in ScriptExtensions.txt as:
0964..0965 ; Beng Deva Guru Orya Takr # Po [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
But there are also pointers to the dandas from the following blocks in NamesList.txt: Gujarati, Tamil, Telugu, Kannada, Malayalam. The text of the Core Specification says in section 9.1, under "Punctuation": "the intent is that they be used as common punctuation for all the major scripts of India covered by this chapter. Danda and double danda punctuation marks are not separately encoded for Bengali, Gujarati, and so on." The Gujarati, Tamil, Telugu, Kannada, and Malayalam sections in the core spec also clearly refer to dandas used from the Devanagari block. Apart from that, Limbu (section 10.5) also seems to use the double danda, while Syloti Nagri (10.6) uses both dandas. This means the line in ScriptExtensions.txt needs to change to:
0964 ; Beng Deva Gujr Guru Knda Mlym Orya Sylo Takr Taml Telu # Po DEVANAGARI DANDA
0965 ; Beng Deva Gujr Guru Knda Limb Mlym Orya Sylo Takr Taml Telu # Po DEVANAGARI DOUBLE DANDA
We could also probably use pointers in the NamesList from the Limbu and Syloti Nagri blocks to the dandas they use. [The situation of Sinhala is not clear, but we can update that if we find more information.]
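To make the requested change concrete, the following is a minimal sketch that parses the current and proposed ScriptExtensions.txt lines quoted in this report and lists the script codes each danda would gain. The function name `parse_script_extensions` is hypothetical, and the parser assumes only the simple `code points ; scripts # comment` layout seen in the quoted lines.

```python
def parse_script_extensions(line):
    """Parse one line of the form 'XXXX[..YYYY] ; Scr1 Scr2 ... # comment'.

    Returns (range of code points, set of script codes), or None for
    blank and comment-only lines.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    code_field, script_field = (part.strip() for part in data.split(";", 1))
    first, _, last = code_field.partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    return range(start, end + 1), set(script_field.split())


# Lines copied from the report above.
current = "0964..0965 ; Beng Deva Guru Orya Takr # Po [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA"
proposed = [
    "0964 ; Beng Deva Gujr Guru Knda Mlym Orya Sylo Takr Taml Telu # Po DEVANAGARI DANDA",
    "0965 ; Beng Deva Gujr Guru Knda Limb Mlym Orya Sylo Takr Taml Telu # Po DEVANAGARI DOUBLE DANDA",
]

_, current_scripts = parse_script_extensions(current)
for line in proposed:
    code_points, scripts = parse_script_extensions(line)
    added = sorted(scripts - current_scripts)
    print(f"U+{code_points[0]:04X} gains: {' '.join(added)}")
```

Comparing the sets this way shows that the only difference between the two proposed lines is Limb, which the report adds for the double danda only.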
Date/Time: Wed Oct 2 17:58:08 CDT 2013
Contact: jshin1987@gmail.com
Name: Jungshik Shin
Report Type: Error Report
Opt Subject: Classification of comma vertical variants are inconsistent for line breaking
[:Line_Break=Close_Punctuation:] has U+FE50 (small comma), U+FE11 (presentation form for vertical ideographic comma), U+FF0C (full width comma) and U+FF64 (half width ideographic comma), but U+FE10 (presentation form for vertical comma) and U+FE51 (small ideographic comma) are NOT included. U+FE10 (presentation form for vertical comma) is LB=Infix_Numeric and U+FE51 (small ideographic comma) is LB=Ideographic. It might make sense for U+FE10 (presentation form for vertical comma) to have LB=Infix_Numeric because the corresponding ASCII comma (non-presentation form) has that, too. However, treating U+FE51 (small ideographic comma) and U+FE11 (presentation form for vertical ideographic comma) differently (the former in LB=Ideographic and the latter in LB=CP) seems not very consistent. This issue was initially reported against CLDR ( http://unicode.org/cldr/trac/ticket/6557 ).
Date/Time: Fri Oct 4 16:01:36 CDT 2013
Contact: jungshik@google.com
Name: Jungshik Shin
Report Type: Error Report
Opt Subject: Case mapping for U+0587
Hello, This is not a bug report per se, but is just to bring an issue we came across about the uppercase of U+0587 at Google to the UTC's attention. U+0587 (Armenian Small Ligature ECH YIWN) is currently case-mapped to a sequence of U+0535 (Armenian Capital Letter ECH) and U+0552 (Armenian Capital Letter YIWN). There's a report from Armenian speakers in Armenia that the latest Armenian orthography as used in the Republic of Armenia uppercases it to a sequence of U+0535 and U+054E (Armenian Capital Letter VEW). On the other hand, the Armenian diaspora and "Western Armenian" speakers follow the current Unicode standard. A comment from Google's Armenian speaker: "That form was used in Armenia before "spelling reform of the Armenian language" at the beginning of the 20th century (1922–1924 - according to Wikipedia). There is a variation of Armenian language currently used by Armenian diaspora, who still use the old version. But everyone in Armenia (including official documents and media) are using the new form." Another comment from a linguistics professor at Yerevan: <quote> So I asked this guy http://www.ysu.am/science/en/4Kg4l3vuxYoueJU5nAWSsH9JAT/type/1/page/1 who is friend of mine. His comment was "Ev is a ligature, same as &, and as such it is not a full first class citizen letter and it cannot have a capital. In Eastern Armenian it is usually "ԵՎ" although it is logically wrong, as the ligature is ligature of "եւ". To cut things short - it is illogical and historically incorrect to write ԵՎ in his opinion, but that is the way it is done, so we shall write ԵՎ in Eastern Armenian and ԵՒ in Western. </quote>
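As an illustration of the two behaviours described in this report, the sketch below shows the default full uppercase mapping of U+0587 as exposed by Python's `str.upper()`, next to a hypothetical tailoring for the Eastern Armenian convention. The helper `upper_eastern_armenian` is an assumption made for this example; it is not part of any standard library or of the Unicode data files.

```python
import unicodedata

LIGATURE = "\u0587"  # ARMENIAN SMALL LIGATURE ECH YIWN


def describe(text):
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def upper_eastern_armenian(text):
    """Hypothetical tailoring: uppercase U+0587 as ECH + VEW (U+0535 U+054E).

    Every other character falls back to the default uppercase mapping.
    """
    return text.replace(LIGATURE, "\u0535\u054E").upper()


# Default mapping, as described in the report: ECH + YIWN.
print(describe(LIGATURE.upper()))
# Tailored mapping for the orthography used in the Republic of Armenia: ECH + VEW.
print(describe(upper_eastern_armenian(LIGATURE)))
```

A locale-sensitive tailoring of this kind would leave the default mapping, which the report says Western Armenian users rely on, unchanged.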
Date/Time: Wed Oct 9 04:15:01 CDT 2013
Contact: michel.onoff@web.de
Name: Michel Onoff
Report Type: Error Report
Opt Subject: Underspecifications in UAX #44
I refer to the current UAX #44. The annex lacks a syntax for the property types. For example, does Enumeration (E) resemble a conventional identifier and how about the underscore and case-sensitiveness? What's the syntax for Numeric (N), etc.? Also, fields 6, 7, and 8 of the UnicodeData.txt are composed of a Numeric_Type (E) and a Numeric_Value (N). It is left unspecified how the two are separated, whether the former is optional and so on. The Numeric_Type never appears in the file, so I'm wondering if the provision for it is obsolete or is there for future extensions. Fields 12, 13, 14 provide simple mappings to a single character. It is unspecified that the field shall be in the form of a hex code point. Best regards MO
Date/Time: Thu Oct 10 17:17:14 CDT 2013
Contact: roozbeh@google.com
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Thaana's use of Arabic number punctuation
According to the Core Spec, section 8.5, page 275, under Numerals, "Arabic numeric punctuation is used with digits [in Thaana], whether Arabic or European." It's not very clear what that text means, but I take "Arabic numeric punctuation" to mean: U+066A ARABIC PERCENT SIGN U+066B ARABIC DECIMAL SEPARATOR U+066C ARABIC THOUSANDS SEPARATOR If that is the case and those are indeed used in Thaana, we need to add these three to ScriptExtensions.txt as: 066A..066C ; Arab Thaa If not, we need to clarify what the text means.
Date/Time: Tue Oct 15 12:45:59 CDT 2013
Contact: chris@lookout.net
Name: Chris Weber
Report Type: Error Report
Opt Subject: inconsistent confusables data
Please see this email thread for reference: http://www.unicode.org/mail-arch/unicode-ml/y2013-m10/0028.html The confusables data leaves out certain characters based on the assumption that they would have been removed by way of NFKC normalization. However, I argue that may be a dangerous assumption. Could there be cases where implementations want to detect confusability but cannot guarantee NFKC normalization? In another case, implementations may wish to generate confusable data for testing or other purposes. For example: http://unicode.org/cldr/utility/confusables.jsp?a=m&r=None With certain data missing from the equivalence sets, people who rely on the expertise of the Unicode Consortium may expose their implementations to vulnerability. My ask with this report is that the confusables data be updated to include all characters which have a confusable potential even though they may not fit the profile described in http://www.unicode.org/reports/tr39/#Identifier_Characters. Best regards, Chris Weber
Date/Time: Thu Oct 17 15:35:43 CDT 2013
Contact: duerst@it.aoyama.ac.jp
Name: Martin Dürst
Report Type: Error Report
Opt Subject: Hangul normalization tests (LV + T = LVT) missing
The NormalizationTest file provided on the website (http://www.unicode.org/Public/UCD/latest/ucd/NormalizationTest.txt) seems to be missing one specific kind of pattern for Hangul. There are no tests that start with a "halfway-composed" Hangul syllable, i.e. one that uses a LV Hangul syllable followed by a T Hangul Jamo. In NFD, this LV + T normalizes to L + V + T, which should be covered by the existing test for LV -> L + V. However, in NFC, this should normalize to LVT. There is no test that actually checks this, and there is a potential for errors when working on non-straightforward implementations (i.e. not going to NFC via NFD). This actually happened in an implementation I was working on, and I only discovered the problem through a code walkthrough. An example entry in the test file to cover this case (without the comment) would be:
AC00 11A8;AC01;1100 1161 11A8;AC01;1100 1161 11A8
There may not be a need to provide tests for all such cases (around 10'000), but even having just a single one will catch some errors that haven't been caught up to now.
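The missing case can be sketched as follows. The arithmetic uses the standard Hangul syllable constants (SBase, TBase, TCount, SCount). The function names `compose_lv_t` and `check_test_line` are hypothetical and were chosen for this example. The second function checks the proposed test line against Python's `unicodedata.normalize`, assuming the usual five-column layout of source, NFC, NFD, NFKC, NFKD.

```python
import unicodedata

S_BASE, T_BASE = 0xAC00, 0x11A7
T_COUNT, S_COUNT = 28, 11172


def compose_lv_t(syllable, trailing):
    """Compose an LV syllable with a trailing consonant jamo, or return None."""
    s_index = ord(syllable) - S_BASE
    t_index = ord(trailing) - T_BASE
    is_lv = 0 <= s_index < S_COUNT and s_index % T_COUNT == 0
    if is_lv and 0 < t_index < T_COUNT:
        return chr(ord(syllable) + t_index)
    return None


def check_test_line(line):
    """Check one 'source;NFC;NFD;NFKC;NFKD' line against unicodedata."""
    fields = line.split(";")[:5]
    source, *expected = (
        "".join(chr(int(cp, 16)) for cp in field.split()) for field in fields
    )
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(
        unicodedata.normalize(form, source) == want
        for form, want in zip(forms, expected)
    )


# LV (U+AC00) followed by T (U+11A8) must compose to LVT (U+AC01) in NFC.
print(compose_lv_t("\uAC00", "\u11A8") == "\uAC01")
print(check_test_line("AC00 11A8;AC01;1100 1161 11A8;AC01;1100 1161 11A8"))
```

An implementation that composes only from fully decomposed L + V + T input would fail the first check, which is the class of error the proposed test line is meant to catch.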
Date/Time: Fri Oct 18 17:04:27 CDT 2013
Contact: loren.brichter@gmail.com
Name: Loren Brichter
Report Type: Other Question, Problem, or Feedback
Opt Subject: UAX #9 6.3.0 Bidi Algorithm feedback
In implementing the UAX #9 Bidi Algorithm (6.3.0) I encountered a few issues, some of which may be clarified by tweaked wording in the spec.
1. Section 5.2, X9 modifier, "assign the embedding level to each formatting character" and "turn it into BN". Turning it into BN makes sense, but to what "embedding level" is this referring? They are already at the embedding level that they are at. As these BNs are ignored in subsequent steps, theoretically it doesn't matter what embedding level is assigned, so perhaps this could be removed.
2. BD16: this algorithm makes no mention of a maximum stack depth, which could lead to implementations diverging. I'd love to see it capped at max_depth to keep things simple.
2.5. (Also, I completely skipped over the word "canonical" in BD16 originally; mentioning that would be helpful, and even just including the 2(?) legacy cases would have saved me a bit of time.)
Thanks, Loren
Date/Time: Mon Oct 21 10:32:03 CDT 2013
Contact: smontagu@smontagu.org
Name: Simon Montagu
Report Type: Error Report
Opt Subject: xidmodifications.txt needs update for Unicode 6.3
http://www.unicode.org/Public/security/latest/xidmodifications.txt is still the 6.2 version and has not been updated to include changes in 6.3. There are at least two such changes that will affect xidmod. Firstly, U+180E MONGOLIAN VOWEL SEPARATOR should change from "restricted ; not-xid" to "restricted ; default-ignorable". This may not make much practical difference, but more seriously, the new U+061C ARABIC LETTER MARK needs to be added to "restricted ; default-ignorable". (The other new Bidi control characters are already there as "reserved".)
Date/Time: Mon Oct 28 21:31:18 CDT 2013
Contact: lunde@adobe.com
Name: Ken Lunde
Report Type: Error Report
Opt Subject: U+1F12E (CIRCLED WZ) decomposition
I noticed an inconsistency between the Code Chart glyph of U+1F12E and its decomposition. Its decomposition is <0057 005A> ("WZ"), but its Code Chart glyph suggests <0057 007A> ("Wz").
RESPONSE FROM KEN Whistler, 2013/10/29: I just ran an extensive back search, and this may have been an error that I made on May 4, 2009, which was never caught during beta review of the data files. The Amd 6 post Dublin chart (L2/09-172) had the correct decomposition to <circle> W z, but there are various anomalies in the process here. The U.S. ballot comments on FPDAM6, which asked for this, L2/09-082, claimed that the decomposition was listed in L2/09-034, Karl Pentzlin's proposal document, but in fact it wasn't. Nor was a decomposition explicitly listed in Germany's ballot comments. That means that Michel put the decomposition in himself in the Amd 6 data files. But there seems to be a handoff glitch for Amd 6 data for addition to the draft Unicode 5.2 data I already had lying around containing Amd 5 data. I can't find my copy of the FDAM 6 names list file, which ordinarily I would have archived. Instead I see a UnicodeData delta only, with a manual addition of the decomposition for U+1F12E that I did on May 4, 2009. I would ordinarily get the decompositions from a combination of examination of proposals and examination of the FDAM 6 names list annotation entries. But 4-1/2 years later, I can't recover the exact details of what happened here. My own handwritten UTC notes from February, 2009 are ambiguous about whether the "z" was supposed to be uppercase or lowercase, so that might have been the source of my original error. At any rate, this error was totally missed in the beta review for Unicode 5.2, and it has taken 4 years for somebody to report it as a problem. Not sure whether that deserves a :-) or a :-(
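For readers who want to inspect the mapping themselves, here is a small sketch that decodes a UnicodeData.txt-style decomposition field into its compatibility tag and text. The helper name `decode_decomposition` is hypothetical. The last line prints whatever the local Python build reports for U+1F12E, which depends on the Unicode version that build ships with, so no expected output is given for it.

```python
import unicodedata


def decode_decomposition(field):
    """Split a decomposition field such as '<circle> 0057 005A' into (tag, text).

    The tag is None when the field has no compatibility tag.
    """
    parts = field.split()
    tag = None
    if parts and parts[0].startswith("<"):
        tag = parts.pop(0).strip("<>")
    return tag, "".join(chr(int(cp, 16)) for cp in parts)


# Decomposition cited in the report.
print(decode_decomposition("<circle> 0057 005A"))
# Decomposition that the Code Chart glyph suggests.
print(decode_decomposition("<circle> 0057 007A"))
# Value in the Unicode data bundled with the running Python interpreter.
print(decode_decomposition(unicodedata.decomposition("\U0001F12E")))
```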
Date/Time: Tue Oct 29 12:49:45 CDT 2013
Contact: ajithramayyan@yahoo.co.in
Name: Ajith R
Report Type: Error Report
Opt Subject: MALAYALAM CONJUNCTS NTA and TTA
I am a native Malayalam speaker and wish to point out two errors in the Malayalam portion of the Unicode Standard 6.3. The standard directs
1) the sequence <0D7B, 0D4D, 0D31> to be rendered as "NTA" ന്റ
2) the sequence <0D31, 0D4D, 0D31> to be rendered as "TTA" റ്റ
While on the face of it this scheme gives the desired visual result, it is only as correct or wrong as using <0D7B, 0D4D, 0C67> or <0D7B, 0D4D, 0CE7> for "NTA" ന്റ or <0C67, 0D4D, 0C67> or <0CE7, 0D4D, 0CE7> to represent "TTA" റ്റ. The "NTA" ന്റ is actually a combination of MALAYALAM LETTER CHILLU N, 0D7B and MALAYALAM LETTER TTTA, 0D3A, though it is written as chillu n combined with rra, 0D31. It is pronounced similar to the nt of ant. Similarly, the "TTA" റ്റ is a duplication of MALAYALAM LETTER TTTA, though it is shown as one rra below the other. It is pronounced similar to the t of bat, but with more stress. The reason for this apparent digraph, where the rra represents its original sound as well as "ttt", is that MALAYALAM LETTER TTTA is never used singly. It occurs only in these two conjuncts, "NTA" ന്റ and "TTA" റ്റ. In native Malayalam words, RRA is not duplicated either. So the same curved symbol has been used to represent the "TTTA" occurring in these conjuncts. This fact is described in the book "Samboorna Malayala Vyakaranam" by V Ramkumar, publisher SISO books, and in it the author quotes KeralaPanini. My suggestion is
1) "NTA" ന്റ be defined as a precomposed character that is decomposable to <0D7B, 0D4D, 0D3A> instead of the current suggestion of rendering the sequence <0D7B, 0D4D, 0D31> as "NTA"
2) "TTA" റ്റ be defined as a precomposed character that is decomposable to <0D3A, 0D4D, 0D3A> instead of the current suggestion of rendering the sequence <0D31, 0D4D, 0D31> as "TTA"
ajith
Date/Time: Sun Nov 3 15:49:01 CST 2013
Contact: andrewcwest@gmail.com
Name: Andrew West
Report Type: Error Report
Opt Subject: Character Name for U+2B81
The character name for U+2B81, to be added in Unicode 7.0, has a typo. The actual name in the ISO/IEC 10646:2012 Amd.1 text and the Unicode 7.0 beta files http://www.unicode.org/Public/7.0.0/ucd/UnicodeData-7.0.0d12.txt is:
UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS DOWNWARDS OF TRIANGLE-HEADED ARROW
This should be:
UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF DOWNWARDS TRIANGLE-HEADED ARROW
(cf. U+2B83 "DOWNWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF UPWARDS TRIANGLE-HEADED ARROW")
As the actual name is confusing/misleading and makes it difficult for users to find the character in code charts etc. when searching for e.g. "ARROW LEFTWARDS OF", I suggest adding a named alias for U+2B81 when Unicode 7.0 is released.
NamesList.txt:
2B81 UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS DOWNWARDS OF TRIANGLE-HEADED ARROW
% UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF DOWNWARDS TRIANGLE-HEADED ARROW
NameAliases.txt:
2B81;UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF DOWNWARDS TRIANGLE-HEADED ARROW;correction
None at this time.