The sections below contain comments received on the open Public Review Issues and other feedback as of May 04, 2012, since the previous cumulative document was issued prior to UTC #130 (February 2012). This document does not include feedback on moderated Public Review Issues from the forum that have been digested by the forum moderators; those are in separate documents for each of the PRIs. Gray items in the Table of Contents do not have feedback here.
182 Proposed Update UTS #18: Unicode Regular Expressions
207 Proposed Draft UTR #50, Unicode Properties for Vertical Text Layout (moderated)
208 Proposed Update UTR #36: Unicode Security Considerations
209 Proposed Update UTS #39: Unicode Security Mechanisms
Feedback on Encoding Proposals
Closed Public Review Issues
Other Reports
Assamese
See also L2/12-162 and L2/12-187
Date/Time: Sun May 6 23:42:09 CDT 2012
Contact: unicode@norbertlindenberg.com
Name: Norbert Lindenberg
Report Type: Public Review Issue
Opt Subject: Proposed Update UTS #18 has incorrect example
The proposed update for UTS 18, Unicode Regular Expressions, section 1.5, Simple Loose Matches, includes an example showing the expansion of /Dåb/ into /(?:d|D)(?:å|Å|Å)(?:b|B)/ . There's no need to repeat Å in the expansion; I assume that instead Å, or more clearly \u212B (ANGSTROM SIGN), is meant, since it also has å as its lowercase mapping.
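To make the expected expansion concrete, here is a minimal Python sketch that expands each character of a literal into all characters sharing its case fold. It assumes Python's `str.casefold()` (full case folding) as a stand-in for the simple case folding that Simple Loose Matches uses, which gives the same result for the characters in this example; `case_variants` and `loose_pattern` are hypothetical helper names. Under that assumption the å group contains U+00C5, U+00E5 and U+212B ANGSTROM SIGN, rather than a repeated U+00C5.

```python
import re
import sys
from collections import defaultdict

# Map each case fold to the characters that change under folding.
# Assumption: full folding via str.casefold() approximates simple folding here.
_FOLDS_TO = defaultdict(set)
for cp in range(sys.maxunicode + 1):
    c = chr(cp)
    folded = c.casefold()
    if folded != c:
        _FOLDS_TO[folded].add(c)

def case_variants(ch):
    """All single characters whose case fold equals the fold of ch."""
    target = ch.casefold()
    variants = {ch} | _FOLDS_TO.get(target, set())
    if len(target) == 1:
        variants.add(target)
    return sorted(variants)  # sorted by code point

def loose_pattern(literal):
    """Expand a literal into one non-capturing group per character."""
    return "".join(
        "(?:" + "|".join(re.escape(v) for v in case_variants(ch)) + ")"
        for ch in literal
    )

print(ascii(loose_pattern("D\u00e5b")))  # prints '(?:D|d)(?:\xc5|\xe5|\u212b)(?:B|b)'
```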
Date/Time: Mon May 7 00:28:43 CDT 2012
Contact: unicode@norbertlindenberg.com
Name: Norbert Lindenberg
Report Type: Public Review Issue
Opt Subject: Proposed Update UTS #18 is unclear on default case conversion
The proposed update for UTS 18, Unicode Regular Expressions, section 2.4, Default Case Conversion, is not very clear on how full caseless matches are supposed to be handled in different situations. The guidance provided seems to cover only literals within patterns. It's not clear how, say, a class such as /[äöüß]/i should be handled. Full mapping of "ß" results in "SS", but a two-letter string cannot be a member of a set of characters. So, should the "SS" be quietly dropped in this case (as the ICU implementation does)? Or should the class be rewritten as /(?:ä|ö|ü|ss)/i ? Going further, should /[a-ß]/i result in an error, or what does it mean?
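One possible answer to the class question can be sketched in Python: rewrite the class as an alternation that keeps each member and adds its full case fold, so that ß contributes both `ß` and `ss`. This is an illustrative assumption, not behavior the proposed update specifies; `full_fold_class` is a hypothetical name, and the sketch does not address ranges such as `[a-ß]`. Python's own `re` module matches a class against a single character, so the plain class cannot match "SS".

```python
import re

def full_fold_class(members):
    """Rewrite a character class like [äöüß] as an alternation that also
    contains each member's full case fold (e.g. 'ß' -> 'ss')."""
    alternatives = []
    for m in members:
        for form in (m, m.casefold()):
            if form not in alternatives:  # keep order, drop duplicates
                alternatives.append(form)
    return "(?:" + "|".join(re.escape(a) for a in alternatives) + ")"

print(full_fold_class("äöüß"))                        # (?:ä|ö|ü|ß|ss)
print(re.fullmatch("[äöüß]", "SS", re.IGNORECASE))    # None
print(bool(re.fullmatch(full_fold_class("äöüß"), "SS", re.IGNORECASE)))  # True
```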
Date/Time: Mon May 7 11:39:36 CDT 2012
Contact: khw@cpan.org
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: tr18
I was re-reading the draft and noticed a minor problem that I had overlooked: section 2.5 has these examples: \p{HANGUL SYLLABLE GAG} \p{BEL} \p{BELL}. Did you mean to suggest that all character names should be considered properties? I had never noticed anything like this before, and I worry about the possibility of collisions. Perl uses, e.g., \N{BELL} to specify character names.
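For reference, the example names resolve as follows with Python's `unicodedata` module: `HANGUL SYLLABLE GAG` and `BELL` are formal character names (U+AC01 and U+1F514), whereas U+0007 has no formal character name and `BEL` exists only as a name alias for it. If `\p{...}` accepted character names, `\p{BEL}` and `\p{BELL}` would therefore select unrelated characters, which illustrates the collision concern. The snippet is illustrative only.

```python
import unicodedata

# Formal character names used as examples in section 2.5.
for name in ("HANGUL SYLLABLE GAG", "BELL"):
    ch = unicodedata.lookup(name)
    print(f"{name} -> U+{ord(ch):04X}")  # U+AC01, then U+1F514

# U+0007 has no formal name; "BEL" is only a name alias for it.
print(unicodedata.name("\x07", "<no formal name>"))
```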
See the relevant forum. One item was received on the reporting form; see below.
Date/Time: Tue Mar 20 17:35:23 CDT 2012
Contact: cowan@ccil.org
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/12-102 Updated Proposal to Revise UTR #50
Note: A reply was already sent to John, pointing him to the forum.
Two issues: The document uses the term "stacked" for horizontal cursive scripts (Arabic, Syriac, etc.) written vertically so as to be read top-to-bottom. This style is different from default vertical positioning, but conflating it with the use of unrotated glyphs in horizontal non-cursive scripts (Latin, Greek, etc.) is IMHO more confusing than helpful. Something also needs to be said about Ogham in section 4. The tables correctly give it an orientation property of Rotatable-only, but don't mention that it is written bottom-to-top, and therefore Ogham embedded in vertical scripts requires bidi handling in all cases.
Date/Time: Wed Feb 22 21:55:41 CST 2012
Contact: jamadagni@gmail.com
Name: Shriramana Sharma
Report Type: Other Question, Problem, or Feedback
Opt Subject: Telugu confusables
Note: This was already sent to the editorial committee.
I notice in the latest meeting minutes:
A.5.2 Action item review.
[130-A1] Action Item for Lisa Moore: Follow up with Andhra Pradesh on action 125-A17.
[130-A2] Action Item for Eric Muller: Take info for Indic TR and turn into a document for the doc register.
Where 125-A17 is:
South Asian Subcommittee — TELUGU LENGTH MARK (D.3.1)
[125-A17] Action Item for Manoj Jain: Work with Andhra Pradesh Gov't to determine what additional clarifications and annotations may be required for the Telugu script. L2/10-339
[125-A18] Action Item for Eric Muller, Julie Allen, Editorial Committee: Look for cases to be added to the confusable vowel representation tables in the Indic chapter(s) for Unicode 6.0. Look at document L2/10-339 Telugu, and other cases where documentation could be improved.
Since I was the one who submitted the document L2/10-339 requesting deprecation of Telugu Length Mark, let me just give the list of confusables I had in mind:
VS-II ీ = VS-I ి + LM ౕ
VS-EE ే = VS-E ె + LM ౕ
VS-OO ో = VS-O ొ + LM ౕ
HA హ + VS-AA ా -> HAA హా = HA హ + LM ౕ
(VS = vowel sign; LM = length mark)
The people with the Action Item can incorporate this into what they write. [Submitted via the form as per offlist suggestion of Markus Scherer to ensure it doesn't get forgotten.]
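None of these pairs are canonically equivalent, so normalization does not unify them and a separate confusable table is needed. This Python sketch encodes the list above in a hypothetical `CONFUSABLES` table (it is not UTS #39 data) and maps the length-mark spellings to the intended forms; `fold_confusables` is likewise a hypothetical helper.

```python
import unicodedata

LENGTH_MARK = "\u0c55"  # TELUGU LENGTH MARK

# Intended form -> confusable spelling using the length mark (from the list above).
CONFUSABLES = {
    "\u0c40": "\u0c3f" + LENGTH_MARK,        # VS-II = VS-I + LM
    "\u0c47": "\u0c46" + LENGTH_MARK,        # VS-EE = VS-E + LM
    "\u0c4b": "\u0c4a" + LENGTH_MARK,        # VS-OO = VS-O + LM
    "\u0c39\u0c3e": "\u0c39" + LENGTH_MARK,  # HAA = HA + LM
}

def fold_confusables(text):
    """Replace each length-mark spelling with the intended form."""
    for intended, lookalike in CONFUSABLES.items():
        text = text.replace(lookalike, intended)
    return text

# NFC leaves every look-alike spelling distinct from the intended form.
for intended, lookalike in CONFUSABLES.items():
    print(unicodedata.normalize("NFC", lookalike) == intended)  # False
```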
No feedback at this time.
Date/Time: Tue May 1 21:51:47 CDT 2012
Contact: cowan@ccil.org
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/12-123 Proposal to Encode the Sign SIDDHAM for Devanagari
Since this sign is the same in form and function as its Tibetan look-alike U+0FD3, I think the two should be unified, provided that the Tibetan sign actually means the same thing (I can't find information about this). It's a little strange to incorporate a Tibetan character into Devanagari fonts, but it does not seem to require any special Tibetan support. U+0FD3 is Po rather than So, but as we know that is not a hard and fast distinction.
Date/Time: Tue May 1 21:57:13 CDT 2012
Contact: cowan@ccil.org
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/12-124 Proposal to Encode Signs for Writing Kashmiri in Sharada
Since the proposed SHARADA SIGN NUKTA has exactly the same form, function, and properties as the Devanagari version, I think unification should be strongly considered. In the words of the proposal, "these signs were used by Kashmiri scribes in both Sharada and Devanagari", which implies that the Sharada sign is borrowed from Devanagari. In general, when a character is borrowed from a related script, we don't double-encode it unless its range of forms in the borrowing script is outside the bounds of the lending script, as with the Kurdish Q. The other two marks also have Devanagari look-alikes, but clearly don't share function with them, so they should be encoded.
Date/Time: Wed May 2 10:10:24 CDT 2012
Contact: cowan@ccil.org
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/12-011 Preliminary Proposal to Encode Siddham in ISO/IEC 10646
The same issue I raised about DEVANAGARI SIGN SIDDHAM applies here also: this should be unified with Tibetan U+0FD3, provided the semantics is the same. Failing that it should at least be unified with the Devanagari sign, since there is plenty of precedent for sharing Devanagari punctuation/symbols with other Indic scripts.
Date/Time: Tue Feb 7 14:10:56 CST 2012
Contact: unicode@farah.cl
Name: Miguel Farah
Report Type: Error Report
Opt Subject: Clarifications suggested for the DOLLAR SIGN and PESO SIGN code points.
Note: This was already sent to the editorial committee.
I'd like to suggest the following clarifications in the Unicode Names List:
1) To avoid confusion between the Latin-American Peso currencies and the Filipino currency, add an alias "Filipino Peso Sign" to U+20B1.
2) Modify the comment for the U+20B1 code point to state something like "Extant and discontinued Latin-American Peso currencies (Mexican, Chilean, Colombian, Dominican, etc.) use the dollar sign.".
3) Change the spelling from "milreis" to "milréis" in the informative aliases for U+0024.
4) Add a comment to U+0024 along the lines of "The dollar symbol is used for many peso currencies in Latin America and elsewhere, except U+20B1, which is used for the Philippine peso.".
For rationale and background for this request, please see the Unicode Forum Discussion at http://www.unicode.org/forum/viewtopic.php?f=21&t=261 . Please also use the provided background information to extend the description in Chapter 15, Currency Symbols, where neither the dollar sign nor the (Philippine) peso sign is currently discussed explicitly, while the yen/yuan sign is. Thank you.
Date/Time: Mon Feb 27 18:39:27 CST 2012
Contact: roozbeh@google.com
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: U+0342 Combining Greek Perispomeni needs more info
Note: This was already sent to the editorial committee.
I was looking at the charts and just discovered U+0342 COMBINING GREEK PERISPOMENI. It really confused me; I thought a glyph error had found its way into the charts. I think it would be a good idea to add a brief explanation to the NamesList, together with a cross-reference to U+0303 COMBINING TILDE.
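The different roles of the two marks can be checked with Python's `unicodedata` module, which may help when drafting such a note: U+0342 appears in the canonical decompositions of polytonic Greek letters such as U+1FB6 GREEK SMALL LETTER ALPHA WITH PERISPOMENI, and NFC composes alpha plus U+0342 into U+1FB6, whereas alpha plus U+0303 COMBINING TILDE stays decomposed. This is an illustration, not proposed NamesList text.

```python
import unicodedata

alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA
print(unicodedata.decomposition("\u1fb6"))  # 03B1 0342

# NFC composes alpha + PERISPOMENI, but not alpha + COMBINING TILDE.
print(unicodedata.normalize("NFC", alpha + "\u0342") == "\u1fb6")          # True
print(unicodedata.normalize("NFC", alpha + "\u0303") == alpha + "\u0303")  # True
```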
Date/Time: Thu Mar 8 09:59:07 CST 2012
Contact: loic.etienne@tech.swisssign.com
Name: Loïc Etienne
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Annex #15: Function composition rules
Note: This was already sent to the editorial committee.
http://unicode.org/reports/tr15/ , 7 Design Goals, 7.2 Stability could state explicitly:
Compatibility NF is stronger than canonical NF:
* toNFC(toNFKC(x)) = toNFKC(x)
* toNFD(toNFKD(x)) = toNFKD(x)
More generally, compatibility is absorbing:
* toNFC(toNFKD(x)) = toNFKC(x)
* toNFD(toNFKC(x)) = toNFKD(x)
* toNFKC(x) = toNFKC(toXXX(x))
* toNFKD(x) = toNFKD(toXXX(x))
where toXXX is any of toNFD, toNFKD, toNFC, toNFKC.
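The proposed identities are easy to spot-check with Python's `unicodedata.normalize`. This sketch is a sanity check over a few arbitrarily chosen sample strings (ligature, Angstrom sign, long s with dots, Hangul syllable, Roman numeral, halfwidth katakana, superscript), not a proof; `nf` and `identities_hold` are hypothetical helper names.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def nf(form, s):
    return unicodedata.normalize(form, s)

def identities_hold(x):
    """Check the proposed identities for one input string."""
    checks = [
        nf("NFC", nf("NFKC", x)) == nf("NFKC", x),
        nf("NFD", nf("NFKD", x)) == nf("NFKD", x),
        nf("NFC", nf("NFKD", x)) == nf("NFKC", x),
        nf("NFD", nf("NFKC", x)) == nf("NFKD", x),
    ]
    # toXXX ranges over all four normalization forms.
    for xxx in FORMS:
        checks.append(nf("NFKC", nf(xxx, x)) == nf("NFKC", x))
        checks.append(nf("NFKD", nf(xxx, x)) == nf("NFKD", x))
    return all(checks)

SAMPLES = ["\ufb01", "\u212b", "\u1e9b\u0323", "\ud55c", "\u216b",
           "\uff76\uff9e", "x\u00b2"]
print(all(identities_hold(s) for s in SAMPLES))  # True
```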
Date/Time: Fri Mar 30 17:11:10 CDT 2012
Contact: fantasai.lists@inkedblade.net
Name:
Report Type: Error Report
Opt Subject: Turkish casing applies also to crh/tt/ba
Mozilla received a report that the Turkish casing rules also apply to Crimean Tatar (crh), Volga Tatar (tt), and Bashkir (ba): https://bugzilla.mozilla.org/show_bug.cgi?id=231162#c17 If so, the Unicode SpecialCasing.txt file needs updating.
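In practice, the Turkish casing rules are mainly the dotted/dotless i mappings in SpecialCasing.txt. This Python sketch applies them for a set of language codes; including `crh`, `tt` and `ba` reflects only the report's proposal, not current data, and the sketch omits the context-sensitive rules (such as removing U+0307 after I when lowercasing). The names `TURKIC_CASING`, `to_upper` and `to_lower` are hypothetical.

```python
# Languages using the tr/az dotted/dotless i rules; crh, tt and ba are
# included only as proposed in the report (unconfirmed).
TURKIC_CASING = {"tr", "az", "crh", "tt", "ba"}

def to_upper(s, lang):
    if lang in TURKIC_CASING:
        s = s.replace("i", "\u0130")  # i -> İ (capital I with dot above)
    return s.upper()

def to_lower(s, lang):
    if lang in TURKIC_CASING:
        # İ -> i, I -> ı (dotless i); context-sensitive rules omitted.
        s = s.replace("\u0130", "i").replace("I", "\u0131")
    return s.lower()

print(to_upper("istanbul", "tr"), to_upper("istanbul", "en"))
```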
Date/Time: Tue Apr 3 16:52:11 CDT 2012
Contact: petercon@microsoft.com
Name: Peter Constable
Report Type: Error Report
Opt Subject: EastAsianWidth properties for new Hangul jamo
Note: This was already sent to the editorial committee.
When new Hangul characters were added in Unicode 5.2, it appears that they were all given an EastAsianWidth property value of W. This is the case regardless of the type of jamo. But that is not consistent with properties that were assigned to jamo that predate TUS 5.2: choseong characters (1100..1159) were given a width value W, but jungseong (1160..11A2) and jongseong (11A8..11F9) were given a width value N. Thus, all of the newer jungseong and jongseong characters have different width values than the older jungseong and jongseong characters. Unless there was a specific reason for setting these characters to W, I suggest that the following have their East Asian Width values set to N: 11A3..11A7, 11FA..11FF, D7B0..D7FB.
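One way to audit the consistency described here is to group the conjoining jamo by character-name prefix and collect the East_Asian_Width values of each group, as in this Python sketch. Grouping by name prefix is a convenience assumption, `jamo_widths` is a hypothetical helper, and the output reflects the Unicode version of the Python build, which may already differ from the 5.2 data discussed in this report.

```python
import unicodedata
from collections import defaultdict

# Blocks with conjoining jamo: Hangul Jamo, Jamo Extended-A, Jamo Extended-B.
JAMO_BLOCKS = [(0x1100, 0x11FF), (0xA960, 0xA97F), (0xD7B0, 0xD7FF)]

def jamo_widths():
    """Collect East_Asian_Width values per jamo type, keyed by name prefix."""
    widths = defaultdict(set)
    for lo, hi in JAMO_BLOCKS:
        for cp in range(lo, hi + 1):
            name = unicodedata.name(chr(cp), "")
            for kind in ("CHOSEONG", "JUNGSEONG", "JONGSEONG"):
                if name.startswith("HANGUL " + kind):
                    widths[kind].add(unicodedata.east_asian_width(chr(cp)))
    return dict(widths)

# A jamo type with more than one value shows the reported inconsistency.
for kind, values in sorted(jamo_widths().items()):
    print(kind, sorted(values))
```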
Date/Time: Sat Apr 28 16:33:02 CDT 2012
Contact: richard.wordingham@ntlworld.com
Name: Richard Wordingham
Report Type: Other Question, Problem, or Feedback
Opt Subject: Storage Order of Decimal Digits
There is no declared policy on the storage sequence of decimal digits, i.e. characters with general category Nd. What is currently done could be summed up as: 'The Bidi class of decimal digits shall be such that a sequence of digits from the same set of 10 contiguous code points shall be stored in order of decreasing significance when representing a number'. This could be included in the stability guarantee at http://www.unicode.org/policies/property_value_stability_table.html . At present, all decimal digits have the Bidi class EN, AN or L except for the N'ko decimal digits, which have the Bidi class R. If this principle were violated, a 'simplistic parser' could misinterpret values of digit sequences. (Not that it would be likely to get the prime number 25₁₆ right either!) The guarantee, converted to a statement of practice, could reasonably be included in the TUS section on 'Numeric Value', currently Section 4.6. It would be good to say there that this principle is and will generally be followed for characters that primarily function similarly to 'decimal digits', e.g. for other radices or for derived characters such as superscript numerals. (The word 'primarily' allows the principle to be ignored for letters also used as digits.)
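The current situation can be inspected with Python's `unicodedata`. This sketch counts the Bidi classes of all Nd characters and implements the kind of 'simplistic parser' mentioned above, which treats the first stored digit as the most significant regardless of Bidi class. The function names are hypothetical, and the counts depend on the Unicode version of the Python build, so they may include digits added after this report.

```python
import sys
import unicodedata
from collections import Counter

def nd_bidi_classes():
    """Count Bidi classes over all characters with General_Category Nd."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd":
            counts[unicodedata.bidirectional(ch)] += 1
    return counts

def parse_stored_digits(s):
    """Simplistic parser: the first digit in storage order is most significant."""
    value = 0
    for ch in s:
        value = value * 10 + unicodedata.decimal(ch)
    return value

print(nd_bidi_classes())
print(parse_stored_digits("\u07c2\u07c5"))  # N'Ko TWO FIVE -> 25
```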
Date/Time: Wed May 2 13:52:23 CDT 2012
Contact: unicode@norbertlindenberg.com
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: UTS 10: Case level should be between primary and secondary level
Section 5.1, Parametric Tailoring, of UTS 10 describes caseLevel as "If set to on, a level consisting only of case characteristics will be inserted in front of tertiary level. To ignore accents but take cases into account, set strength to primary and case level to on." I think "in front of tertiary level" should really be "between primary and secondary level". "In front of tertiary level" is normally interpreted as "between secondary and tertiary level", but then it would still distinguish based on accents.
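To illustrate why the placement matters, here is a toy Python model with a primary level (base letters, case-insensitive), a secondary level (accents), and a case level. The weights are invented for illustration and are not UCA weights; `toy_key` and its `case_after` parameter are hypothetical. With strength set to primary, a case level placed after the primary level distinguishes "a" from "A" but not "a" from "á", while a case level placed after the secondary level can only be reached by also comparing accents.

```python
import unicodedata

def toy_key(s, case_after):
    """Toy sort key for strength=primary with caseLevel=on.

    case_after="primary": case level between primary and secondary.
    case_after="secondary": case level in front of tertiary, so the
    secondary (accent) level must be kept to reach it.
    """
    decomposed = unicodedata.normalize("NFD", s)
    bases = [c for c in decomposed if not unicodedata.combining(c)]
    primary = "".join(c.lower() for c in bases)
    secondary = "".join(c for c in decomposed if unicodedata.combining(c))
    case = "".join("U" if c.isupper() else "L" for c in bases)
    if case_after == "primary":
        return (primary, case)
    return (primary, secondary, case)

print(toy_key("a", "primary") == toy_key("\u00e1", "primary"))      # True
print(toy_key("a", "secondary") == toy_key("\u00e1", "secondary"))  # False
```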
I reported the same issue against UTS 35 as http://unicode.org/cldr/trac/ticket/4698
Date/Time: Sat May 5 09:41:27 CDT 2012
Contact: kent.karlsson14@telia.com
Name: Kent Karlsson
Report Type: Error Report
Opt Subject: Numeric values for Cuneiform digits
Numeric values for Cuneiform digits (or digit parts). For some Cuneiform characters the numeric value is missing or wrong. Here is a list of proposed corrections. Note that the character for 20 is currently missing in Unicode.
12079;CUNEIFORM SIGN DISH;Lo;0;L;;;;;N;;;;; -> 1
1222B;CUNEIFORM SIGN MIN;Lo;0;L;;;;;N;;;;; -> 2
1230B;CUNEIFORM SIGN U;Lo;0;L;;;;;N;;;;; -> 10
... -> 20
1230D;CUNEIFORM SIGN U U U;Lo;0;L;;;;;N;;;;; -> 30
1240F;CUNEIFORM NUMERIC SIGN FOUR U;Nl;0;L;;;;4;N;;;;; 4 -> 40
12410;CUNEIFORM NUMERIC SIGN FIVE U;Nl;0;L;;;;5;N;;;;; 5 -> 50
12411;CUNEIFORM NUMERIC SIGN SIX U;Nl;0;L;;;;6;N;;;;; 6 -> 60
12412;CUNEIFORM NUMERIC SIGN SEVEN U;Nl;0;L;;;;7;N;;;;; 7 -> 70
12413;CUNEIFORM NUMERIC SIGN EIGHT U;Nl;0;L;;;;8;N;;;;; 8 -> 80
12414;CUNEIFORM NUMERIC SIGN NINE U;Nl;0;L;;;;9;N;;;;; 9 -> 90
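For reviewers applying the list, this Python sketch shows where the proposed values would go: field 8 (the numeric value field) of a UnicodeData.txt record. The `PROPOSED` table transcribes the values listed above (the value 20 is omitted because no character is encoded for it); `apply_numeric` is a hypothetical helper, and it leaves the decimal-digit and digit fields (6 and 7) untouched.

```python
# Proposed numeric values from the list above (code point -> value).
PROPOSED = {
    0x12079: 1, 0x1222B: 2, 0x1230B: 10, 0x1230D: 30,
    0x1240F: 40, 0x12410: 50, 0x12411: 60, 0x12412: 70,
    0x12413: 80, 0x12414: 90,
}

def apply_numeric(record, proposed=PROPOSED):
    """Set field 8 (numeric value) of a UnicodeData.txt record if listed."""
    fields = record.split(";")
    cp = int(fields[0], 16)
    if cp in proposed:
        fields[8] = str(proposed[cp])
    return ";".join(fields)

print(apply_numeric("1240F;CUNEIFORM NUMERIC SIGN FOUR U;Nl;0;L;;;;4;N;;;;;"))
```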
There was also a notion of, and glyph(s) for, the digit 0 (even if not the concept 0). See http://www.jstor.org/discover/10.2307/593904?uid=3738984&uid=2129&uid=2&uid=70&uid=4&sid=21100772121331, http://en.wikipedia.org/wiki/Babylonian_numerals#Numerals, http://gwydir.demon.co.uk/jo/numbers/babylon/index.htm. I won't venture a guess here as to which of the currently encoded characters, if any, represent it.
Date/Time: Sun Apr 1 22:48:13 CDT 2012
Contact: azihaque@yahoo.co.in
Name: Aziz-ul Haque
Report Type: Error Report
Opt Subject: Place of Assamese
Note: This was already sent to the editorial committee.
Dear Sir/Madam, Would you please inform me of the current position of the Assamese writing system in Unicode? Earlier, Unicode said that the Bengali script is used to write Assamese. We disagree, since we have our own script, which has a history of 1500 years and from which Bengali and Maithili developed. Moreover, at least 15 characters of Assamese are different from modern Bengali. With all the documentary evidence and our state government's approval, we have been requesting that Unicode provide a separate slot for Assamese. Sincerely yours, A. Haque
Note: This was already sent to the editorial committee.
Date/Time: Sat May 5 09:17:02 CDT 2012
Contact: ashok2001sarma@rediffmail.com
Name: Ashok Sarma
Report Type: Other Question, Problem, or Feedback
Opt Subject: Each and every version carries wrong information regarding Assamese scripts
Sir/Madam, With due respect, I again inform you that the Assamese script is not the Bengali script. Historically, too, the typeset prepared by the British was sampled from Assamese manuscripts. Again, the oldest written form of the Assamese script was found in "Charyapad". The language of "Charyapad" is Kamrupi. Even the book on the Origin of Bangla Script was written by collecting inscriptions and manuscripts of Assamese writing. Why, then, has your consortium repeated the same mistake, hurting the self-esteem of the Assamese people? If you need scientific proof in support of the distinct identity of the Assamese script, please let us know how to establish the truth. I respect your consortium and I also understand its importance. But as an Assamese person, I do not want any wrong information in your versions that underestimates the Assamese script and language. I am eagerly looking forward to your valuable suggestions so that the sentiment of the Assamese people is not hurt further.