The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of April 20, 2020, since the previous cumulative document was issued prior to UTC #162 (January 2020).
The links below go directly to open PRIs and to feedback documents for them, as of April 20, 2020.
Issue  Name                                                  Feedback Link
408    QID Emoji                                             (feedback)
404    Proposed Update UTS #18, Unicode Regular Expressions  (feedback)
The links below go to locations in this document for feedback.
Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to ucd-dev ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports
Note: The feedback sections this time include links to the following documents: L2/18-319, L2/19-047, L2/19-167, L2/19-283, L2/20-061.
Date/Time: Mon Feb 17 07:38:13 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Radical for U+80E7 胧
The current radical for U+80E7 胧 is #130, but its G-Source reference value is G0-6B4A, which means it is the simplified variant form of U+6727 朧 (G-Source reference value G1-6B4A, radical #74), not of U+268AB. Would it be better to update the radical for U+80E7 胧 to #74?
Date/Time: Mon Feb 17 07:38:54 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Radical for U+4E80 亀
The current radical for U+4E80 亀 is #5. In ISO/IEC DIS 10646-1.2:1992 the references are T:E-396C and J:0-3535, and there is also the EACC source, 2D632D, in Unicode 1.0. It is a variant of U+9F9C 龜. Could we add #213 as a second radical for U+4E80 亀? Cf. https://mojikiban.ipa.go.jp/search/detail/MJ006424 Note that the radical for U+2A6C9 is #213, and that U+2A6C9 and U+4E80 亀 are similar.
Date/Time: Mon Feb 17 11:34:50 CST 2020
Name: Lee Collins
Report Type: Public Review Issue
Opt Subject: Wrong Japanese reading in Unihan
U+5807 shows the Kun reading "Sumire". This is no doubt a mistake for the similar but different character U+83EB. The respective DKWZ codes are 05212 and 31207. U+5807 should have the Kun readings NEBATUTI, NURU, WAZUKA, etc. The On readings are KIN and GON.
Date/Time: Wed Feb 19 02:51:51 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Corrections for UAX #45
In the current USourceData file, the data for UTC-00550 is shown as below.
UTC-00550;UTC-02637;;30.11;0206.201;⿰口笪;kCheungBauerIndex 375.06;
However, UTC-02637 has been updated to UK-02637, so Field 1 should be updated to UK-02637 correspondingly as below.
UTC-00550;UK-02637;;30.11;0206.201;⿰口笪;kCheungBauerIndex 375.06;
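The Field 1 correction requested in the report above can be checked mechanically. The following is a minimal Python sketch, assuming only that a USourceData record is a semicolon-delimited line as quoted in the report; the helper name `set_field` is hypothetical and is not part of any Unicode tooling.

```python
def set_field(record: str, index: int, value: str) -> str:
    """Return a copy of a semicolon-delimited record with one field replaced."""
    fields = record.split(";")
    fields[index] = value
    return ";".join(fields)


# Record exactly as quoted in the report; the trailing ";" is preserved
# because split/join round-trips the final empty field.
current = "UTC-00550;UTC-02637;;30.11;0206.201;⿰口笪;kCheungBauerIndex 375.06;"
proposed = set_field(current, 1, "UK-02637")
print(proposed)
```

Only the field at index 1 changes; every other field, including the IDS and the source reference, is carried over untouched.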
Date/Time: Wed Feb 19 05:38:12 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Corrections for UAX #45
The data for UTC-00749 should be updated as below.
UTC-00749;UK-02870;;157.9;1230.291;⿰⻊某;kCheungBauerIndex 463.06;
UTC-00749 and UK-02870 are duplicates.
Date/Time: Fri Mar 6 05:52:34 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: IDSes for UTC-00475 and UTC-00476 in UAX #45
The IDSes for UTC-00475 and UTC-00476 should be updated as below:
UTC-00475;V;U+947D;167.15;1326.121;⿰金⿱⿰夬夬貝;kLau 1580;
UTC-00476;U;U+947D;167.17;1327.051;⿰金⿱⿰失失貝;kLau 1581;
Date/Time: Tue Jan 28 18:57:28 CST 2020
Name: Sarabveer Singh
Report Type: Feedback on an Encoding Proposal
Opt Subject: Suggestions for Gurmukhi Bindi Before Bihari (L2/18-319, L2/19-167, L2/19-283)
Singh in L2/18-319 and L2/19-167 wishes for the Unicode specification to add support for GURMUKHI SIGN BINDI and GURMUKHI TIPPI to display before GURMUKHI VOWEL SIGN II. As noted in L2/19-047, this combination is most likely to be a stylistic difference. However, this combination should be supported as a stylistic option in Unicode fonts. In testing, I have found that only the "liga" and/or "rlig" OpenType lookups display this stylistic combination. This is an unsupported method and does not work universally on different systems. In my experience, the ligature displays correctly in the major web browsers (Google Chrome, Mozilla Firefox, Apple Safari), but it does not display correctly in Microsoft's software (Office, Edge, Internet Explorer). I request that an OpenType Ligature Lookup Table be recommended to implement this stylistic combination in Unicode fonts, such as the "abvf" Lookup Table.
Date/Time: Sat Feb 1 17:56:25 CST 2020
Name: Doug Ewell
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/20-061, Final Proposal to encode Western Cham in the UCS
L2/20-061 proposes, among other characters, a group of eight characters for Western Cham lunar month names (ARABIC SYMBOL ONE DOT LUNAR MONTH through ARABIC SYMBOL SEVEN DOTS LUNAR MONTH), to be placed in the Arabic Mathematical Alphabetic Symbols block at code points U+1EEF8 through U+1EEFF.
The Arabic Mathematical Alphabetic Symbols block was intended for stylistic variations of existing Arabic letters, to be used in special mathematical contexts. It is analogous to the Mathematical Alphanumeric Symbols block for existing Latin and Greek letters and digits. It is not intended for encoding of new “normal” characters.
The proposed characters are “special” in that they are used only in Western Cham and only for lunar month names, but they are not “mathematical”; they are not used to represent variables, constants, sets, etc. in mathematical expressions.
Both the text and the proposed Unicode properties show that the proposed characters are not stylistic variations of existing Arabic letters, and do not follow the pattern of other characters in this block:
1EEF8;ARABIC SYMBOL ONE DOT LUNAR MONTH;So;0;ON;;;;;N;;;;;
cf.
1EE00;ARABIC MATHEMATICAL ALEF;Lo;0;AL;<font> 0627;;;;N;;;;;
They are “symbols” (So), not “letters” (AL), and are not <font> varieties of existing letters.
In the revision history, it was noted that these characters were moved in Revision 3 (November 2019) from the proposed Western Cham block to this block. Item 6 in the section “Repertoire” includes an inadvertent lingering reference to ARABIC SYMBOL SEVEN DOTS LUNAR MONTH being encoded at U+1E26F.
I recommend moving these eight symbols back into the proposed Western Cham block, as they were before Revision 3. I have no objection at all to encoding these symbols, only to this particular proposed location.
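The property comparison made in the comment above can be reproduced from the two quoted records. This is a small Python sketch, assuming the records follow the usual UnicodeData.txt field order (code point, name, General_Category, combining class, Bidi_Class, decomposition); the names `FIELD_NAMES` and `parse_record` are hypothetical and introduced only for illustration.

```python
# Assumed field order for the first six semicolon-separated fields.
FIELD_NAMES = ("code", "name", "general_category", "ccc",
               "bidi_class", "decomposition")


def parse_record(line: str) -> dict:
    """Map the first six fields of a UnicodeData-style line to names."""
    return dict(zip(FIELD_NAMES, line.split(";")))


# Both records are copied verbatim from the comment.
proposed = parse_record(
    "1EEF8;ARABIC SYMBOL ONE DOT LUNAR MONTH;So;0;ON;;;;;N;;;;;")
existing = parse_record(
    "1EE00;ARABIC MATHEMATICAL ALEF;Lo;0;AL;<font> 0627;;;;N;;;;;")

for key in ("general_category", "bidi_class", "decomposition"):
    print(f"{key}: proposed={proposed[key]!r} existing={existing[key]!r}")
```

The three printed fields are the ones the comment relies on: the proposed character differs from the existing block member in category and bidirectional class, and it has no `<font>` decomposition.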
Date/Time: Sun Feb 16 15:29:46 CST 2020
Name: Arnim Sauerbier
Report Type: Feedback on an Encoding Proposal
Opt Subject: Symbols for Legacy Computing 1FB3C...
The current proposal for "Symbols for Legacy Computing" unfortunately lacks two-character full-triangles. Recreating the 'square font' diagonal triangles for legacy computing on modern systems requires double-width triangles. The extant proposal only allows creation of triple-width triangles, forming a non-square aspect ratio. The following additions would allow drawing roughly 1:1 bilateral triangles out of two adjacent characters.
A two-glyph triangle on bottom left, consisting of:
LOWER LEFT BLOCK DIAGONAL CENTER LEFT TO LOWER RIGHT
LOWER LEFT BLOCK DIAGONAL UPPER LEFT TO CENTER RIGHT
A two-glyph triangle on bottom right, consisting of:
LOWER RIGHT BLOCK DIAGONAL BOTTOM LEFT TO CENTER RIGHT
LOWER RIGHT BLOCK DIAGONAL CENTER LEFT TO UPPER RIGHT
A two-glyph triangle on upper left, consisting of:
UPPER LEFT BLOCK DIAGONAL LOWER LEFT TO CENTER RIGHT
UPPER LEFT BLOCK DIAGONAL CENTER LEFT TO UPPER RIGHT
A two-glyph triangle on upper right, consisting of:
UPPER RIGHT BLOCK DIAGONAL UPPER LEFT TO CENTER RIGHT
UPPER RIGHT BLOCK DIAGONAL CENTER LEFT TO LOWER RIGHT
These new codepoints are necessary to recreate Legacy Computing Graphics. I have been using PETSCII and other legacy-computing graphics since 1981, and am happy to answer any questions you may have. Thank you for your consideration in preserving legacy computer art.
Arnim
Date/Time: Sun Feb 16 16:07:01 CST 2020
Name: Jean Larapiere
Report Type: Error Report
Opt Subject: Problem with "Symbols for Legacy Computing"
The current proposal for "Symbols for Legacy Computing" unfortunately lacks two-character full-triangles. Recreating the 'square font' diagonal triangles for legacy computing on modern systems requires double-width triangles. The extant proposal only allows creation of triple-width triangles, forming a non-square aspect ratio. The following additions would allow drawing roughly 1:1 bilateral triangles out of two adjacent characters.
A two-glyph triangle on bottom left, consisting of:
LOWER LEFT BLOCK DIAGONAL CENTER LEFT TO LOWER RIGHT
LOWER LEFT BLOCK DIAGONAL UPPER LEFT TO CENTER RIGHT
A two-glyph triangle on bottom right, consisting of:
LOWER RIGHT BLOCK DIAGONAL BOTTOM LEFT TO CENTER RIGHT
LOWER RIGHT BLOCK DIAGONAL CENTER LEFT TO UPPER RIGHT
A two-glyph triangle on upper left, consisting of:
UPPER LEFT BLOCK DIAGONAL LOWER LEFT TO CENTER RIGHT
UPPER LEFT BLOCK DIAGONAL CENTER LEFT TO UPPER RIGHT
A two-glyph triangle on upper right, consisting of:
UPPER RIGHT BLOCK DIAGONAL UPPER LEFT TO CENTER RIGHT
UPPER RIGHT BLOCK DIAGONAL CENTER LEFT TO LOWER RIGHT
These new codepoints are necessary to recreate Legacy Computing text-graphics.
Thank you,
Date/Time: Thu Feb 20 12:45:44 CST 2020
Name: Markus Scherer
Report Type: Error Report
Opt Subject: review Script of U+16FE3 OLD CHINESE ITERATION MARK
For consideration by script ad hoc & UTC
Unicode 13 adds U+16FF0/1 Vietnamese reading marks with sc=Hani (and gc=Mc). In the same block is U+16FE3 OLD CHINESE ITERATION MARK with sc=Zyyy (and gc=Lm).
In discussion, Ken W. said that this one "patterns similarly to the modern iteration mark 3005. That one *is* sc=Han." and "So it would seem reasonable to me (in a *future* version) to ask for 16FE3 to change to sc=Han"
Please review.
Date/Time: Wed Apr 8 00:37:28 CDT 2020
Name: Barun Kumar Sahu
Report Type: Error Report
Opt Subject: "Devanagari sign avagraha" followed by "Devanagari sign anusvara" or "Devanagari sign candrabindu"
Ideally, "Devanagari sign avagraha" (U+093D) can be followed by "Devanagari sign anusvara" (U+0902) or "Devanagari sign candrabindu" (U+0901). However, some word-processors do not accept this combination. My question is: Can "Devanagari sign avagraha" be followed by "Devanagari sign anusvara" or "Devanagari sign candrabindu" as per the Unicode Standard? (I think it should be allowed. For example, we should be able to write कऽंप or कऽँप.)
Date/Time: Thu Jan 9 21:21:51 CST 2020
Name: Alex Henrie
Report Type: Other Question, Problem, or Feedback
Opt Subject: Lack of precomposed capital Greek letters complicates lowercasing, uppercasing, and normalizing Greek text
Unicode defines precomposed characters for various lowercase Greek letters with diacritics, but not for their uppercase forms.[1] This can cause Greek texts encoded in NFC to no longer be NFC-normalized after changing case. For example, if "Ρ̓ᾶρος" (the name of an ancient Greek hero) is converted to lowercase, its first character changes from 03A1 0313 to 03C1 0313 and must be normalized again to get to 1FE4. The lack of capital characters creates other complications as well, such as breaking any uppercasing or lowercasing algorithm that does not allow changing the length of the string. Would you please reconsider including these characters in the standard so that Greek NFC text does not need to be renormalized after lowercasing? Or at least add a note about this problem to the Greek Language FAQ?[2]
[1] https://www.opoudjis.net/unicode/unicode_gaps.html#gaps
[2] https://www.unicode.org/faq/greek.html
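The behavior described in the report above can be observed with Python's standard `unicodedata` module. This is a sketch for illustration only; the helper `is_nfc` is a hypothetical convenience defined here, and the string is the example name from the report written with escapes so that its code points are unambiguous.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


# "Ρ̓ᾶρος": capital rho + combining comma above, alpha with perispomeni, ...
name = "\u03a1\u0313\u1fb6\u03c1\u03bf\u03c2"

lowered = name.lower()                            # simple per-character lowercasing
renormalized = unicodedata.normalize("NFC", lowered)

print("original is NFC:   ", is_nfc(name))
print("lowercased is NFC: ", is_nfc(lowered))
print("lengths (lowercased, renormalized):", len(lowered), len(renormalized))
```

The lowercased string has to be normalized a second time, and doing so changes its length in code points, which is the complication the report raises for algorithms that assume case conversion preserves length.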
Date/Time: Wed Feb 19 16:07:42 CST 2020
Name: Karl Williamson
Report Type: Error Report
Opt Subject: Request uniform version syntax
This isn't an error, but it is an annoyance that the data files you furnish have at least three different syntaxes for specifying the versions they apply to:
Files in the UCD have the version embedded in the first line of the file
Files in the security subdirectory have a separate line like 'Version: 13.0.0'
And EmojiData.txt has a line 'Version: 13.0'.
There really is no need to have disparate syntaxes, and it means code reading them has to have extra intelligence.
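To illustrate the "extra intelligence" the report refers to, here is a minimal Python sketch of a reader that copes with all three syntaxes. It rests on assumptions not stated in the report: that a UCD file's first line looks like `# Name-13.0.0.txt`, and that the `Version:` lines may carry a leading `#`. The function name `parse_version` and the sample header lines are hypothetical.

```python
import re

# Assumed shape of a UCD first line, e.g. "# PropList-13.0.0.txt".
_FILENAME_VERSION = re.compile(r"^#\s*\S+-(\d+)\.(\d+)\.(\d+)\.txt")
# "Version: 13.0.0" or "Version: 13.0", with an optional leading "#".
_VERSION_LINE = re.compile(r"^#?\s*Version:\s*(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(header_lines):
    """Return (major, minor, update) from a file header, or None if absent."""
    for index, line in enumerate(header_lines):
        if index == 0:
            match = _FILENAME_VERSION.match(line)
            if match:
                return tuple(int(part) for part in match.groups())
        match = _VERSION_LINE.match(line)
        if match:
            major, minor, update = match.groups()
            # A two-part version such as "13.0" is padded to three parts.
            return (int(major), int(minor), int(update or 0))
    return None


print(parse_version(["# PropList-13.0.0.txt"]))
print(parse_version(["# confusables.txt", "# Version: 13.0.0"]))
print(parse_version(["# emoji-data.txt", "# Version: 13.0"]))
```

A single uniform syntax, as the report requests, would reduce this to one pattern.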
Date/Time: Tue Mar 3 16:17:10 CST 2020
Name: Daniel Bünzli
Report Type: Error Report
Opt Subject: UAX #14 for 13.0.0: LB27's first line is obsolete
Hello,
I think (more precisely my compiler thinks [1]) the first line of LB27 is already handled by the new LB22 rule and can be removed.
Best,
Daniel
[1]
File "uuseg_line_break.ml", line 206, characters 38-40:
206 | | (* LB27 *) _, (JL|JV|JT|H2|H3), (IN|PO) -> no_boundary s
^^
Warning 12: this sub-pattern is unused.
[Filed by Rick on behalf of user, per KW. We can delete this if original poster submits it.]
Date/Time: Sun Mar 8 10:50:59 CDT 2020
Name: Zack Newman
Report Type: Error Report
Opt Subject: Mistake in section 6.2 of UAX #29
I'm unsure if this is a mistake in sections 3.1.1 and 4.1.1 or section 6.2, but 6.2 incorrectly states "ignoring Extend is sufficient to disallow breaking within a grapheme cluster". The sequence of Unicode scalar values (U+0600, U+0020) is considered a single grapheme cluster due to rule GB9, but the sequence is parsed into two words according to 4.1.1. While it would be ideal to not have sequences of Unicode scalar values that can be parsed into more words than grapheme clusters, I think it's OK for that property to not hold as long as there are no incorrect claims that it does hold like there currently is in section 6.2.
Date/Time: Wed Apr 1 17:29:56 CDT 2020
Name: Elika J. Etemad
Report Type: Error Report
Opt Subject: Zero Width Space vs Arabic shaping: non-interop
There was some discussion in the W3C, triggered by some new test cases, about whether ZWSP should break Arabic shaping, given spaces generally break shaping. We found that Unicode clearly defines it as not breaking shaping, but also found that Unicode's behavior does not seem to be widely implemented, see [1]. The question to the UTC is, therefore, should ZWSP continue to be defined as transparent wrt shaping, or should its definition be adjusted to match what appears to be the current implementation reality? [1] https://github.com/w3c/csswg-drafts/issues/3861#issuecomment-529348086 [Fwiw, a number of participants in the discussion initially expected that ZWSP would break shaping, just like all the other "space" characters. So given that expectation plus the state of implementations, it might actually make sense to spec this behavior and introduce a new character, if needed, for an explicit break opportunity that does not break shaping.]
(None at this time.)
Date/Time: Thu Apr 16 00:50:43 CDT 2020
Name: Bogdan
Report Type: Error Report
Opt Subject:
Found an error in UnicodeStandard-13.0.pdf, on page 642 (16.1 Thai):
"""
In particular, when used as a consonant diacritic, U+0331 combining macron below can occur with vowel signs U+0338 THAI CHARACTER SARA U or U+0339 THAI CHARACTER SARA UU.
"""
There are 2 typos: the code for THAI CHARACTER SARA U should be U+0E38 (instead of U+0338), and the code for THAI CHARACTER SARA UU should be U+0E39 (instead of U+0339).
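The reported typos can be confirmed by looking up the character names for the printed and the intended code points. This is a short Python sketch using the standard `unicodedata` module; the pairing of code points is taken from the report, and the variable names are illustrative.

```python
import unicodedata

# (code point as printed in the PDF, code point the report says was intended)
pairs = (("\u0338", "\u0e38"), ("\u0339", "\u0e39"))

for printed, intended in pairs:
    print(f"U+{ord(printed):04X} {unicodedata.name(printed)}"
          f"  ->  U+{ord(intended):04X} {unicodedata.name(intended)}")
```

The printed code points resolve to combining marks from the Combining Diacritical Marks block, while the intended ones resolve to the Thai vowel signs named in the quoted sentence.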
(None at this time.)