The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of April 04, 2023, since the previous cumulative document was issued prior to UTC #174 (January 20, 2023).
The links below go directly to open PRIs and to feedback documents for them, as of April 04, 2023.
The links below go to locations in this document for feedback.
Feedback routed to CJK & Unihan Group for evaluation [CJK]
Feedback routed to Script ad hoc for evaluation [SAH]
Feedback routed to Properties & Algorithms Group for evaluation [PAG]
Feedback routed to Emoji SC for evaluation [ESC]
Feedback routed to Editorial Committee for evaluation [EDC]
Other Reports
Feedback routed to CJK & Unihan Group for evaluation [CJK]
Date/Time: Thu Jan 19 15:09:19 CST 2023
ReportID: ID20230119150919
Name: Lee Collins
Report Type: Error Report
Opt Subject: Unihan_Readings.txt
Found these in v. 15; they've been there for a while:
U+6AAC kDefinition: type of locust oracacia >> type of locust or acacia
U+45E3 kDefinition: insect of mulberry, insects that damage to the melons >> insect that lives in mulberry trees, insect that damages melons
Date/Time: Sun Mar 26 20:10:57 CDT 2023
ReportID: ID20230326201057
Name: Eiso Chan
Report Type: Error Report
Opt Subject: KP-Sources for U+6138, U+246D8 and U+29A8E
IRGN2608 (https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg60/IRGN2608_KP1disunify.pdf) was discussed at IRG #60, but the issues have not yet been resolved. TCA, ROK and Mr. Henry Chan provided feedback comments. The discussion record in IRGN2605 (https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg60/IRGN2605MiscEditorialReport.pdf) reads: “Due to the lack of contact to DPRK, the editors believed it is inappropriate to disunify KP1 source characters at present.” Under the current UCV rules these cases are misunifications, so they should be considered for addition to the errata list in coming versions.
Feedback routed to Script ad hoc for evaluation [SAH]
Date/Time: Sun Mar 26 14:01:05 CDT 2023
ReportID: ID20230326140105
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: Feedback on Bima characters (L2/23-070)
On document https://www.unicode.org/L2/L2023/23070-bima-script.pdf, a few new characters are requested for the Buginese script to support the Bima orthography. I would just like to make a few suggestions.
1. Disunify the flower-shaped end of section character: This punctuation mark is shaped very differently from the end of section character already encoded, and the two signs have very different epigraphic derivations. It is quite likely that users won't see these characters as interchangeable, and will want to use a particular glyph without switching fonts, or even use both together and/or outside a Bima context. Therefore they should be considered distinct. I recommend using the code point 1A1D, with the name BUGINESE FLEURON or BUGINESE SIGN FLEURON, to parallel the similar character 10AF1 𐫱 MANICHAEAN PUNCTUATION FLEURON. A note could be added saying "used as an end of section sign in Bima texts". Along with the Pallawa and the existing end of section, they would be placed under the header "Punctuation". If the code point above needs to be filled, I would recommend placing the Reduplication sign or the Gemination sign there, since they are the most likely to see use beyond a Bima context. But it can also be left vacant for a future addition.
2. Disunify the killer above and below: While the author states that these are identical in meaning and use, the difference in placement may be very relevant for rendering the text. Not only would epigraphers like to represent the text as it was written, regardless of semantic distinction, but it is also generally very problematic for fonts and rendering engines to handle a combining mark that has no definite position relative to the base letter. This can result in glitchy rendering, particularly when the mark has to interact with other marks above or below. One could say that a vowel silencer would never be placed on the same letter as a vowel sign or a gemination sign, but if the Buginese script increases in popularity (as the author presumably wants it to), more additions are likely, including ones that may be placed with the vowel silencer. The two glyph variants of the sign don't need to be disunified, as they can be handled at the font level without issue, but the choice of one glyph over the other in the code chart needs some justification.
3. The name "killer" needs to be reconsidered. While it is unlikely that people would take offense at it, it is not very descriptive. Better alternatives may be "Virama" or "Vowel Silencer".
4. Add a note under the Pallawa sign stating "a glyph variant with two dots occurs" or "may have three or two dots". These glyph variants are similar enough that they don't merit disunification.
5. Another modification would be to the note under 1A10: instead of spelling it /h/, the note would read "used in Bima for ha".
In summary, my most important suggestions are the disunification of two more characters (the fleuron sign and the two positional versions of the vowel silencer), adding a note under the Pallawa sign, and reconsidering the name "killer".
Date/Time: Sun Mar 26 14:34:34 CDT 2023
ReportID: ID20230326143434
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: Request to revise the glyph for the character 111CF (SHARADA INVERTED CANDRABINDU)
On document L2/17-428 (http://www.unicode.org/L2/L2017/17428-sharada-inv-candrabindu.pdf), a proposal to add the Inverted Candrabindu for Sharada was made and eventually accepted. Unlike in other Indic scripts, the regular Candrabindu of the Sharada script (11180) has an upper arc going above the dot, rather than a lower arc below the dot. In effect, the Candrabindu is inverted with respect to the usual configuration. But it was brought to the attention of Unicode that a "regular" Candrabindu (inverted in the context of Sharada) was used alongside the "inverted" one (regular in the context of Sharada), with different uses. The author chose a glyph with an arc that is shorter than that of its already encoded counterpart. However, an examination of the manuscript attestations in the proposal document reveals that this is not a consistent practice, due to scribal idiosyncrasies: while figures 1, 2, 3 and 6 show the preferred glyph, figures 4 and 5 show an almost flat stroke instead of an arc, and figure 7 looks like an arc with an extra vertical stroke rather than a dot (in contrast with the regular one, which does use a dot). While the current glyph is fine, this raises the question of whether the discrepancy is important, and whether a font that simply rotates the glyph of 11180 is acceptable. This situation differs from the Devanagari script, where the regular (0901) and the inverted (0900) Candrabindu are mere rotations of each other. I suggest contacting Anshuman Pandey for his expert opinion, because if a glyph change is not warranted, an annotation noting the glyph variants that apply only to the inverted version would be useful.
Feedback routed to Properties & Algorithms Group for evaluation [PAG]
Date/Time: Mon Jan 23 04:59:25 CST 2023
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: UTS46
Chromium will ship Nontransitional Processing soon: https://chromestatus.com/feature/5105856067141632. With that, all browser engines will use Nontransitional Processing. I suggest taking that opportunity to simplify this document and its test suite, and to declare the transition period for which this conditional (Transitional_Processing) existed to be over.
Date/Time: Mon Jan 23 05:11:04 CST 2023
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: UTS46
Some steps don't account for the possibility that a domain label is empty; e.g., when CheckBidi is true, the first subrule of "The Bidi Rule" inspects the first character of a label, which an empty label does not have. I think that might also apply to CheckJoiners and potentially other steps. (I initially thought the problem here was VerifyDnsLength not being considered, but that check happens much later in the processing model, so the problem is more fundamental.)
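To illustrate the gap: splitting a domain name on U+002E FULL STOP yields an empty label for inputs such as "a..b" or a name with a trailing dot, and a check that reads the first character of every label has nothing to read. The following is a minimal Python sketch (the function name is hypothetical, not UTS 46 text) that makes the empty case explicit instead of indexing past the end of the label; it deliberately leaves the pass/fail decision for empty labels to the caller, since that is the point the report says is unspecified.

```python
import unicodedata
from typing import Optional


def bidi_subrule_1(label: str) -> Optional[bool]:
    """Check RFC 5893 Bidi Rule subrule 1 for a single label.

    Subrule 1 requires the first character to have Bidi_Class L, R, or AL.
    An empty label has no first character, so this hypothetical helper
    returns None rather than guessing; whether an empty label passes or
    fails has to be decided by the caller (or by the specification).
    """
    if not label:
        return None
    return unicodedata.bidirectional(label[0]) in ("L", "R", "AL")


if __name__ == "__main__":
    for name in ("a..b", "example.com."):
        labels = name.split(".")
        print(name, [(label, bidi_subrule_1(label)) for label in labels])
```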
Date/Time: Mon Jan 23 05:13:16 CST 2023
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: UTS46
Please change U+2260 (≠), U+226E (≮), and U+226F (≯) from disallowed_STD3_valid to valid. These code points are not decomposed so they can never conflict with =, <, and >. And they are not inherently more confusing than any of the other allowed code points, which include hieroglyphics and emoji. These code points also work as-is in all browser engines (while < and > are forbidden) and on balance preference ought to be given to retaining compatibility so end users are not prevented from visiting websites or seeing subresources that might use these code points in their domain for one reason or another. For further background and discussion please see https://github.com/whatwg/url/issues/733. Thank you!
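For reference on the normalization point: UTS 46 processing normalizes to NFC, under which U+2260, U+226E, and U+226F remain single code points. They do have canonical decompositions (to =, <, and > followed by U+0338 COMBINING LONG SOLIDUS OVERLAY), but those appear only under NFD. A quick check with Python's standard unicodedata module (the helper name is illustrative):

```python
import unicodedata
from typing import Tuple


def nfc_and_nfd(text: str) -> Tuple[str, str]:
    """Return the NFC and NFD forms of a string (illustrative helper)."""
    return unicodedata.normalize("NFC", text), unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    for cp in (0x2260, 0x226E, 0x226F):
        nfc, nfd = nfc_and_nfd(chr(cp))
        print(
            f"U+{cp:04X}",
            "NFC:", " ".join(f"U+{ord(c):04X}" for c in nfc),
            "NFD:", " ".join(f"U+{ord(c):04X}" for c in nfd),
        )
```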
Date/Time: Mon Jan 23 06:35:46 CST 2023
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: IdnaTestV2.txt
I have worked on importing IdnaTestV2.txt into web-platform-tests, the test framework used by all web browsers. The goal was to meet the requirements of the domain to ASCII algorithm specified at https://url.spec.whatwg.org/#idna with beStrict initialized to false. As such, I attempted to filter out ToASCII statuses for UseSTD3ASCIIRules, CheckHyphens, and VerifyDnsLength, hoping that any remaining statuses would indicate a failure requirement. You can find my work at https://github.com/web-platform-tests/wpt/pull/38080. I ran into the following issues. Most of them relate to status annotation; IPv4 address confusion was the one issue that did not.
* VerifyDnsLength is not P4, but rather A4_1 and A4_2.
* Tests that use trailing ASCII digit labels (or such a label followed by a dot) are not useful for browsers, as that will trigger the IPv4 parser, which will then usually return failure because the input was not actually an IPv4 address string. This is a problem for a number of the A4_1 and A4_2 tests, and also for a large number of later tests, such as ToASCII("xn--gl0as212a.8.") or ToASCII("1.27"). I wrote a filter to exclude them, but it would be better if they were adjusted slightly (e.g., made to contain one non-EN code point) so that what they aim to test can also be tested in browsers. (Note that the IPv4 parser runs after domain to ASCII, but the web platform doesn't provide a way to invoke domain to ASCII on its own and probably never will.)
* The test for ToASCII("$") is marked P1 and V6, not U1. This also affects numerous tests with <, >, and =. If they continue to have multiple statuses, that will also make it impossible to filter them in an automated fashion. (This also applies to non-ASCII UseSTD3ASCIIRules code points, but I filed a separate request to remove those.)
* NV8 is not used as a status.
* A3 and X3 do not appear to be used as statuses. (These are presumably catered for by P4.)
* CheckBidi is not V8; V8 does not appear to be used. You'd have to filter out all B1-6 statuses instead.
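Browsers hand a host to the IPv4 parser based on its last label. A simplified Python sketch of that test, modeled on the WHATWG URL Standard's "ends in a number checker", could serve as a filter for IdnaTestV2.txt inputs. The function name is hypothetical, and this is an illustration rather than the URL Standard's normative text.

```python
import re

# A last label of "0x"/"0X" followed by zero or more hex digits parses as an IPv4 number.
_HEX_NUMBER = re.compile(r"0[xX][0-9A-Fa-f]*")
_ASCII_DIGITS = frozenset("0123456789")


def ends_in_a_number(host: str) -> bool:
    """Return True if the host would be routed to the IPv4 parser (sketch)."""
    parts = host.split(".")
    if parts[-1] == "":
        if len(parts) == 1:
            return False  # the empty string is not a number
        parts.pop()  # ignore a single trailing dot
    last = parts[-1]
    # Only ASCII digits count; str.isdigit() would also accept non-ASCII digits.
    if last and all(c in _ASCII_DIGITS for c in last):
        return True
    return _HEX_NUMBER.fullmatch(last) is not None
```

With this check, inputs such as "xn--gl0as212a.8." and "1.27" are flagged, while a label containing one non-EN code point (as the report suggests) is not.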
Date/Time: Mon Feb 13 07:41:09 CST 2023
ReportID: ID20230213074109
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: UTS 46
An issue reported against the URL Standard indicated that the current CheckBidi handling from UTS 46 is rather strict: https://github.com/whatwg/url/issues/543. Namely, domains containing RTL labels cannot have labels consisting solely of ASCII digits preceding them (such labels are invalid per subrule 1 of The Bidi Rule). This ends up rejecting a number of domains in the wild and also seems unnecessarily restrictive for RTL users. In that issue I worked with Harald Alvestrand (one of the editors of RFC 5893: Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)) on a specific set of changes to UTS 46 that would remedy this issue, while still imposing most of the Bidi-related requirements present in UTS 46 today. The proposed changes are:
1. Remove step 8 of https://unicode.org/reports/tr46/#Validity_Criteria, as the Validity Criteria only operate on a single label. (Although it somehow claims to have knowledge about the domain_name string as well...)
2. Add a new step 5 to https://unicode.org/reports/tr46/#Processing. (Note that due to step 4 we will have U-labels.) The new step 5 would read as follows:
* If CheckBidi, and the domain_name string is a Bidi domain name, record that there was an error if neither of the following conditions is true:
  * All labels in the domain_name string satisfy the 6 subrules of The Bidi Rule of RFC 5893, Section 2.
  * RTL labels in the domain_name string are immediately followed by an LDH label whose first code point is not of class EN, and all labels in the domain_name string are either LDH labels or satisfy the 6 subrules of The Bidi Rule of RFC 5893, Section 2.
Thank you for your consideration. This is probably the final IDNA-related issue from the URL Standard. Once all of them have been resolved, I'll work with browser implementers to ensure the changes (if any) get implemented so we can finally declare victory on IDNA interoperability.
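To make the current failure concrete: under step 8 of the Validity Criteria, once a domain name is a Bidi domain name, every label must satisfy all six subrules, and a label of ASCII digits fails subrule 1 because its first character has Bidi_Class EN. Below is a minimal Python sketch of RFC 5893's six subrules using the standard unicodedata module. The function names are hypothetical. The sketch does not implement the proposed domain-level condition, and it raises an error on an empty label because RFC 5893 does not say how one should be treated.

```python
import unicodedata
from typing import Iterable

RTL_TYPES = {"R", "AL", "AN"}
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def is_bidi_domain_name(labels: Iterable[str]) -> bool:
    """RFC 5893: a domain name with at least one character of class R, AL, or AN."""
    return any(
        unicodedata.bidirectional(ch) in RTL_TYPES for label in labels for ch in label
    )


def satisfies_bidi_rule(label: str) -> bool:
    """Check the six subrules of The Bidi Rule (RFC 5893, Section 2) for one label."""
    if not label:
        # Assumption: RFC 5893 does not address empty labels, so refuse to guess.
        raise ValueError("empty label")
    classes = [unicodedata.bidirectional(ch) for ch in label]
    first = classes[0]
    # Subrule 1: the first character must be L, R, or AL.
    if first not in ("L", "R", "AL"):
        return False
    # Subrules 3 and 6 look at the last character, ignoring trailing NSMs.
    # The first character is not NSM, so this loop stops at index 0 at the latest.
    end = len(classes)
    while classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if first in ("R", "AL"):  # RTL label
        return (
            all(c in RTL_ALLOWED for c in classes)  # subrule 2
            and last in ("R", "AL", "EN", "AN")  # subrule 3
            and not ("EN" in classes and "AN" in classes)  # subrule 4
        )
    return (  # LTR label
        all(c in LTR_ALLOWED for c in classes)  # subrule 5
        and last in ("L", "EN")  # subrule 6
    )


if __name__ == "__main__":
    labels = "1.\u05d0\u05d1".split(".")  # an ASCII-digit label before a Hebrew label
    print(is_bidi_domain_name(labels), [satisfies_bidi_rule(label) for label in labels])
```

For "1.\u05d0\u05d1" the domain is a Bidi domain name and the label "1" fails subrule 1, which is the rejection the report describes.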
Feedback routed to Emoji SC for evaluation [ESC]
(None at this time.)
Feedback routed to Editorial Committee for evaluation [EDC]
Date/Time: Mon Feb 06 11:22:42 CST 2023
ReportID: ID20230206112242
Name: Danny Anderson
Report Type: Membership Inquiry
Opt Subject: Cross-Reference Addition
Could there be a cross-reference mention of "⪇" (0x2A87) in the Unicode Code Charts under the "≨" (0x2268) character? I know that 0x2A87 cross-references 0x2268, but not vice versa. A similar note applies for "⪈" (0x2A88) and "≩" (0x2269).
Date/Time: Fri Feb 17 18:50:30 CST 2023
ReportID: ID20230217185030
Name: Liang Hai
Report Type: Error Report
Opt Subject: Core Spec
Cell of column “Code Point and Name” / row “Mongolian variation selectors” of Table 4-10, Unusual Properties, on page 194 of the Core Spec, version 15.0:
> 180B MONGOLIAN FREE VARIATION SELECTOR ONE
> 180C MONGOLIAN FREE VARIATION SELECTOR TWO
> 180D MONGOLIAN FREE VARIATION SELECTOR THREE
> 180E MONGOLIAN VOWEL SEPARATOR
Requests:
1. U+180E MONGOLIAN VOWEL SEPARATOR (aka MVS; gc = Cf/Format) should be removed, as it’s not a variation selector.
2. The recently encoded U+180F MONGOLIAN FREE VARIATION SELECTOR FOUR (FVS4) should be added.
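The distinction behind request 1 is visible in the character properties themselves: the free variation selectors U+180B to U+180D are nonspacing marks (gc = Mn), while U+180E has gc = Cf. A small Python check using the standard unicodedata module is shown below. The helper is illustrative, and U+180F shows up as assigned only when Python's bundled Unicode data is version 14.0 or later.

```python
import unicodedata


def describe(cp: int) -> str:
    """Format a code point with its name and General_Category (illustrative helper)."""
    ch = chr(cp)
    name = unicodedata.name(ch, "<no name in this Python's Unicode data>")
    return f"U+{cp:04X} {name} gc={unicodedata.category(ch)}"


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in range(0x180B, 0x1810):  # U+180B through U+180F
        print(describe(cp))
```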
Date/Time: Thu Mar 16 07:04:29 CDT 2023
ReportID: ID20230316070429
Name: Simon Cozens
Report Type: Error Report
Opt Subject: Typo in UTS Figure 11.5
Note: This error has already been fixed in the draft.
In the second example of complex hieroglyph formatting, the character sequence given is 1314A, 13433... U+13433 is EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START, but the intended rendering (both in the image and in the symbolic column) is for the vertically-paired signs to be inserted at TOP END; 13433 should be replaced by 13434.
Other Reports
(None at this time.)