The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of October 24, 2022, since the previous cumulative document was issued prior to UTC #172 (July 11, 2022).
The links below go directly to open PRIs and to feedback documents for them, as of October 24, 2022.
Issue | Name | Feedback Link
458 | Proposed Update UTR #17, Unicode Character Encoding Model | (feedback) No feedback at this time
459 | Proposed Update UTR #23, The Unicode Character Property Model | (feedback) No feedback at this time
The links below go to locations in this document for feedback.
Feedback routed to CJK & Unihan Group for evaluation [CJK]
Feedback routed to Script ad hoc for evaluation [SAH]
Feedback routed to Properties & Algorithms Group for evaluation [PAG]
Feedback routed to Emoji SC for evaluation [ESC]
Feedback routed to Editorial Committee for evaluation [EDC]
Other Reports
Date/Time: Tue Jul 19 20:50:44 CDT 2022
Name: Eiso Chan
Report Type: Error Report
Opt Subject: New entry for UTN #43
U+20B9A 𠮚 should be tagged as B for BOPOMOFO LETTER R U+3116 ㄖ. U+20B9A 𠮚 is not a common character for the modern CJKV people.
Date/Time: Sun Jul 31 20:08:08 CDT 2022
Name: Eiso Chan
Report Type: Error Report
Opt Subject: kMandarin value for U+3D65
The current kMandarin value for U+3D65 㵥 is bì. Kangxi Dictionary shows 覓畢切, the pronunciation is the same as 密, and the character is a variant of U+3D35 㴵. Hanyu Dazidian shows similar information. Hanyu Dazidian also shows that the Putonghua reading for 㴵 is mì, which matches the Unihan Database. One of my friends uses 㵥 in her name, and she told me that the reading for 㵥 in her name is mì, following Kangxi Dictionary. It would be better to update the kMandarin value for U+3D65 㵥 to mì.
Date/Time: Mon Aug 8 07:25:17 CDT 2022
Name: Andrew West
Report Type: Error Report
Opt Subject: CJK Unified Ideographs code chart
In the Unicode 15.0 beta code charts, UTC-00355 (⿰㫫頁) is mapped to U+9855 顕 (⿰显頁). It should be mapped to U+29530 𩔰 (⿰㫫頁).
Date/Time: Tue Aug 9 05:28:31 CDT 2022
Name: Andrew West
Report Type: Error Report
Opt Subject: Unihan_IRGSources.txt (15.0)
The kRSUnicode value for U+31D40 (⿰牜磨) is 112.15 (i.e. 石 radical), but this is unintuitive, and makes the character hard to find. Please add an additional kRSUnicode value of 93.16 (i.e. 牛 radical).
Date/Time: Tue Aug 9 06:18:28 CDT 2022
Name: Andrew West
Report Type: Error Report
Opt Subject: Unihan_IRGSources.txt (15.0)
U+31DBF (⿰氵穿) has a kRSUnicode value of 116.7 (i.e. 穴 radical). This is unintuitive, and makes the character hard to find. Please add an additional kRSUnicode value of 85.9 (i.e. 水 radical).
Date/Time: Mon Sep 5 08:44:48 CDT 2022
Name: Huáng Jùnliàng
Report Type: Error Report
Opt Subject: UniHan.zip/Unihan_Readings.txt
Currently, the kMandarin value of U+2277B 𢝻 is hōng. However, according to GHZ p. 2495 (https://homeinmists.ilotus.org/hd/hydzd3.php?st=page_no&kw=2495), 𢝻 is a variant of 惚, so the kMandarin value should be hū. The reading hū is also supported by CNS11643: https://www.cns11643.gov.tw/wordView.jsp?ID=672838
Date/Time: Thu Oct 27 16:18:28 CDT 2022
Name: Michel Mariani
Report Type: Other Document Submission
Opt Subject: Name of the fifth new Ideographic Description Character
To be considered by the UTC when they meet next week:
I had a quick look at the recently released document: "CJK & Unihan Group Recommendations for UTC #172 Meeting" <https://www.unicode.org/L2/L2022/22247-cjk-unihan-group-utc173.pdf> and I noticed that the Unicode name for the proposed fifth IDC character (subtraction) is back to "IDEOGRAPHIC DESCRIPTION CHARACTER *STROKE* SUBTRACTION" (possibly because it is planned to be located at the end of the "CJK Strokes" block), after being briefly renamed "IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION" in <https://www.unicode.org/L2/L2021/21173r-cjk-unihan-group-utc169.pdf>:
> There was general agreement that the five IDCs (Ideographic Description Characters) in the preliminary proposal are useful and should be considered for encoding after the formal proposal has been submitted, but that the one named IDEOGRAPHIC DESCRIPTION CHARACTER STROKE SUBTRACTION should be renamed IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION (the word STROKE is removed) and should therefore allow components to be subtracted in addition to strokes.
Then, the character is mentioned as "IDEOGRAPHIC DESCRIPTION CHARACTER *COMPONENT* SUBTRACTION" in <https://www.unicode.org/L2/L2022/22191-five-new-idc-chars.pdf>.
I would like to point out a few issues with the latest suggested name:
- This new IDC character is already in use in practice, mainly in the IDS.TXT data file maintained by Andrew West, and it is currently already used to indicate a *component* subtraction, which gives far more flexibility, even if a component can sometimes be made of only one CJK stroke (CJK strokes being allowed in any component)...
- This new IDC character should be consistent with all other ones, which deal with *components*, and deciding on a name related to *strokes* is IMO too restrictive and somewhat disconcerting...
- This IDC character could also be used in non-CJK ideographic contexts, such as Tangut, etc., and others yet to be defined, and so it should be as general as possible for future use...
--Michel Mariani
Date/Time: Wed Sep 14 09:07:32 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Relative order of U+A802 and U+A823
L2/02-388 says “The correct encoded representation for this diphthong follows the phonological ordering: < Syloti Nagri dependent a, Syloti Nagri dvisvara sign >”. U+A802 SYLOTI NAGRI SIGN DVISVARA has Indic_Positional_Category=Top. U+A823 SYLOTI NAGRI VOWEL SIGN A has Indic_Positional_Category=Right. The usual order of Indic vowel signs in Unicode is left, top, bottom, right. Therefore, it seems like U+A802 should actually precede U+A823, but on the other hand Unicode often orders marks phonetically, so maybe U+A823 should precede U+A802. Which order should Syloti Nagri text use? The standard should explicitly explain which order to use.
Date/Time: Tue Jul 26 06:12:36 CDT 2022
Name: Oliver Kuederle
Report Type: Error Report
Opt Subject: UAX #29, 14.0.0
In Unicode Standard Annex #29 (Unicode Text Segmentation), v14.0.0, there appears to be an inconsistency between the grapheme cluster boundary rules and the word boundary rules. Specifically, rule GB13 states that a pair of regional indicators may not be broken. If a zero-width joiner precedes a regional indicator, this matches [^RI] and the counting of RI thus starts again. There is no exception for ZWJ in this specific case. For word boundaries, however, rule WB4 will cause an RI before a ZWJ to maintain its count (WB15/WB16). So the following sequence will break differently for graphemes and for words:
RI ZWJ RI RI
Following the grapheme rules, this will lead to:
RI × ZWJ ÷ RI × RI
And for word rules, this will lead to:
RI × ZWJ × RI ÷ RI
The word rules will therefore break a grapheme cluster which is probably not intended.
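The discrepancy described in this report can be reproduced with a small sketch. It is not an implementation of UAX #29: it assumes an input made only of the classes RI, ZWJ, and a catch-all Other, assumes the input does not start with ZWJ, and models only GB9, GB12/GB13, and GB999 on the grapheme side and WB4, WB15/WB16, and WB999 on the word side. All function names are hypothetical.

```python
RI, ZWJ, OTHER = "RI", "ZWJ", "Other"

def _ri_run(classes):
    """Length of the run of RI at the end of `classes`."""
    run = 0
    for c in reversed(classes):
        if c != RI:
            break
        run += 1
    return run

def grapheme_breaks(classes):
    """True where a grapheme cluster boundary falls between neighbours."""
    breaks = []
    for i in range(len(classes) - 1):
        left, right = classes[i], classes[i + 1]
        if right == ZWJ:
            breaks.append(False)   # GB9: no break before ZWJ
        elif left == RI and right == RI and _ri_run(classes[: i + 1]) % 2 == 1:
            breaks.append(False)   # GB12/GB13: odd number of RI before the position
        else:
            breaks.append(True)    # GB999: otherwise break
    return breaks

def word_breaks(classes):
    """True where a word boundary falls between neighbours."""
    breaks = []
    for i in range(len(classes) - 1):
        right = classes[i + 1]
        # WB4: a ZWJ that follows another character is ignored by later rules.
        seen = [c for c in classes[: i + 1] if c != ZWJ]
        if right == ZWJ:
            breaks.append(False)   # WB4: no break before the ignored ZWJ
        elif right == RI and _ri_run(seen) % 2 == 1:
            breaks.append(False)   # WB15/WB16: the RI count survives the ZWJ
        else:
            breaks.append(True)    # WB999: otherwise break
    return breaks

def show(classes, breaks):
    """Render classes with × (no break) and ÷ (break) between them."""
    out = [classes[0]]
    for brk, cls in zip(breaks, classes[1:]):
        out += ["÷" if brk else "×", cls]
    return " ".join(out)

seq = [RI, ZWJ, RI, RI]
print(show(seq, grapheme_breaks(seq)))  # RI × ZWJ ÷ RI × RI
print(show(seq, word_breaks(seq)))      # RI × ZWJ × RI ÷ RI
```

Under these assumptions the word boundary after the third code point falls inside the second grapheme cluster, which is the inconsistency the report points out.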
Date/Time: Mon Aug 22 14:40:56 CDT 2022
Name: Charlotte Buff
Report Type: Error Report
Opt Subject: Line break class of U+1342F
U+1342F EGYPTIAN HIEROGLYPH V011D currently has Line_Break=Alphabetic (AL) in the preliminary data files for Unicode 15. Because this hieroglyph is the start of a cartouche, it should have Line_Break=Open_Punctuation (OP) instead. This property value is shared by all other hieroglyphs with a similar function (U+13258..U+1325A, U+13286, U+13288, U+13379).
Date/Time: Thu Sep 8 15:38:26 CDT 2022
Name: Asmus/
Report Type: Website Problem
Opt Subject:
https://www.unicode.org/policies/stability_policy.html This page should cite definitions of terms such as "domain". This could be done either by citing the location of their formal definition or, perhaps better, by making them glossary links and then ensuring that any glossary item always cites the formal definition it is based on. This came up in the context of adding "domain stability", which introduces the word "domain", which perhaps is not in everybody's active vocabulary.
Date/Time: Thu Sep 15 03:28:12 CDT 2022
Name: Rossen Mikhov [Ed Note: Email to this person always fails, so they cannot be contacted; this applies to all of their submissions below.]
Report Type: Error Report
Opt Subject: UTS #18: Unicode Regular Expressions
https://www.unicode.org/reports/tr18/#Subtraction_and_Intersection
Version 23 Date 2022-02-08
Location: Section "1.3 Subtraction and Intersection", near the end of the section.
Wrong text: Thus the following matches all code points that neither have a Script value of Greek nor are in Basic_Emoji:
[^[\p{Script=Greek} && \p{Basic_Emoji}]]
Possible correction: Thus the following matches all code points that do not simultaneously have a Script value of Greek and are in Basic_Emoji:
Suggestion: There are no Greek emoji, so the example actually matches all Unicode code points. Perhaps a more illustrative example should be given.
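The difference between the two readings in this report is the difference between the complement of an intersection and the complement of a union. A minimal sketch with Python sets follows; the small integer sets are invented stand-ins for `\p{Script=Greek}` and `\p{Basic_Emoji}`, not real property data, and the first pair is made to overlap on purpose so that the two readings differ.

```python
# Toy universe and property sets (hypothetical values, not Unicode data).
universe = set(range(10))
greek = {1, 2, 3}
basic_emoji = {3, 4, 5}

# [^[A && B]]: everything that is not in both sets at once.
not_both = universe - (greek & basic_emoji)

# "Neither A nor B": everything outside the union of the two sets.
neither = universe - (greek | basic_emoji)

print(sorted(not_both))  # [0, 1, 2, 4, 5, 6, 7, 8, 9]
print(sorted(neither))   # [0, 6, 7, 8, 9]

# When the two sets are disjoint, as the report says the real ones are,
# the complement of the intersection is the whole universe.
disjoint_emoji = {4, 5}
print(universe - (greek & disjoint_emoji) == universe)  # True
```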
Date/Time: Thu Sep 15 05:52:10 CDT 2022
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: Unicode Chapter 3 Conformance
https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
Version 15.0.0
Location: D62b Graphical Application
Problematic text: A nonspacing mark in a defective combining character sequence is not part of a grapheme cluster and is subject to the same kinds of fallback processing as for any defective combining character sequence.
Explanation: "Grapheme cluster" is defined in D60 as "The text between grapheme cluster boundaries". So, formally, any character is part of some grapheme cluster, be it a degenerate one. What is more troubling with this definition D62b is that it states that nonspacing marks apply to grapheme bases, with "Grapheme base" being defined in D58 as based on Grapheme_Base. But Grapheme_Base is no longer used by UAX29. It isn't clear if nonspacing marks should "graphically apply" to things other than Grapheme_Base characters and Korean syllables, for example what about emoji ZWJ sequences.
Date/Time: Fri Sep 16 09:45:20 CDT 2022
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #29: Unicode Text Segmentation
https://www.unicode.org/reports/tr29/#Table_Combining_Char_Sequences_and_Grapheme_Clusters
Version: Unicode 15.0.0 Date: 2022-08-26 Revision: 41
Location: Table 1b. Combining Character Sequences and Grapheme Clusters
Problematic text:
legacy grapheme cluster: crlf | Control | legacy-core legacy-postcore*
extended grapheme cluster: crlf | Control | precore* core postcore*
Possible correction:
legacy grapheme cluster: crlf | CR | LF | Control | legacy-core legacy-postcore*
extended grapheme cluster: crlf | CR | LF | Control | precore* core postcore*
Alternative possible correction: (In table 1c) crlf := CR LF | CR | LF
Explanation: Looks like a simple editorial omission. With this minor correction, the regular expressions exactly correspond to the specification of the rules GB1-GB999.
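The omission described in this report can be illustrated with a reduced form of the two expressions. The sketch below encodes each code point as one letter for its Grapheme_Cluster_Break class (R = CR, L = LF, C = Control, X = anything else, standing in for a bare core), leaves out precore and postcore entirely, and relies on CR, LF, and Control being separate property values. The pattern and function names are hypothetical.

```python
import re

# Reduced forms of the Table 1b expression and of the proposed correction.
TABLE_1B = re.compile(r"RL|C|X")       # crlf | Control | core
CORRECTED = re.compile(r"RL|R|L|C|X")  # crlf | CR | LF | Control | core

def segment(pattern, classes):
    """Split a class string into clusters by repeated matching."""
    pos, clusters = 0, []
    while pos < len(classes):
        m = pattern.match(classes, pos)
        if m is None:
            raise ValueError(f"no cluster matches at offset {pos}")
        clusters.append(m.group())
        pos = m.end()
    return clusters

print(segment(CORRECTED, "RLL"))  # ['RL', 'L']

try:
    segment(TABLE_1B, "RLL")
except ValueError as err:
    print(err)  # no cluster matches at offset 2
```

In this reduced model the uncorrected expression has no alternative that accepts a lone LF, so the second LF of CR LF LF is left unmatched.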
Date/Time: Fri Sep 16 09:51:21 CDT 2022
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #29: Unicode Text Segmentation
https://www.unicode.org/reports/tr29/#Testing
Version: Unicode 15.0.0 Date: 2022-08-26 Revision: 41
Location: 7 Testing
Problematic text: Note: Testing two adjacent characters is insufficient for determining a boundary, except for the case of the default grapheme clusters.
Possible correction: Note: Testing two adjacent characters is insufficient for determining a boundary.
Explanation: Maybe the easiest counterexample is a sequence of many RI characters. There is no fixed limit to the number of preceding characters needed for context.
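The counterexample in this report can be made concrete. The sketch below assumes a string made only of regional indicators and applies only GB12/GB13, under which a position inside the run is a boundary exactly when an even number of RI precede it. The function name is hypothetical.

```python
def ri_boundaries(n):
    """Offsets of grapheme cluster boundaries strictly inside a run of n RI.

    Offset i means "between the i-th and (i+1)-th RI", so i RI precede it.
    """
    return [i for i in range(1, n) if i % 2 == 0]

print(ri_boundaries(5))  # [2, 4]
```

Offsets 1 and 2 both sit between the same adjacent pair of classes (RI, RI), yet only offset 2 is a boundary. Deciding offset i requires counting all i preceding RI, so no fixed amount of context is enough.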
Date/Time: Wed Sep 21 02:47:38 CDT 2022
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #14: Unicode Line Breaking Algorithm
UAX #29: Unicode Text Segmentation
https://www.unicode.org/reports/tr29/#Table_Combining_Char_Sequences_and_Grapheme_Clusters
Version: Unicode 15.0.0 Date: 2022-08-26 Revision: 41
UAX #14: Unicode Line Breaking Algorithm
https://www.unicode.org/reports/tr14/#Dictionary
Version: Unicode 15.0.0 Date: 2022-08-16 Revision: 49
Location: 5.2 Dictionary Usage
Problematic text:
BBC English Dictionary: sIləbl where I is <U+026A, U+0332> and ə is U+0259. The vowel of the stressed syllable is underlined.
Collins Cobuild English Language Dictionary: sIləbə°l where I is <U+026A, U+0332> and has the same meaning as in the BBC English Dictionary. The ə is U+0259 (both times). The ° is a U+2070 and indicates the schwa may be omitted.
Explanation: The typeset examples do not correspond to the explanation text. Specifically, the examples have the final letter "l" underlined (with an HTML <u> tag, not with U+0332, so cannot reproduce here). But this is not the stressed vowel. This should not be underlined and instead the second letter "I" should be underlined. The typeset examples in this section also deviate from the explanations in other ways ("I" is not U+026A as stated, "°" is not U+2070 as stated, etc.) but those are visually similar and can be forgiven for lack of fonts or something in the document producing system.
Date/Time: Wed Sep 21 07:53:00 CDT 2022
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #14: Unicode Line Breaking Algorithm
https://www.unicode.org/reports/tr14/#Examples
Version: Unicode 15.0.0 Date: 2022-08-16 Revision: 49
Location: 8.2 Examples of Customization, Example 7
Problematic text 1: The tailoring can be accomplished by first segmenting the text into grapheme clusters according to the rules defined in UAX #29, and then finding line breaks according to the default line break rules, giving each grapheme cluster the line breaking class of its first code point.
Explanation: This tailoring wouldn't be conforming in edge cases. Suppose the text is <CR, LF, LF>. After applying UAX #29, this becomes two grapheme clusters <CR, LF> and <LF>, with first code points <CR> and <LF>, respectively. Then default line breaking rules would prevent a line break between these, contrary to the conformance requirement for a mandatory break.
Problematic text 2: An example of a grapheme cluster that would be split by the default line break rules is a Zero Width Space followed by a combining mark.
Explanation: According to the latest version of UAX #29, Zero Width Space followed by a combining mark does not form one grapheme cluster (ZWSP has Grapheme_Cluster_Break=Control).
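The edge case in the first explanation can be traced with a short sketch. It models only the mandatory-break rules LB4 and LB5 of UAX #14, takes the grapheme clusters for <CR, LF, LF> as given in the report rather than computing them, and uses a hypothetical helper name.

```python
def mandatory_break_after(left, right):
    """LB4/LB5 subset: is there a mandatory break between left and right?"""
    if left == "CR" and right == "LF":
        return False                          # LB5: CR × LF
    return left in {"BK", "CR", "LF", "NL"}   # LB4 and LB5: break after these

# Default algorithm, applied directly to the code points.
code_points = ["CR", "LF", "LF"]
direct = [mandatory_break_after(a, b)
          for a, b in zip(code_points, code_points[1:])]
print(direct)    # [False, True]

# Tailoring from Example 7: one class per grapheme cluster,
# taken from the first code point of each cluster.
clusters = [["CR", "LF"], ["LF"]]
cluster_classes = [cluster[0] for cluster in clusters]
tailored = [mandatory_break_after(a, b)
            for a, b in zip(cluster_classes, cluster_classes[1:])]
print(tailored)  # [False]
```

The position between the two clusters is the position after the first LF. The direct pass marks it as a mandatory break, while the tailored pass sees CR followed by LF and suppresses it.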
Date/Time: Thu Sep 15 11:12:15 CDT 2022
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UTS #51: Unicode Emoji
https://www.unicode.org/reports/tr51/#gender-neutral
Version: 15.0 Date: 2022-08-31 Revision: 23
Location: "2.3.1 Gender-Neutral Emoji", near the end of the section
Wrong text: Gender-neutral versions of the profession or role emoji using object format type ZWJ sequences are promulgated by adding them to the *RGI emoji tag sequence set*.
Possible correction: Gender-neutral versions of the profession or role emoji using object format type ZWJ sequences are promulgated by adding them to the *RGI emoji ZWJ sequence set*.
Date/Time: Tue Jul 19 14:30:32 CDT 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UTR #54
UTR #54 contains the mistake “a one of several” (instead of just “one of several”) and a needless comma here: “Separation of the glyph variant information and documentation of all the associated contextual rules and their interaction with the Mongolian text model, from the production of versioned code charts would also make it possible to update this information much more quickly.”
Date/Time: Sun Aug 7 06:29:40 CDT 2022
Name: Yasuhiro Inukai
Report Type: Error Report
Opt Subject: Unicode Standard Version 14.0 Core Specification
There is an error in Figure 13-7 on p.556 of Unicode Standard Version 14.0 Core Specification (https://www.unicode.org/versions/Unicode14.0.0/UnicodeStandard-14.0.pdf). Under “cherig”, the example glyph just to the right of “1821” is not correct. It shows U+1822 (MONGOLIAN LETTER I)-like glyph instead of U+1821 (MONGOLIAN LETTER E). Thanks,
Date/Time: Wed Sep 14 09:09:43 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Typo in chapter 9
Chapter 9 includes the word “AARABIC” in the “High Hamza” section.
Date/Time: Sun Sep 18 12:29:03 CDT 2022
Name: Mark Longley
Report Type: Error Report
Opt Subject: Unicode Standard Version 15.0 Core Specification
In the Unicode Standard - Version 15.0 - Core Specification in section 23.9 Tag Characters on page 945 there is a minuscule error in the second subsection Deprecated Use for Language Tagging. It is stated that "In Version 8.0, all but the language tag identification character were un-deprecated" whereas in fact U+E007F CANCEL TAG was still deprecated in Version 8.0 and was not un-deprecated until Version 9.0.
Date/Time: Fri Sep 23 22:38:51 CDT 2022
Name: Pablo Sebastián Viola
Report Type: Error Report
Opt Subject: UnicodeStandard-15.0.pdf
I am reading the file stored at https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf. On page xxiii I see that Unicode version 15.0 is referred to, which confirms that I am reading the right file. However, in many places the document refers to itself as version 14.0. I found mentions of version 14.0 that are probably wrong on pages 75, 76, 77, 83. In the Index there are entries referring to Version 14.0 that probably should be 15.0: "Characters, .... number encoded in version 14.0.... p.3", "Version 14.0..... p.77". There are other places where version 14.0 is mentioned, but they are probably right.
Date/Time: Tue Oct 11 15:20:10 CDT 2022
Name: Markus Scherer
Report Type: Error Report
Opt Subject: core spec 2.9 Details of Allocation vs. plane 3
Figure 2-13 Unicode Allocation ( https://www.unicode.org/versions/Unicode15.0.0/ch02.pdf#G286741 page 47) still shows the U+3xxxx plane as Reserved. We have had CJK characters there since Unicode 13. This plane should be shaded for Graphic characters. The text on page 51 about "Plane 3 (TIP)" might be fine, but I suspect that its statement that it "is dedicated to encoding additional unified CJK characters" also predates Unicode 13. At a minimum, we should add a comma after the "(TIP)" in the paragraph, but it probably wants to read more like the text for plane 2.
Date/Time: Thu Oct 13 07:15:03 CDT 2022
Name: Mark Longley
Report Type: Error Report
Opt Subject: The Unicode® Standard Version 15.0 – Core Specification
There is a typographical error in Chapter 22 Symbols in section 22.10 Enclosed and Square in subsection Enclosed Alphanumeric Supplement: U+1F100–U+1F1FF in subsubsection Creative Commons License Symbols on page 910. The first of the two character code ranges is given as “U+1F10D..U+1F10FF” when the end of this range should in fact be “U+1F10F”, i.e. there is a spurious duplicated terminal ‘F’ hexadecimal digit.
(None at this time.)