The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of September 25, 2021, since the previous cumulative document was issued prior to UTC #168 (July 20, 2021).
The links below go directly to open PRIs and to feedback documents for them, as of *INSERT-DATE*HERE*, 2021.
Issue | Name | Feedback Link
427 | Proposed Update UTS #18, Unicode Regular Expressions | (feedback) UTC
426 | Proposed Update UTR #53, Unicode Arabic Mark Rendering | (feedback) No new feedback at this time UTC
The links below go to locations in this document for feedback.
Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to Properties & Algorithms ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports
Date/Time: Tue Jul 20 00:42:55 CDT 2021
Name: Jerome Alan Rossignuolo
Report Type: Other Question, Problem, or Feedback
Opt Subject: Missing Fundamental Chinese BuShou Radicals in UNICODE
Hello, I am doing research into a learning tool for Simplified Chinese. I find it exceedingly curious that of the 280 Primary and Associated Indexing Components (部首) as standardized by GB13000.1 Chinese Character Component Standard (see http://ling.whu.edu.cn/law/002/2016-04-20/1307.html) and used by the XinHuaZiDian (新华字典), the most widely used dictionary in China, four are missing from UNICODE! That is, there are 276 of these encoded in UNICODE and four have no UNICODE encoding! (A little note: the terms Indexing Components and Radicals are commonly and imprecisely interchanged. Most people, even fluent Chinese speakers, will incorrectly refer to what are actually Indexing Components as Radicals.) Moreover, it is obviously not just me faced with the difficulties of not being able to type these Indexing Components. I have scoured the Internet looking for information on them and cannot find any reference to them in the 100s of documents that do reference the 276 others. These four are simply left out of virtually all material, although you can find them in a scant few places as images. They are in the PDF files of GB13000.1, in the printed XinHuaZiDian dictionary, and in the official XinHuaZiDian mobile app. These Associated Indexing Components are still in use. Why are they missing from UNICODE? The Indexing Components are the fundamental building blocks of Chinese. In a way, they are somewhat analogous to our alphabet. It is like missing the letter little q from UNICODE. The difference is that one can still type a Chinese character in UNICODE that includes the missing Indexing Component. Still, it is exceedingly odd that these cannot be typed, since they need to be typed in any article referencing the Indexing Components in the Chinese language used by billions of people. The lack of them on the Internet is testament that this is clearly causing people trouble.
Since I cannot type them, I will refer you to reference the missing Indexing Components in the XinHuaZiDian. They are [50], [55], [68], and [145]. These can also be found in GB13000.1. If you need additional clarification or information, please contact me at [redacted]. I can send you a photo of the missing Indexing Components. Sincerely, Jerome Rossignuolo
Date/Time: Mon Aug 2 17:35:50 CDT 2021
Name: William He
Report Type: Error Report
Opt Subject: Error in definition of U+8561 蕡
Hello, The definition of U+8561 蕡 is listed as "hemp seeds; plant with abundant". Something seems to be missing from this definition. Perhaps it means to say "plant with abundant fruit" or something to that effect. Thanks, William
Date/Time: Thu Aug 12 09:45:13 CDT 2021
Name: Ken Lunde
Report Type: Error Report
Opt Subject: kSpoofingVariant errors?
The latest versions of the JIS X 0208 and JIS X 0213 standards explicitly state that U+53F1 叱 is a variant of U+20B9F 𠮟, and can be used in implementations that support only the former standard. This is reflected in their kJoyoKanji property values:
U+53F1 kJoyoKanji U+20B9F
U+20B9F kJoyoKanji 2010
The following are the kSpoofingVariant property values for these two ideographs:
U+53F1 kSpoofingVariant U+20B9F
U+20B9F kSpoofingVariant U+53F1
Based on their treatment in Jōyō Kanji and the JIS standards as explicit variants, perhaps they should instead have kZVariant property values:
U+53F1 kZVariant U+20B9F
U+20B9F kZVariant U+53F1
Of course, this is not urgent, and should be considered for Unicode Version 15.0. Regards... -- Ken
Date/Time: Mon Sep 20 23:43:06 CDT 2021
Name: Eiso Chan
Report Type: Error Report
Opt Subject: 4 missing UK glyphs in Unicode, 14.0.0
The following UK glyphs are missing in Unicode, 14.0.0.
UK-02830 for U+238A7
UK-02849 for U+2A909
UK-01320 for U+2B92E
UK-01422 for U+2E66E
Date/Time: Thu Sep 23 15:18:26 CDT 2021
Name: Jaemin Chung
Report Type: Website Problem
Opt Subject: Unihan Database contents search feature update suggestion
I suggest that the Unihan Database contents search feature be updated. http://unicode.org/charts/unihansearch.html
1. It should not be limited to the definition, Cantonese, Mandarin, Tang, Japanese on/kun, and Korean (Yale).
2. There should be something like a "match whole words only" feature. Someone searching for the reading "han" may not want "chang".
3. For Mandarin, tone numbers no longer work because the Unihan DB now uses tone marks. So the "jing3" example on the main search page should be changed.
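The "match whole words only" suggestion in item 2 can be illustrated with a short sketch. This is not how the Unihan search page is implemented; the function name `find_readings` and the sample readings are hypothetical and chosen only to show the difference between substring and whole-word matching.

```python
import re

def find_readings(query, readings, whole_word=True):
    """Return the readings that match query.

    With whole_word=True the query must be delimited by word boundaries,
    so "han" does not match inside "chang".
    """
    pattern = re.escape(query)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in readings if rx.search(r)]

# Hypothetical sample data, not taken from the Unihan Database.
readings = ["han", "chang", "han shan", "zhan"]

print(find_readings("han", readings))                    # whole-word matches only
print(find_readings("han", readings, whole_word=False))  # plain substring matches
```

A whole-word option of this kind would leave the current substring behaviour available for users who want it.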
Date/Time: Thu Sep 23 23:54:38 CDT 2021
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: U+6F55 and U+23C98
The following kSimplifiedVariant and kTraditionalVariant values should be added to the Unihan Database.
U+6F55 kSimplifiedVariant U+23C98
U+23C98 kTraditionalVariant U+6F55
(U+6F55 is 潕 and U+23C98 is 𣲘)
Date/Time: Fri Sep 24 00:26:38 CDT 2021
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: U+44E8 & U+7F43 and U+6C84 & U+6F90
Here are additional kSimplifiedVariant and kTraditionalVariant values that should be added to the Unihan Database.
U+44E8 kTraditionalVariant U+7F43
U+7F43 kSimplifiedVariant U+44E8
(U+44E8 is 䓨 and U+7F43 is 罃)
U+6C84 kTraditionalVariant U+6F90
U+6F90 kSimplifiedVariant U+6C84
(U+6C84 is 沄 and U+6F90 is 澐)
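The pairs proposed in these reports are reciprocal: each kSimplifiedVariant value is matched by a kTraditionalVariant value pointing back. A minimal sketch of a consistency check follows, assuming that reciprocity is the expected relationship for such pairs and that input lines follow the "code point, property, values" layout used in the reports. The helper names `parse` and `missing_reciprocals` are hypothetical.

```python
from collections import defaultdict

# Assumption: each of these two properties is expected to be mirrored by the other.
RECIPROCAL = {
    "kSimplifiedVariant": "kTraditionalVariant",
    "kTraditionalVariant": "kSimplifiedVariant",
}

def parse(lines):
    """Map (code point, property) to the set of values listed for it."""
    table = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cp, prop, values = line.split(None, 2)
        for value in values.split():
            table[(cp, prop)].add(value)
    return table

def missing_reciprocals(table):
    """List (code point, property, value) entries needed to make pairs mutual."""
    missing = []
    for (cp, prop), values in table.items():
        other = RECIPROCAL.get(prop)
        if other is None:
            continue
        for value in values:
            if cp not in table.get((value, other), set()):
                missing.append((value, other, cp))
    return sorted(missing)

# The values proposed in the two reports above.
proposed = [
    "U+6F55 kSimplifiedVariant U+23C98",
    "U+23C98 kTraditionalVariant U+6F55",
    "U+44E8 kTraditionalVariant U+7F43",
    "U+7F43 kSimplifiedVariant U+44E8",
    "U+6C84 kTraditionalVariant U+6F90",
    "U+6F90 kSimplifiedVariant U+6C84",
]

print(missing_reciprocals(parse(proposed)))       # nothing missing
print(missing_reciprocals(parse(proposed[:1])))   # one direction only
```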
Date/Time: Mon Aug 2 10:58:51 CDT 2021
Name: Rod Lockwood
Report Type: Other Question, Problem, or Feedback
Opt Subject: Superscripted Ordinal Suffixes
Because you did not make a complete superscript set of the Latin alphabet, there is no way to create the superscripted ordinal suffixes st, nd, rd, or th without changing the font.
Date/Time: Wed Sep 8 04:10:47 CDT 2021
Name: Brian Sullender
Report Type: Error Report
Opt Subject: Different glyph with the same combination of Code Points
Under code charts in the document "Arabic Presentation Forms-A" I have found what appears to be either a typo or an error in the specification. The presentation Code Points FC03 and FBF9 are different glyphs with the same Code Point combination, and both are of the "isolated" form. I don't know anything about these languages, but this looks wrong. I found the problem when running an algorithm to import the Presentation Code Points from the documents into a lookup table.
Date/Time: Thu Sep 9 02:50:37 CDT 2021
Name: Brian Sullender
Report Type: Error Report
Opt Subject: Different glyphs with the same Code Points
I recently reported an error in the document "Arabic Presentation Forms-A" about 2 conflicting presentation code points. I wanted to inform you that there are 2 other code points that conflict with each other. They are FBFA and FC68; both have the same combination code points of 0626 and 0649, and both are "final" form presentation code points. I haven't found any others.
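The kind of scan described in these two reports can be sketched with Python's standard `unicodedata` module, which exposes the decomposition mapping of each character. This is an illustration only, not the reporter's algorithm; the function name `shared_decompositions` is hypothetical, and the results depend on the Unicode data bundled with the Python build in use.

```python
import unicodedata
from collections import defaultdict

def shared_decompositions(first=0xFB50, last=0xFDFF):
    """Group code points in a range (default: Arabic Presentation Forms-A)
    by decomposition mapping and keep only mappings shared by more than
    one code point."""
    groups = defaultdict(list)
    for cp in range(first, last + 1):
        mapping = unicodedata.decomposition(chr(cp))
        if mapping:  # empty string means no decomposition mapping
            groups[mapping].append(cp)
    return {m: cps for m, cps in groups.items() if len(cps) > 1}

dupes = shared_decompositions()
for mapping, cps in sorted(dupes.items()):
    print(mapping, [f"U+{cp:04X}" for cp in cps])
```

The two pairs named in the reports should appear among the groups, keyed by the tagged mappings for 0626 0649 in isolated and final form.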
Date/Time: Fri Aug 6 16:34:05 CDT 2021
Name: Peter Constable
Report Type: Other Question, Problem, or Feedback
Opt Subject: UTS #39 data file default property values
UTC #168 discussed enhancements to use of @missing lines to indicate default property values. Coincidentally, I notice that the Identifier_Type and Identifier_Status data files for UTS #39 do not use the @missing convention to indicate default values at all. Rather, each has a prose statement (not machine readable) describing default values. Moreover, each has two separate statements. If UTC is going to be enhancing mechanisms for machine-readable default property values, it should consider incorporating the same mechanisms into all data files where relevant.
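To make the contrast with prose statements concrete, here is a minimal sketch of reading @missing lines mechanically. It assumes the comment-line layout `# @missing: <start>..<end>; <value>`, with an optional property name before the value. The function name `parse_missing` is hypothetical, and the sample lines are illustrative rather than copied from any specific data file.

```python
import re

# Assumed layout: "# @missing: 0000..10FFFF; [Property;] Value"
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(.+?)\s*$"
)

def parse_missing(lines):
    """Return (start, end, property or None, default value) per @missing line."""
    defaults = []
    for line in lines:
        m = MISSING_RE.match(line)
        if not m:
            continue
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        fields = [f.strip() for f in m.group(3).split(";")]
        prop = fields[0] if len(fields) > 1 else None
        defaults.append((start, end, prop, fields[-1]))
    return defaults

# Illustrative input; Example_Property and Example_Value are made-up names.
sample = [
    "# A prose statement about defaults is skipped by the parser.",
    "# @missing: 0000..10FFFF; No_Block",
    "# @missing: 0000..10FFFF; Example_Property; Example_Value",
    "0000..007F; Basic Latin",
]

for entry in parse_missing(sample):
    print(entry)
```

A prose-only default, as described in the report, yields nothing from a parser like this, which is the gap the report points out.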
Date/Time: Mon Aug 30 03:46:13 CDT 2021
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: ToASCII does not account for trailing dots
If you invoke https://www.unicode.org/reports/tr46/#ToASCII with VerifyDnsLength set to true it seems you cannot pass a domain such as `example.org.` (note the trailing dot) even though that is a valid domain. Credit: Gijs Kruitbosch.
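The behaviour reported here can be reproduced with a simplified reading of the DNS length check: each label must be 1 to 63 characters long and the whole name 1 to 253. The sketch below is not the UTS #46 algorithm itself; the function name `verify_dns_length` and the `allow_root_dot` parameter are hypothetical and only show how an empty trailing label causes the failure.

```python
def verify_dns_length(domain, allow_root_dot=False):
    """Apply simplified DNS length limits to an already-ASCII domain name.

    allow_root_dot is a hypothetical switch (not part of UTS #46) that
    strips one trailing dot before checking.
    """
    if allow_root_dot and domain.endswith("."):
        domain = domain[:-1]
    if not 1 <= len(domain) <= 253:
        return False
    # A trailing dot produces a final empty label, which fails the 1..63 test.
    return all(1 <= len(label) <= 63 for label in domain.split("."))

print(verify_dns_length("example.org"))                        # passes
print(verify_dns_length("example.org."))                       # fails: empty last label
print(verify_dns_length("example.org.", allow_root_dot=True))  # passes
```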
Date/Time: Thu Sep 9 06:59:02 CDT 2021
Name: Mickey Rose
Report Type: Error Report
Opt Subject: incorrect grammar in UTS #18: Character Classes with Strings
The auxiliary grammar presented in 2.2.1 Character Classes with Strings (https://unicode.org/reports/tr18/#Character_Ranges_with_Strings) doesn't generate the examples given further. Here are some of the examples (within character class):
[a-z\q{x\u{323}}]
[a-z ñ \q{ch} \q{ll} \q{rr}]
And here is the grammar:
ITEM := "\q{" (CODE_POINT (SP CODE_POINT)*)? "}"
SP := \u{20}
The grammar suggests that a single SP is required between individual CODE_POINTs. Which if true would be confusing, for example [\q{c h}]. Besides, this ITEM production is supposed to be embedded in CHARACTER_CLASS grammar (https://unicode.org/reports/tr18/#character_ranges) which already allows and ignores whitespace:
>> Whitespace is allowed between any elements, but to simplify the presentation the many occurrences of sequences of spaces (" "*) are omitted.
So I believe what was actually intended is this:
ITEM := "\q{" CODE_POINT2* "}"
(with whitespace allowed by virtue of being embedded within CHARACTER_CLASS grammar)
In this scenario [\q{aa ch}] is equivalent to [\q{aach}].
Alternatively, if SP is intended to separate whole strings inside \q{}, then you need to allow multiple CODE_POINTs without SP between them:
ITEM := "\q{" CODE_POINT2* (SP CODE_POINT2+)* "}"
In this scenario [\q{aa ch}] is equivalent to [\q{aa}\q{ch}]. But then it would be very confusing that only \u{20} would act as separator, while other whitespace like \u{09} wouldn't.
In either case, some examples with spaces inside \q{...} should be given for clarification.
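The two readings described in this report can be compared side by side with a small sketch. It operates only on the text between the braces of \q{...}, does not handle escapes such as \u{323}, and is not taken from UTS #18; both function names are hypothetical.

```python
def q_strings_whitespace_ignored(body):
    """Reading 1: whitespace inside \\q{...} is insignificant,
    so the body denotes a single string."""
    return ["".join(body.split())]

def q_strings_space_separated(body):
    """Reading 2: U+0020 separates whole strings; other whitespace
    (for example U+0009) is treated as ordinary content."""
    return [s for s in body.split(" ") if s]

print(q_strings_whitespace_ignored("aa ch"))   # one string: like [\q{aach}]
print(q_strings_space_separated("aa ch"))      # two strings: like [\q{aa}\q{ch}]
print(q_strings_space_separated("aa\tch"))     # tab does not separate under reading 2
```

The last call shows the asymmetry the report objects to: under the second reading a space splits the strings but a tab does not.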
(None at this time.)
Date/Time: Fri Jul 23 03:04:50 CDT 2021
Name: Liang Hai
Report Type: Error Report
Opt Subject: Obscure statement in section 12.1, Devanagari: Rendering Devanagari
R10 (rule 10) in the subsection “Rendering Devanagari” of the Core Spec’s section 12.1, Devanagari:
> Other modifying marks, in particular bindus and svaras, … The relative placement
> of these marks is horizontal rather than vertical; the horizontal rendering order may
> vary according to typographic concerns.
Unclear what “relative placement of these marks is horizontal” and “horizontal rendering order may vary” mean.
Date/Time: Fri Aug 6 17:02:10 CDT 2021
Name: Peter Constable
Report Type: Other Question, Problem, or Feedback
Opt Subject: bad links in UTS #46
Note: This has already been fixed in Unicode 14.0, for UTS #46 and UTS #39.
In the references section of UTS #46, several of the links to IETF documents are bad: some are simply links to anchors within UTS #46 itself (e.g., the references for IDNA 2003); and some external links are broken (unstable ietf.org URLs?; e.g., links for RFCs 5890, 5891, 5893, 5894). The following is an example URL that works (for RFC 5890): https://www.rfc-editor.org/info/rfc5890.
Date/Time: Mon Aug 23 09:11:43 CDT 2021
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #433: Typos in NamesList
Let me point out the following typos in NamesList-14.0.0d11.txt:
002F SOLIDUS
= slash,forward slash, virgule ## space missing after the comma
133CC EGYPTIAN HIEROGLYPH W024
* phonogramm 'nw' ## the m is doubled
133CD EGYPTIAN HIEROGLYPH W024A
* monogramm 'nw(n)' or 'nww' ## the m is doubled
133E4 EGYPTIAN HIEROGLYPH Z001
* semogram index
* classifier 'single'
* not to be confuse with 133FA ## should read 'confused'
Date/Time: Thu Aug 26 10:27:21 CDT 2021
Name: r12a
Report Type: Error Report
Opt Subject: Phonetic typo in Arabic section
On page 394 of v14 the long Kurdish u is described in phonetic notation as u:, i.e.
u U+0075: LATIN SMALL LETTER U
: U+003A: COLON
whereas it should be uː, i.e.
u U+0075: LATIN SMALL LETTER U
ː U+02D0: MODIFIER LETTER TRIANGULAR COLON
Note: The Editorial Committee has already reviewed feedback above this line, as of 2021/09/02.
Date/Time: Wed Sep 15 08:12:31 CDT 2021
Name: Angus Patrick
Report Type: Error Report
Opt Subject: Moon phases Naming
Dear Unicode Emojis, I have put the type of message as "Error Report". I have done this because I believe the moon phases emojis (I'm looking at Nos. 947-954) are erroneously named. They are labelled waning crescent, waxing gibbous etc. when these are not true descriptions in large parts of the world. For example, the one labelled "waning crescent moon" looks like a waxing crescent moon when looked at from the Southern Hemisphere. This makes even less sense to people who live close to the equator: in tropical zones this same crescent appears to lie on its side. You might say that only a small proportion of the world's population lives in the Southern Hemisphere, but I think it would be unfair, and maybe discriminatory, to ignore their point of view. I suggest that these emojis be renamed to more generic names to avoid being offensive. Sincerely Angus Patrick
Date/Time: Wed Sep 15 10:03:17 CDT 2021
Name: Giacomo Catenazzi
Report Type: Error Report
Opt Subject: Missing number in codepoint in Kana Extended-B
Page 758 of the Unicode Standard 14.0.0, the sub-chapter title states "Kana Extended-B: U+AFF0-U+1AFFF", but it should be "Kana Extended-B: U+1AFF0-U+1AFFF". Note: this is an addition (new text) of Unicode 14.0.0
Date/Time: Thu Aug 12 12:26:46 CDT 2021
Name: Assam Association Delhi
Report Type: Error Report
Opt Subject: “Bengali and Assamese” script
Note: This item was directed to the Unicode Consortium staff and has been responded to by the executive director.
The Unicode Consortium
P.O. Box 391476
Mountain View, CA 94039-1476
U.S.A.
+1-408-401-8915
Dear Sir,
Though Unicode is a game changer in today’s cyberspace, enabling thousands of languages to interact electronically, a few languages, viz. Assamese (India), may still face some issues, which may kindly be addressed, viz.
a. In the Code Charts http://www.unicode.org/charts/, the “Bengali and Assamese” script appears on the home page as one of the South Asian Scripts, but in the linked page, i.e. https://www.unicode.org/charts/PDF/U0980.pdf, it appears only as “Bengali” (instead of “Bengali and Assamese”). So it is requested to update it with “Bengali and Assamese” instead of “Bengali” alone.
b. It is felt that merging the Assamese script with Bengali in the same code chart may create problems in future, viz.
i. As Assamese scripts are not placed in order in the above code chart, sorting of Assamese words through SWs would be difficult due to their disruptive positions.
ii. Transliteration/translation may become difficult due to the lack of a separate identity for the Assamese script.
iii. The Assamese language may face incompatibility issues in AI, robotics, etc. for the above two reasons.
If the above fear is true, then it is requested to take appropriate action for the safety of the Assamese language. Meanwhile, we are expressing our keenness to work jointly with you to resolve such issues, if any.
Yours sincerely,
Dibyojit Dutta
General Secretary, Assam Association Delhi
Copy to
a. Secretary, Ministry of Electronics & IT, Govt of India, New Delhi