The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of July 4, 2023, since the previous cumulative document was issued prior to UTC #175 (April 2023).
The links below go directly to open PRIs and to feedback documents for them, as of July 4, 2023.
The links below go to locations in this document for feedback.
Feedback routed to CJK & Unihan Group for evaluation [CJK]
Feedback routed to Script ad hoc for evaluation [SAH]
Feedback routed to Properties & Algorithms Group for evaluation [PAG]
Feedback routed to Emoji SC for evaluation [ESC]
Feedback routed to Editorial Committee for evaluation [EDC]
Other Reports
Date/Time: Mon May 22 18:07:09 CDT 2023
ReportID: ID20230522180709
Name: Paul Masson
Report Type: Error Report
Opt Subject: kPhonetic for U+645E
This character appears on p.350 of Casey but is not in a phonetic group. It appears that the appropriate one is 842. This was not added in version 15. Thank you.
Date/Time: Mon May 22 18:07:42 CDT 2023
ReportID: ID20230522180742
Name: Paul Masson
Report Type: Error Report
Opt Subject: kPhonetic for U+773E
This character is a variant of U+8846. It appears in Casey in the same group 324. Please add this entry to your database. Thank you.
Date/Time: Mon May 22 18:08:30 CDT 2023
ReportID: ID20230522180830
Name: Paul Masson
Report Type: Error Report
Opt Subject: kPhonetic for U+78D7
This character had a kPhonetic value of 269 in version 13, which was changed in version 14 to 1157*. It disappeared from the database in version 15 when the latter group was radically pruned, which needed to occur. Please add the correct entry to the database. Thank you.
Date/Time: Mon Apr 17 17:12:14 CDT 2023
ReportID: ID20230417171214
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Feedback on L2/23-102
On page 3, the glyph for LATIN SMALL LETTER R WITH LEFT TIE in the code chart is a ligature of U+0279 LATIN SMALL LETTER TURNED R and U+0072 LATIN SMALL LETTER R. However, that does not match any of the attestations of this character in any of the figures in this proposal. Instead, they all consistently make it look like U+0072 LATIN SMALL LETTER R with a preceding diagonal stroke. The Unicode code chart glyph should match the attested glyphs.
Date/Time: Thu May 18 11:42:42 CDT 2023
ReportID: ID20230518114242
Name: Charlotte Buff
Report Type: Other Document Submission
Opt Subject: On the name of KHITAN SMALL SCRIPT CHARACTER-18CFF
The proposed character U+18CFF KHITAN SMALL SCRIPT CHARACTER-18CFF (cf. L2/23-065) which was recently accepted for a future version of the standard is not a normal character of the Khitan small script, but instead acts as a placeholder for characters that have been lost or are illegible. I propose changing its name to KHITAN SMALL SCRIPT LOST SIGN to reflect that special purpose. Unlike Han or Tangut ideographs, the names of the characters in the Khitan Small Script block are all explicitly defined in UnicodeData.txt, so I do not think it is strictly necessary for U+18CFF to also follow the same algorithmic naming scheme – unless of course some internal tool I am unaware of requires it, in which case this proposal can be discarded.
Date/Time: Mon Jun 26 12:48:27 CDT 2023
ReportID: ID20230626124827
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Name of U+1CE07
U+1CE07 TOP RIGHT BLACK LEFT-POINTING SMALL TRIANGLE (approved for Unicode 16.0) has a glyph in the top left of the cell, according to L2/21-235R. Shouldn’t it be named TOP LEFT BLACK LEFT-POINTING SMALL TRIANGLE?
Date/Time: Mon Jun 26 21:54:09 CDT 2023
ReportID: ID20230626215409
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Feedback on L2/23-147
U+1E6FE TAI YO SYMBOL MEUANG represents “mương”, according to L2/22-289. The nucleus “ươ” /ɨə/ is ASCIIfied “UEA” in U+1E6EA TAI YO LETTER UEA. Therefore, U+1E6FE should be named “TAI YO SYMBOL MUEANG”.
Date/Time: Mon Jul 03 12:42:20 CDT 2023
ReportID: ID20230703124220
Name: Little Miss MOSFET
Report Type: Error Report
Opt Subject: Duployan Bloc Errors at U1BC00.pdf
Dear Unicode Consortium,
For years, evidently with no report, the Duployan code block has been SNAFU. I’m merely a user for many years of Duployan to write Chinukwawa, not a profound techie, so please forgive any issues with my submittal, but I’d like to press on these errors.
https://www.unicode.org/charts/PDF/U1BC00.pdf lists the Unicode Duployan characters as currently drafted in the standard. Each character which contains a little arrow is incorrect. *There are no little arrows in Duployan.* These were evidently included by mistake, as the proposal to include this block described the characters’ kerning direction using these little arrows. *These were obviously not intended to be part of the standard.* The little arrows describe the characters as they link together and the direction of writing in the Unicode inclusion proposal. *They are not, nor ever have been, a part of these characters.* These little arrows should be deleted.
Furthermore: Duployan script works sort of like Arabic when written. It has a complex kerning which moves left to right and top to bottom. There is as yet no functional font for Duployan, and apparently no description of how these characters link up in Unicode, though this was described in the proposal. *That is to say, the state of Duployan as specified by Unicode is incomplete and unusable.* I’d very much like to see this resolved eventually.
If you need any additional info, please contact me as above.
Thanks, Little Miss MOSFET
Date/Time: Mon Jul 03 12:52:12 CDT 2023
ReportID: ID20230703125212
Name: Little Miss MOSFET
Report Type: Error Report
Opt Subject: PS on Duployan Bloc Errors - Inclusion Proposal Document
https://www.unicode.org/L2/L2010/10272r-duployan.pdf This is Van Anderson’s proposal. From the textual examples, you can see that the arrows were not meant to be part of the standard, as they are used by the author to describe the direction of a character’s writing and rotation for linkage. None of the primary sources use these little arrows. They are not part of Duployan, but an erroneous Unicode Consortium artifact. But because of their inclusion, fonts which include Duployan usually copy these little arrows.
Date/Time: Tue May 02 07:23:16 CDT 2023
ReportID: ID20230502072316
Name: Charlotte Buff
Report Type: Other Document Submission
Opt Subject: Text segmentation properties of Kirat Rai vowel signs
The vowel signs of the Kirat Rai script, which has been accepted for a future version of the Unicode Standard based on proposal document L2/22-043R, are slated to be implemented as spacing, stand-alone characters (gc=Lo) rather than as combining or spacing marks. While not explicitly stated, this would likely result in them being assigned the Grapheme_Cluster_Break property value Other (GCB=XX).
Three of these vowel signs – AI, O, and AU – are visually sequences of other vowel signs and have therefore been given canonical decomposition mappings:
U+16D68 ≡ <U+16D67, U+16D67> (AI ≡ <E, E>)
U+16D69 ≡ <U+16D63, U+16D67> (O ≡ <AA, E>)
U+16D6A ≡ <U+16D69, U+16D67> (AU ≡ <O, E>)
These properties, however, do not maintain canonical equivalence. The vowel signs in question would be one grapheme cluster each in NFC, but two grapheme clusters each in NFD. This is forbidden by UAX #29, which states in section 2, “Conformance”: »A boundary exists in text not normalized in form NFD if and only if it would occur at the corresponding position in NFD text.«
There are several possible approaches for resolving this issue:
1) Reclassify Kirat Rai vowel signs as spacing, combining marks
A minimal solution that preserves canonical equivalence for both legacy and extended grapheme clusters would involve U+16D67 KIRAT RAI VOWEL SIGN E and U+16D68 KIRAT RAI VOWEL SIGN AI being changed to Grapheme_Cluster_Break=Extend (GCB=EX). Though not strictly necessary, it would then also make sense to change their General_Category value to Spacing_Mark (gc=Mc). This approach may not be desirable because it would prevent vowel signs E and AI from being used in isolation; they would always forcibly “glue” themselves to the preceding character such as a space or a punctuation mark and potentially cause problems for the text renderer. The stand-alone nature of the Kirat Rai vowel signs was quite a deliberate choice because of the similarities to the New Tai Lue script.
2) Invent new GCB rules for these vowel signs
The text segmentation algorithm would need to be amended to make Kirat Rai vowel signs similar in nature to Hangul Jamo – forming grapheme clusters with each other in certain configurations, but not with unrelated characters. For minimal impact, the new rule should be limited to the interaction between vowel signs E, AA, and O followed directly by vowel sign E, which covers all three decomposition mappings. It could look something like this:
[\u{16d63}\u{16d67}\u{16d69}] × \u{16d67}
Note that U+16D67 occurs on both sides of the rule because it is both the leading and the trailing codepoint in the decomposition mapping of U+16D68. This approach is probably a cleaner solution because it gets rid of the problem without changing anything about the general nature of the script, but it also introduces a unique edge case into an otherwise quite straightforward algorithm for the sake of just a handful of characters.
3) Change decompositions from canonical to compatibility
There is no requirement for compatibility decompositions to preserve the text segmentation boundaries of their source strings. In practice, users of the script would always encounter the vowel signs in precomposed form because NFKC and NFKD are generally not used on the front end, while search and collation algorithms would still be able to recognise the weak equivalence. However, it is questionable whether using mere compatibility equivalence for sequences that are truly identical in every sense is appropriate, especially in the context of security.
4) Do not encode compound vowel signs as separate characters
The characters U+16D68..U+16D6A would be removed from the Kirat Rai repertoire altogether and the only way to represent vowel signs AI, O, and AU would be through the use of sequences. Perhaps named character sequences could be defined as well if deemed useful. This approach would circumvent the entire issue without side effects, but is also clearly the least desirable for actual users of the script who consider these vowel signs to be linguistic units regardless of their glyphic appearance. I do not think this would be an acceptable solution in practice.
5) Encode the vowel signs as atomic characters without decomposition mappings
This approach is the worst one in my view as it would necessitate the creation of dreaded Do Not Use tables for the Kirat Rai script, which goes against everyone’s interests. I strongly recommend against this solution.
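The mismatch described in this report, and the effect of the rule sketched under approach 2, can be illustrated with a small Python model. This is only a sketch under stated assumptions: the decomposition table is copied from the report rather than read from the Unicode Character Database, every code point is assumed to be GCB=Other, and the names DECOMP, full_decompose, and count_clusters are hypothetical helpers, not part of any library. Because the mapping for AU refers to O, which itself decomposes, this model expands AU to three code points in fully decomposed form.

```python
# Code points and decomposition mappings as listed in the report.
# Hypothetical table: hard-coded so the sketch does not depend on the
# Unicode version shipped with the Python interpreter.
AA, E, AI, O, AU = 0x16D63, 0x16D67, 0x16D68, 0x16D69, 0x16D6A
DECOMP = {AI: (E, E), O: (AA, E), AU: (O, E)}


def full_decompose(cps):
    """Recursively apply the decomposition mappings (an NFD-like expansion)."""
    out = []
    for cp in cps:
        if cp in DECOMP:
            out.extend(full_decompose(DECOMP[cp]))
        else:
            out.append(cp)
    return out


def count_clusters(cps, extra_rule=False):
    """Count grapheme clusters, assuming every code point is GCB=Other.

    With extra_rule=True, no break is placed between [AA E O] and a
    following E, modelling the rule sketched under approach 2.
    """
    if not cps:
        return 0
    clusters = 1
    for prev, cur in zip(cps, cps[1:]):
        joined = extra_rule and prev in (AA, E, O) and cur == E
        if not joined:
            clusters += 1
    return clusters


for name, cp in (("AI", AI), ("O", O), ("AU", AU)):
    decomposed = full_decompose([cp])
    print(
        name,
        "precomposed:", count_clusters([cp]),
        "decomposed:", count_clusters(decomposed),
        "decomposed with rule:", count_clusters(decomposed, extra_rule=True),
    )
```

In this model the precomposed and decomposed forms give different cluster counts unless the extra rule is enabled, which is the inconsistency the report raises.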
Date/Time: Tue Jun 13 08:12:45 CDT 2023
ReportID: ID20230613081245
Name: Jae Woong Lee
Report Type: Error Report
Opt Subject:
Hello, I am using Unicode 9.0 with a MySQL 8.0 database. Collation name: utf8mb4_0900_ai_ci
I can't get the desired result when I compare Korean strings using Unicode 9.0. Unicode 9.0 considers separated characters and combined characters as the same thing.
ex)
- 요 = 요 -> result True : correct
- 요 = ㅇㅛ -> result True : This is an invalid result.
But if I use other collations, utf8mb4_general_ci or utf8mb4_unicode_ci, I get the correct result.
ex)
- 요 = 요 -> result True : correct
- 요 = ㅇㅛ -> result False : correct
It seems that the Korean comparison method is different from 9.0. I'm wondering why characters that look different to Koreans are called the same in Unicode 9.0. Is this by design or is it a bug and can it be fixed? I contacted MySQL, but they told me that it's not a MySQL issue, but to contact the unicode association because they used Unicode 9.0 as it is.
-----------------------------------
[9 Jun 16:10] MySQL Verification Team
Hi, You can observe that collating the constants changing the result. You can try different COLLATE expressions. Regarding Koreans language, we are not experts on this. We just implemented the UTF standard, to the last point. Hence, you should contact the people that define Unicode standards. Also, do not forget that two strings with different grapheme clusters can be considered identical, as per standard. There are many examples in the textbooks on this subject. Not a bug.
-----------------------------------
Regards, Jae.
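How the two strings in this report relate under Unicode normalization can be checked with Python's standard unicodedata module. This is a sketch only: it assumes the second string consists of the compatibility jamo U+3147 and U+315B, and it does not reproduce MySQL's collation, which compares collation weights rather than normalized strings. It shows that the strings are distinct under canonical normalization but become identical under compatibility normalization, which may be related to the behaviour observed.

```python
import unicodedata

syllable = "\uC694"           # HANGUL SYLLABLE YO
compat_jamo = "\u3147\u315B"  # HANGUL LETTER IEUNG + HANGUL LETTER YO (assumed input)


def codepoints(s):
    """Render a string as a list of U+XXXX labels for inspection."""
    return [f"U+{ord(c):04X}" for c in s]


# Canonical decomposition of the syllable yields conjoining jamo.
nfd_syllable = unicodedata.normalize("NFD", syllable)

# Canonical normalization leaves the compatibility jamo unchanged.
nfc_jamo = unicodedata.normalize("NFC", compat_jamo)

# Compatibility normalization maps them to conjoining jamo and recomposes them.
nfkd_jamo = unicodedata.normalize("NFKD", compat_jamo)
nfkc_jamo = unicodedata.normalize("NFKC", compat_jamo)

print("NFD(syllable): ", codepoints(nfd_syllable))
print("NFC(jamo):     ", codepoints(nfc_jamo))
print("NFKD(jamo):    ", codepoints(nfkd_jamo))
print("NFKC(jamo):    ", codepoints(nfkc_jamo))
print("canonically equal:    ", nfc_jamo == unicodedata.normalize("NFC", syllable))
print("compatibility equal:  ", nfkc_jamo == unicodedata.normalize("NFKC", syllable))
```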
(None at this time.)
Date/Time: Thu Jun 01 05:25:23 CDT 2023
ReportID: ID20230601052523
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Core Specification
The Lao chapter in the Core Spec is missing any information on spacing. I believe at minimum we need to copy some of the information from the Thai section or refer to the Thai section about spacing. This came to light because of a comment made by Norbert Lindenberg that suggested to me U+200B is also used in Lao. But there is no such reference in the Core Spec.
(None at this time.)