The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of January 8, 2021, since the previous cumulative document was issued prior to UTC #165 (October 2020).
The links below go directly to open PRIs and to feedback documents for them, as of January 7, 2021.
The links below go to locations in this document for feedback.
Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to Properties & Algorithms ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports
Date/Time: Thu Jan 7 15:18:08 CST 2021
Name: William T. Nelson
Report Type: Error Report
Opt Subject: UAX38 kMandarin readings for two ideographs
U+7B7D 筽 has kMandarin value o which is its Korean reading. Please change the value to wú as per 两万汉字中日韩越英俄读音释义字典 page 846 entry 17174. U+9730 霰 has kMandarin value sǎn, but the correct reading is xiàn according to character dictionaries. (The PRI #297 feedback page has an error report from Markus Scherer regarding this value.)
Date/Time: Wed Sep 30 10:29:11 CDT 2020
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/20-249
The figures from Safi-Nezhad 2008 have examples of stacked numbers with different currency denominations. For example, the first row of figure 9 shows “1 toman” above “1,111 dinar”. How should these be encoded? How should “1,000” in figure 54 be encoded? Some numbers in figures 53 and 54 end with what looks like PERSIAN SIYAQ KHARVAR MARK with its components swapped. How should that be encoded? Some of the kharvar amounts in figures 44 and thereafter end with what looks like just the dot of PERSIAN SIYAQ KHARVAR MARK. How should that be encoded? Figure 56 shows Indic siyaq instead of Persian siyaq.
Date/Time: Wed Sep 30 19:49:25 CDT 2020
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comment on L2/20-247
A third option is to adopt a new rule that a composed code point (e.g. U+09CB BENGALI VOWEL SIGN O) may be the base of a variation sequence if and only if its decomposed trailing code point (e.g. U+09BE BENGALI VOWEL SIGN AA) is also the base of a variation sequence, and those two variation sequences are harmonized to represent essentially the same variation.
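The first half of this rule can be checked mechanically. Below is a minimal sketch in Python, assuming only the standard `unicodedata` module. The helper names and the sample set of variation-sequence bases are hypothetical and are not taken from L2/20-247. The sketch does not attempt the second half of the rule (that the two sequences are harmonized), which needs human judgment.

```python
import unicodedata

def trailing_code_point(ch):
    """Return the last code point of ch's canonical decomposition, or None."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility decompositions start with a <tag>; only canonical ones count.
    if not decomp or decomp.startswith("<"):
        return None
    return chr(int(decomp.split()[-1], 16))

def may_be_variation_base(ch, existing_bases):
    """Sketch of the proposed rule: a composed code point may be a base only
    if its decomposed trailing code point is already a base."""
    trailing = trailing_code_point(ch)
    return trailing is not None and trailing in existing_bases

# Hypothetical input: pretend U+09BE BENGALI VOWEL SIGN AA is already a base.
hypothetical_bases = {"\u09BE"}

trailing = trailing_code_point("\u09CB")  # U+09CB BENGALI VOWEL SIGN O
print(f"U+{ord(trailing):04X}", unicodedata.name(trailing))
print(may_be_variation_base("\u09CB", hypothetical_bases))
```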
Date/Time: Thu Oct 1 21:02:53 CDT 2020
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/20-187R and L2/20-188R
In the Elbasan block, the Albanian letters “e” and “ë” are ASCIIfied as EI and E. The Vithkuqi and Todhri proposals ASCIIfy them as E and EH. Is this inconsistency intentional?
Date/Time: Fri Oct 2 20:49:07 CDT 2020
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Kyrgyz som observation
I had submitted a proposal to include some currency symbols (https://www.unicode.org/L2/L2019/19291-missing-currency.pdf). The proposal was not accepted because I did not provide sufficient evidence of the use of the symbols. Since then, the National Bank of Kyrgyzstan has submitted its own proposal with sufficient evidence (https://www.unicode.org/L2/L2020/20261-kyrgyz-som.pdf). As expected, I wasn't mentioned in that proposal, since I can't expect the National Bank to read year-old proposals. I write to restate my opinion that the name "KYRGYZ SOM SIGN" is not preferable, since there is a possibility that Uzbekistan would also adopt the sign, given the friendship between the two countries and the identical names of their currencies. The more generic name "SOM SIGN" would fit the current pattern of recently added currency signs and would ensure that Uzbekistanis would feel welcome to adopt the sign if they so choose. I do not have any allegiance to either country (I live in Mexico); I just think that it is wise to think long term, given that character names cannot be changed. Of course this is only a suggestion, as the National Bank clearly has the higher authority on this, and may dismiss me outright.
Date/Time: Mon Oct 5 12:49:03 CDT 2020
Name: Fawaz Ahmed
Report Type: Other Question, Problem, or Feedback
Opt Subject: 06E0 Character has wrong description
The Unicode document at https://www.unicode.org/charts/PDF/U0600.pdf seems to describe 06E0 as rectangular zero, but it should be described as circular zero. Thanks
Date/Time: Mon Oct 12 13:50:38 CDT 2020
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/19-306
The proposed ARABIC LETTER THIN YEH has its own joining group, THIN YEH. What does the final thin yeh look like? Does it have two dots or zero? The normal yeh shown in the King Fahd Warsh examples of this proposal has zero dots in its final form, meaning U+06CC ARABIC LETTER FARSI YEH is the normal, non-thin yeh in this context. It would therefore make sense if the thin yeh also had zero dots in its final form, in which case it should be called ARABIC LETTER THIN FARSI YEH. Maybe it has no attested final form, but fonts are still going to have to handle that case, so Unicode should provide some guidance. If ARABIC BASELINE ROUND DOT has gc=Lo, why does ARABIC RAISED ROUND DOT have gc=Sk? Surely if one is a letter so is the other; cf. U+0674 ARABIC LETTER HIGH HAMZA. https://app.quranflash.com/book/Warsh2?en#/reader/chapter/565 (for example) in the right margin shows ARABIC LARGE ROUND DOT ABOVE behaving like ARABIC HAMZA ABOVE. The hamza is in UTR #53’s Modifier Combining Marks set; so should the dot be. This most likely goes for ARABIC LARGE ROUND DOT BELOW too.
Date/Time: Thu Nov 26 10:22:07 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Diacritics in Old Hungarian
The ad hoc report L2/11-242R recommends using U+1DC4 COMBINING MACRON-ACUTE for a certain mark attested in Old Hungarian. Neither the core standard nor the code chart for Combining Diacritical Marks Supplement mentions this usage. Should U+1DC4 be used in Old Hungarian? If not, how should it be represented? Other diacritics have seen some use as part of the modern revival of Old Hungarian, though not part of the most common version in use today. These include what appear to be U+0304 COMBINING MACRON, U+0307 COMBINING DOT ABOVE, and U+0301 COMBINING ACUTE ACCENT. With what code points should these diacritics be represented?
Date/Time: Thu Nov 26 10:40:35 CST 2020
Name: David Corbett
Report Type: Error Report
Opt Subject: Isolated U+08AC ARABIC LETTER ROHINGYA YEH
The core standard says that U+08AC ARABIC LETTER ROHINGYA YEH has no isolated form, but it may exist. See <https://github.com/googlei18n/noto-fonts/issues/1266>.
Date/Time: Fri Dec 4 21:19:30 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Rendering U+1D9FF SIGNWRITING HEAD with forehead marks
Some SignWriting marks are placed on the forehead. To avoid overlapping the top part of U+1D9FF SIGNWRITING HEAD, the top part of U+1D9FF is omitted. This is encoded as <U+1D9FF, U+1DA9B> (SignWriting ID 04-01-001-01-02-01). The corresponding symbols in ISWA 2010 (http://www.signbank.org/iswa/) are drawn with the top part of the head omitted; an example is 04-01-003-01-04-03. Should 04-01-003-01-04-03 be encoded as <U+1D9FF, U+1DA01, U+1DA9D, U+1DAA2> or as <U+1D9FF, U+1DA9B, U+1DA01, U+1DA9D, U+1DAA2>? The recently released Noto Sans SignWriting renders both with the top of the head omitted, but it might be preferable to be more explicit and render the two sequences distinctly.
Date/Time: Mon Dec 7 19:21:59 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Rendering SignWriting symbols without valid fill-1
According to chapter 21, in the section for SignWriting, “There are no explicit modifiers encoded for fill-1 or rotation-1, as those values are considered inherent in the base character”. However, there are some characters for which fill-1 is not valid, such as U+1D8F6 SIGNWRITING HAND-FIST THUMB HEEL. How should such characters be rendered when not followed by a valid fill modifier?
From: PANDI ID Registry
Sent: Monday, December 14, 2020, 12:22:02 AM PST
Subject: Re: PANDI Inquiries
We would like to know how we can raise the status of the Javanese script from limited use to recommended. Should we send more evidence that the script is still actively being used by the community? We need this as soon as possible for our IDN process with ICANN. Thank you so much. Waiting for your reply. Best regards, Alicia Nabilla Business Development PANDI .id Registry
Date/Time: Sun Dec 20 09:57:01 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Mandaic kad
Chapter 9 says “There are two ways to represent kad in Mandaic: U+0857 MANDAIC LETTER KAD or the sequence <U+084A MANDAIC LETTER AK, U+0856 MANDAIC LETTER DUSHENNA>.” Do these two ways mean the same thing? Are they rendered identically? If they are identical, which one should people use?
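Part of this question can be probed with normalization data. Below is a minimal sketch in Python, assuming the interpreter's bundled `unicodedata` tables; the helper names are hypothetical. It tests only whether the two spellings are canonically or compatibility equivalent. It says nothing about whether fonts render them identically or which spelling is preferred.

```python
import unicodedata

KAD = "\u0857"                 # MANDAIC LETTER KAD
AK_DUSHENNA = "\u084A\u0856"   # MANDAIC LETTER AK + MANDAIC LETTER DUSHENNA

def canonically_equivalent(a, b):
    """True if a and b have the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a, b):
    """True if a and b have the same NFKD form."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

for ch in KAD + AK_DUSHENNA:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          repr(unicodedata.decomposition(ch)))

print("canonical:", canonically_equivalent(KAD, AK_DUSHENNA))
print("compatibility:", compatibility_equivalent(KAD, AK_DUSHENNA))
```

If both checks report no equivalence, normalization will not unify the two spellings, so string comparison and search treat them as different text.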
Date/Time: Wed Sep 23 12:23:38 CDT 2020
Name: Wes
Report Type: Error Report
Opt Subject: Unicode confusables data missing Sharp-S letter to Capital-B confusable
Hi, I hope you're all doing well. It seems the confusables at ftp://ftp.unicode.org/Public/security/latest/confusables.txt fail to include the German sharp-S or Eszett letter ( https://en.wikipedia.org/wiki/%C3%9F ) as a possible confusable with the Latin capital "B". This is a fairly obvious confusable, and the Wikipedia article even mentions "Not to be confused with the Latin letter B." at the top of the article. Would it be possible to add this to the official Unicode confusables data mapping? Many thanks,
From: PANDI ID Registry
Sent: Monday, December 14, 2020, 12:22:02 AM PST
Subject: Re: PANDI Inquiries
We would like to know how we can raise the status of the Javanese script from limited use to recommended. Should we send more evidence that the script is still actively being used by the community? We need this as soon as possible for our IDN process with ICANN. Thank you so much. Waiting for your reply. Best regards, Alicia Nabilla Business Development PANDI .id Registry
Date/Time: Tue Dec 15 13:25:44 CST 2020
Name: Zach Lym
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Normalization Generics (NFx, NFKx, NFxy)
I have been tracking down the rationale behind the normalization choices in filesystems. One problem area is the misleading use of strict logician terminology. Take the definition of Unicode's caseless matching algorithm [D145]:

> A string X is a canonical caseless match for a string Y if and only if:
> NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))

The W3C Canonical Case Fold Normalization algorithm is defined as being compatible with [D145], but uses NFC in the last step [w3c-charmod-norm], leading to an apparent contradiction. Even though Unicode explains that "case folding is closed under canonical normalization", it took me a long time to find that passage and convince myself that the W3C and Unicode matching algorithms are equivalent. I am not alone: Linux kernel hackers couldn't figure it out either [linux-norm]!

>> Is there any case where
>> NFC(x) == NFC(y) && NFD(x) != NFD(y) , or
>> NFC(x) != NFC(y) && NFD(x) == NFD(y)
>
> This is good question. And I think we should get definite answer for it prior inclusion of normalization into kernel.

I was originally going to propose additions to D145 textual description, cross-references to the implementation section, and adding discussion of W3C charmod-norm. However, I don't think this would help as the text is already quite dense and most people will just ignore everything outside the example anyway [minimalist-manual]. I would instead like to propose normalization form generics for use in pseudo code definitions:

NFx = NFD|NFC //NFx != NFy
NFKx = NFKD|NFKC
NFxy = NFD|NFC|NFKD|NFKC

Freestanding `X`/`Y` variables should probably be replaced to disambiguate them from the `NFx` nomenclature. `s1`/`s2` would work but `foo`/`bar` is less dense:

NFx(caseFold(NFD(foo))) = NFx(caseFold(NFD(bar)))

`NFx` does not currently appear within the Unicode standard itself, but is used in the normalization technical note [UAX15]. However, **UAX15 defines `NFx` twice**, first as NFD|NFC|NFKD|NFKC and later on as NFD|NFC.

I think the proposed convention gets the most mileage out of the nomenclature and is how I have seen `NFx` used in the real world [linus].

Thank you!
-Zach Lym

[w3c-charmod-norm]: https://w3c.github.io/charmod-norm/#CanonicalFoldNormalizationStep
[linux-norm]: https://lwn.net/ml/linux-fsdevel/20190206084752.nwjkeiixjks34vao@pali/
[minimalist-manual]: https://dl.acm.org/doi/10.1207/s15327051hci0302_2
[UAX15]: https://unicode.org/reports/tr15/
[linus]: https://lore.kernel.org/linux-fsdevel/CAHk-=wiFtZL5rK3T-HQPm0oG4vekDJEKS47P8BbzHSXt_6SHuA@mail.gmail.com/
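The equivalence discussed in this report can be spot-checked in code. Below is a minimal sketch in Python, assuming that `str.casefold()` stands in for toCasefold and that `unicodedata.normalize` supplies NFD and NFC. The function names and sample pairs are hypothetical. The two variants differ only in the final normalization step; because NFC and NFD each map canonically equivalent strings to a single representative, the comparisons should always agree.

```python
import unicodedata

def nfd(s):
    return unicodedata.normalize("NFD", s)

def nfc(s):
    return unicodedata.normalize("NFC", s)

def caseless_match_nfd(x, y):
    """D145 as quoted: NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))."""
    return nfd(nfd(x).casefold()) == nfd(nfd(y).casefold())

def caseless_match_nfc(x, y):
    """Same comparison, but with NFC as the final step."""
    return nfc(nfd(x).casefold()) == nfc(nfd(y).casefold())

# Hypothetical sample pairs chosen for illustration.
samples = [
    ("\u00C5", "\u212B"),    # A WITH RING ABOVE vs ANGSTROM SIGN
    ("\u00C5", "a\u030A"),   # precomposed capital vs decomposed lowercase
    ("Stra\u00DFe", "STRASSE"),
    ("\u00E9", "e"),         # differs by an accent, so no match expected
]

results = [(caseless_match_nfd(x, y), caseless_match_nfc(x, y))
           for x, y in samples]
for (x, y), (by_nfd, by_nfc) in zip(samples, results):
    print(ascii(x), ascii(y), by_nfd, by_nfc)
```

A handful of samples is not a proof; it only illustrates that the choice of final normalization form does not change the outcome of the match.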
Date/Time: Fri Jan 8 03:52:33 CST 2021
Name: Alicia
Report Type: Error Report
Opt Subject: Javanese Script on table 7, should be on table 5
Dear UNICODE, we are PANDI, a new association member on UNICODE, we registered in the first place to endeavour Indonesian Scripts to be able to be use on digital platforms. starting with Javanese script and having it appear on the table 7 confused us as it is being used widely even in the digital platform, these are some of the websites evidences: ꦄꦁꦒꦿꦲꦺꦤꦶꦱ꧀ꦩꦼ.id ꦱꦌ.id ꦗꦮ.id ꦎꦗ꦳ꦏ꧀ꦮꦶꦏ꧀.id ꦱꦶꦤꦲꦸꦗꦮ.id ꦗꦒꦢ꧀ꦗꦮ.id ꦯꦿꦶꦠꦤ꧀ꦗꦸꦁ.id ꦱꦗ.id ꧖ꦤ꧀ꦢꦃꦮꦶꦢꦾꦱ꧀ꦠꦶ.id ꦒꦼꦒꦸꦫꦶꦠ꧀ꦠꦤ꧀.id ꦤꦮꦏ꧀ꦱꦫ.id ꦱꦸꦮꦸꦁ.id ꦒꦺꦁꦏꦺꦴꦧꦿ.id ꦮꦺꦴꦁꦠꦹꦫ꧀ꦪꦾꦤ꧀.id ꦥꦮ꧀ꦮꦂꦯꦴꦱ꧀ꦠꦿ.id ꦨꦮꦟꦯꦴꦱ꧀ꦠꦿ.id ꦱꦼꦒꦗꦧꦸꦁ.id ꦩꦠꦼꦩꦠꦶꦏꦏꦸ.id ꦱꦼꦂꦧꦱꦼꦂꦧꦶꦗꦮ.id ꦥꦺꦴꦗꦺꦴꦏ꧀ꦫꦗ.id ꦄꦩꦫꦱꦸꦂꦪꦩꦤ꧀ꦝꦭ.id ꦤꦪꦏ.id ꦱ꧀ꦮꦫꦏ꧀ꦱꦫ.id ꦢꦾꦃꦝꦶꦝꦶꦤ꧀.id ꦥꦚ꧀ꦗꦼꦧꦂꦱꦼꦩꦔꦠ꧀.id ꦮꦶꦢꦾꦏ꧀ꦱꦫ.id ꦗꦮꦲꦺꦴꦏꦺ.id ꦥꦿꦲꦱꦶꦠ.id ꦮꦤꦸꦃꦗꦒꦢ꧀ꦗꦮ.id ꦲꦤꦕꦫꦏ.id ꦄꦟ꧀ꦤꦏꦟ꧀ꦛꦶ.id ꦲꦏ꧀ꦱꦫꦥꦺꦤꦶ.id ꦥꦫꦩꦠꦠ꧀ꦮ.id ꦲꦪꦸꦄꦏ꧀ꦱꦫ.id ꦕꦕꦫꦏꦤ꧀ꦫꦶꦧꦵꦤ꧀.id ꦩꦼꦤꦶꦁ.id ꦲꦢꦶꦥꦿꦩꦟ.id ꦔ꧀ꦭꦼꦱ꧀ꦠꦫꦶꦄꦏ꧀ꦱꦫꦗꦮ.id ꦏꦸꦤ꧀ꦝꦏꦧꦸꦢꦪꦤ꧀.id ꦠꦽꦔ꧀ꦒꦶꦤ꧀ꦤꦱ꧀.id ꦥ꦳ꦺꦧꦿꦶꦩꦸꦲ꧀ꦤꦱ꧀.id ꦄꦢꦶꦱꦤ꧀ꦠꦫ.id ꦮꦼꦤꦶꦁꦤꦶꦱ꧀ꦮꦫ.id ꦕꦫꦏ.id ꦥꦸꦱ꧀ꦠꦏ.id Other than that, pointing to https://www.unicode.org/reports/tr31/ where Javanese is listed as 'Limited Used scripts' (table 7) when it should be on the table 5 (Recommended scripts) based on the Iso 10646 evidences. if there are also more information about what can we input to give more evidence please do mention on your answer. Best Regards, Alicia Nabilla PANDI .id-Registry Icon Business Park, LT1-LT2 Cisauk, BSD, Tangerang, Indonesia.
Date/Time: Sun Oct 25 20:48:14 CDT 2020
Name: Charlotte Buff
Report Type: Feedback on an Encoding Proposal
Opt Subject: Regarding the proposed gender variants of U+1F930 PREGNANT WOMAN
The pipeline was recently updated to include two new emoji candidates that function as gender variants of U+1F930 🤰 PREGNANT WOMAN. Bizarrely, however, these two candidates violate a number of well‐established emoji conventions for no apparent reason:

• The two emoji are proposed as atomic characters, even though they are merely gender variants of an already existing emoji, which have universally and consistently been encoded as ZWJ sequences for the past four years.
• The male and neutral forms are proposed as new additions with the existing de‐facto female form remaining unchanged, but so far the practice has always been to redefine the old emoji character as neutral and define new male and female variants thereof, even in cases where the base character’s name strongly implies a certain gender (cf. U+1F473 👳 MAN WITH TURBAN, U+1F46F 👯 WOMAN WITH BUNNY EARS, and several other examples).
• The proposed names of the new characters, “man with swollen belly” and “person with swollen belly”, are completely semantically detached from the meaning of U+1F930, which is never the case for emoji that form a gender triplet.

Being pregnant and having a swollen belly are not synonymous; one cannot reasonably be used as a substitute for the other. While it is true that U+1F930 is sometimes humorously used to convey a general concept of bloat, this has no bearing on its actual semantics as a Unicode character. U+1F930 was encoded for a very particular purpose – to represent pregnancy and parenthood – and retroactively changing its official meaning to encompass any stomach bloat would be both disrespectful to expecting parents and damaging to existing data.

I propose that the following steps be taken:

• Remove the provisional characters *U+1FAC3 MAN WITH SWOLLEN BELLY and *U+1FAC4 PERSON WITH SWOLLEN BELLY from the pipeline.
• For Emoji 14.0, add two new ZWJ sequences (and their accompanying Fitzpatrick‐type variants) as candidates:
  ◦ <U+1F930, U+200D, U+2642, U+FE0F> 🤰♂️ “Pregnant Man”
  ◦ <U+1F930, U+200D, U+2640, U+FE0F> 🤰♀️ “Pregnant Woman”
• For Emoji 14.0, change the CLDR short name of U+1F930 to “Pregnant Person”.

If the UTC deems a generic “person with swollen belly” emoji that has no direct connection to pregnancy necessary, then such character must be encoded separately with its own gender variant ZWJ sequences as is always done. Repurposing U+1F930 for this would be nonsensical and arbitrary. Compare this case to the addition of the new bottle‐feeding emoji from this year’s release, which left the existing U+1F931 🤱 BREAST-FEEDING untouched rather than altering its established meaning.
Date/Time: Tue Nov 24 21:08:58 CST 2020
Name: Karl Williamson
Report Type: Error Report
Opt Subject: Errors in tr18
I notice that it still says in section 0 "There are three fundamental levels of Unicode support that can be offered by regular expression engines:" The third level has been removed, and is not included in the list of two that immediately follows that line.

I tried Perl's implementation of TUS 13.0 out on the pattern in section 2.6

\p{name=/VARIA(TION|NT)/}

Perl gave more results than you show, which makes me wonder about your implementation. The ones it found missing from yours are

PAU CIN HAU GLOTTAL STOP VARIANT
CUNEIFORM NUMERIC SIGN FOUR U VARIANT FORM
CUNEIFORM NUMERIC SIGN FIVE U VARIANT FORM
CUNEIFORM NUMERIC SIGN SIX U VARIANT FORM
CUNEIFORM NUMERIC SIGN SEVEN U VARIANT FORM
CUNEIFORM NUMERIC SIGN EIGHT U VARIANT FORM
CUNEIFORM NUMERIC SIGN NINE U VARIANT FORM
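The comparison in this report can be reproduced independently. Below is a minimal sketch in Python, which emulates the name-matching pattern by scanning character names, because Python's `re` module does not support the `\p{name=/.../}` syntax of UTS #18. The function name is hypothetical, and the results depend on the Unicode version bundled with the interpreter (reported by `unicodedata.unidata_version`), so they may differ from TUS 13.0.

```python
import re
import sys
import unicodedata

NAME_PATTERN = re.compile(r"VARIA(TION|NT)")

def chars_with_name_matching(pattern):
    """Return (code point, name) pairs whose character name matches pattern."""
    matches = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (unassigned, surrogates, controls) yield "".
        name = unicodedata.name(chr(cp), "")
        if name and pattern.search(name):
            matches.append((cp, name))
    return matches

matches = chars_with_name_matching(NAME_PATTERN)
print("Unicode data version:", unicodedata.unidata_version)
print("Total matches:", len(matches))

# Show only the VARIANT names, the ones this report says were missing.
for cp, name in matches:
    if "VARIANT" in name:
        print(f"U+{cp:04X} {name}")
```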
Date/Time: Mon Dec 21 19:01:07 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Advance width of Lisu tone letters
Chapter 18, in the section on Lisu, says “each tone letter is typeset on an em-square, including those whose visual appearance consists of two marks.” I think this is a misunderstanding of L2/08-019, which says “Every simple tone letter should fit into a single em square.” Fitting within an em square is different from necessarily having an advance of one em. The Lisu font used in the text of the proposal and in many of the figures is proportional. Therefore the core standard should not imply that Lisu letters must be one em wide.
Date/Time: Wed Dec 23 15:27:24 CST 2020
Name: David Corbett
Report Type: Error Report
Opt Subject: Edge case for Syriac shaping
The Syriac shaping rules S1, S2, and S3 apply to alaph before non-joining characters. They should also apply at the end of text.
(None at this time.)