L2/21-011

Comments on Public Review Issues
(September 23, 2020 - January 8, 2021)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of January 8, 2021, since the previous cumulative document was issued prior to UTC #165 (October 2020).

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of January 7, 2021.

426 Proposed Update UTR #53, Unicode Arabic Mark Rendering (feedback)
425 Proposed Update UTS #10, Unicode Collation Algorithm (feedback) No feedback at this time
424 Proposed Update UAX #31 Unicode Identifier and Pattern Syntax (feedback) No feedback at this time
423 Proposed Update UTS #39 Unicode Security Mechanisms (feedback)
422 Proposed Update UAX #9, Unicode Bidirectional Algorithm (feedback) No feedback at this time
421 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback) No new feedback for UTC #166
420 Proposed Update UAX #45, U-source Ideographs (feedback) No feedback at this time
419 Proposed Update UAX #44, Unicode Character Database (feedback) No feedback at this time
417 Proposed Update UAX #29, Unicode Text Segmentation (feedback) No feedback at this time
416 Proposed Update UAX #14, Unicode Line Breaking Algorithm (feedback) No feedback at this time
415 Proposed Update UTR #23, The Unicode Character Property Model (feedback) No feedback at this time
408 QID Emoji (feedback)

The links below go to locations in this document for feedback.

Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to Properties & Algorithms ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports

 


Feedback routed to Unihan ad hoc for evaluation

Date/Time: Thu Jan 7 15:18:08 CST 2021
Name: William T. Nelson
Report Type: Error Report
Opt Subject: UAX38 kMandarin readings for two ideographs

U+7B7D 筽 has kMandarin value o which is its Korean reading. Please change
the value to wú as per 两万汉字中日韩越英俄读音释义字典 page 846 entry 17174.

U+9730 霰 has kMandarin value sǎn, but the correct reading is xiàn according
to character dictionaries. (The PRI #297 feedback page has an error report
from Markus Scherer regarding this value.)

Feedback routed to Script ad hoc for evaluation

Date/Time: Wed Sep 30 10:29:11 CDT 2020
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/20-249

The figures from Safi-Nezhad 2008 have examples of stacked numbers with different currency
denominations. For example, the first row of figure 9 shows “1 toman” above “1,111 dinar”. 
How should these be encoded?

How should “1,000” in figure 54 be encoded?

Some numbers in figures 53 and 54 end with what looks like PERSIAN SIYAQ KHARVAR MARK with 
its components swapped. How should that be encoded?

Some of the kharvar amounts in figures 44 and thereafter end with what looks like just the 
dot of PERSIAN SIYAQ KHARVAR MARK. How should that be encoded?

Figure 56 shows Indic siyaq instead of Persian siyaq.

Date/Time: Wed Sep 30 19:49:25 CDT 2020
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comment on L2/20-247

A third option is to adopt a new rule that a composed code point (e.g. U+09CB BENGALI 
VOWEL SIGN O) may be the base of a variation sequence if and only if its decomposed 
trailing code point (e.g. U+09BE BENGALI VOWEL SIGN AA) is also the base of a variation 
sequence, and those two variation sequences are harmonized to represent essentially 
the same variation.
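
As an illustration of the rule suggested above, the following is a minimal Python sketch, not an existing Unicode mechanism. It uses the standard `unicodedata` module to find the trailing code point of a canonical decomposition. The function names and the `hypothetical_bases` set are assumptions made for this example, and the harmonization requirement is a human judgment that code cannot check.

```python
import unicodedata

def trailing_code_point(ch: str) -> str:
    """Return the last code point of the canonical (NFD) decomposition of ch."""
    return unicodedata.normalize("NFD", ch)[-1]

def may_be_variation_base(composed: str, existing_bases: set) -> bool:
    """Sketch of the suggested rule: a composed code point qualifies only if
    its trailing decomposed code point is already a variation-sequence base.
    (Whether the two sequences are harmonized is not checked here.)"""
    return trailing_code_point(composed) in existing_bases

# Hypothetical registry of variation-sequence bases, for illustration only.
hypothetical_bases = {"\u09BE"}  # BENGALI VOWEL SIGN AA

print(may_be_variation_base("\u09CB", hypothetical_bases))  # BENGALI VOWEL SIGN O
```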

Date/Time: Thu Oct 1 21:02:53 CDT 2020
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/20-187R and L2/20-188R

In the Elbasan block, the Albanian letters “e” and “ë” are ASCIIfied 
as EI and E. The Vithkuqi and Todhri proposals ASCIIfy them as E and 
EH. Is this inconsistency intentional?

Date/Time: Fri Oct 2 20:49:07 CDT 2020
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Kyrgyz som observation


I had submitted a proposal to include some currency symbols 
(https://www.unicode.org/L2/L2019/19291-missing-currency.pdf). The proposal 
was not accepted because I did not provide sufficient evidence of the use 
of the symbols. Since then, the national bank of Kyrgyzstan has submitted 
its own proposal with sufficient evidence (https://www.unicode.org/L2/L2020/20261-kyrgyz-som.pdf). 
As expected, I wasn't mentioned in the proposal, since I can't 
expect the national bank to read year-old proposals.

I write to reiterate my opinion that the name "KYRGYZ SOM SIGN" is not 
preferable, since there is a possibility that Uzbekistan would also adopt it, 
given the greater friendship between the countries and the identical names of their currencies.
The more generic name "SOM SIGN" would fit the current pattern of recently 
added currency signs and would ensure that Uzbekistanis would feel welcome 
to adopt the sign if they so choose.
I do not have any allegiance to either country (I live in Mexico); I just 
think that it's wise to think about the long term, given the fact that 
character names cannot be changed.

Of course this is only a suggestion, as the National Bank clearly has the higher 
authority on this, and may dismiss me outright.

Date/Time: Mon Oct 5 12:49:03 CDT 2020
Name: Fawaz Ahmed
Report Type: Other Question, Problem, or Feedback
Opt Subject: 06E0 Character has wrong description

The Unicode document at https://www.unicode.org/charts/PDF/U0600.pdf
seems to describe U+06E0 as a rectangular zero, but it should be described as a
circular zero.

Thanks

Date/Time: Mon Oct 12 13:50:38 CDT 2020
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/19-306

The proposed ARABIC LETTER THIN YEH has its own joining group, THIN YEH.
What does the final thin yeh look like? Does it have two dots or zero? The
normal yeh shown in the King Fahd Warsh examples of this proposal has zero
dots in its final form, meaning U+06CC ARABIC LETTER FARSI YEH is the
normal, non-thin yeh in this context. It would therefore make sense if the
thin yeh also had zero dots in its final form, in which case it should be
called ARABIC LETTER THIN FARSI YEH. Maybe it has no attested final form,
but fonts are still going to have to handle that case, so Unicode should
provide some guidance.

If ARABIC BASELINE ROUND DOT has gc=Lo why does ARABIC RAISED ROUND DOT have
gc=Sk? Surely if one is a letter so is the other; cf. U+0674 ARABIC LETTER
HIGH HAMZA.

https://app.quranflash.com/book/Warsh2?en#/reader/chapter/565 (for example)
in the right margin shows ARABIC LARGE ROUND DOT ABOVE behaving like ARABIC
HAMZA ABOVE. The hamza is in UTR #53’s Modifier Combining Marks set; so
should the dot be. This most likely goes for ARABIC LARGE ROUND DOT BELOW
too.

Date/Time: Thu Nov 26 10:22:07 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Diacritics in Old Hungarian

The ad hoc report L2/11-242R recommends using U+1DC4 COMBINING MACRON-ACUTE
for a certain mark attested in Old Hungarian. Neither the core standard nor
the code chart for Combining Diacritical Marks Supplement mentions this
usage. Should U+1DC4 be used in Old Hungarian? If not, how should it be
represented?

Other diacritics have seen some use as part of the modern revival of Old
Hungarian, though not part of the most common version in use today. These
include what appear to be U+0304 COMBINING MACRON, U+0307 COMBINING DOT
ABOVE, and U+0301 COMBINING ACUTE ACCENT. With what code points should these
diacritics be represented?

Date/Time: Thu Nov 26 10:40:35 CST 2020
Name: David Corbett
Report Type: Error Report
Opt Subject: Isolated U+08AC ARABIC LETTER ROHINGYA YEH

The core standard says that U+08AC ARABIC LETTER ROHINGYA YEH has no
isolated form, but it may exist. See
<https://github.com/googlei18n/noto-fonts/issues/1266>.

Date/Time: Fri Dec 4 21:19:30 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Rendering U+1D9FF SIGNWRITING HEAD with forehead marks

Some SignWriting marks are placed on the forehead. To avoid overlapping the
top part of U+1D9FF SIGNWRITING HEAD, the top part of U+1D9FF is omitted.
This is encoded as <U+1D9FF, U+1DA9B> (SignWriting ID
04-01-001-01-02-01). The corresponding symbols in ISWA 2010
(http://www.signbank.org/iswa/) are drawn with the top part of the head
omitted; an example is 04-01-003-01-04-03. Should 04-01-003-01-04-03 be
encoded as <U+1D9FF, U+1DA01, U+1DA9D, U+1DAA2> or as <U+1D9FF,
U+1DA9B, U+1DA01, U+1DA9D, U+1DAA2>? The recently released Noto Sans
SignWriting renders both with the top of the head omitted, but it might be
preferable to be more explicit and render the two sequences distinctly.

Date/Time: Mon Dec 7 19:21:59 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Rendering SignWriting symbols without valid fill-1

According to chapter 21, in the section for SignWriting, “There are no explicit 
modifiers encoded for fill-1 or rotation-1, as those values are considered inherent 
in the base character”. However, there are some characters for which fill-1 is not 
valid, such as U+1D8F6 SIGNWRITING HAND-FIST THUMB HEEL. How should such characters 
be rendered when not followed by a valid fill modifier?

From: PANDI ID Registry
Sent: Monday, December 14, 2020, 12:22:02 AM PST
Subject: Re: PANDI Inquiries

We would like to know how we can raise the status of the Javanese script from limited use to recommended.
Should we send more evidence that the script is still actively being used by the community?
We need this as soon as possible for our IDN process with ICANN.

Thank you so much.
waiting for your reply

Best regards,
Alicia Nabilla
Business Development
PANDI .id Registry

Date/Time: Sun Dec 20 09:57:01 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Mandaic kad

Chapter 9 says “There are two ways to represent kad in Mandaic: 
U+0857 MANDAIC LETTER KAD or the sequence <U+084A MANDAIC LETTER AK, 
U+0856 MANDAIC LETTER DUSHENNA>.” Do these two ways mean the same thing? 
Are they rendered identically? If they are identical, which one should people use?
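
For reference, the two representations can be compared under normalization. The sketch below uses Python's standard `unicodedata` module; the helper name `equivalent_under` is an assumption made for this example. It only tests canonical and compatibility equivalence in the Unicode version bundled with the Python build, and says nothing about meaning or rendering.

```python
import unicodedata

KAD = "\u0857"                # MANDAIC LETTER KAD
AK_DUSHENNA = "\u084A\u0856"  # MANDAIC LETTER AK + MANDAIC LETTER DUSHENNA

def equivalent_under(form: str, a: str, b: str) -> bool:
    """True if a and b normalize to the same string under the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Canonical (NFD) and compatibility (NFKD) comparisons.
print(equivalent_under("NFD", KAD, AK_DUSHENNA))
print(equivalent_under("NFKD", KAD, AK_DUSHENNA))
```

Because U+0857 has no decomposition mapping, neither comparison treats the two spellings as equal, so plain string comparison and normalization-based matching will both see them as distinct.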

Feedback routed to Properties & Algorithms ad hoc for evaluation

Date/Time: Wed Sep 23 12:23:38 CDT 2020
Name: Wes
Report Type: Error Report
Opt Subject: Unicode confusables data missing Sharp-S letter to Capital-B confusable

Hi, 

I hope you're all doing well. 

It seems the confusables data at ftp://ftp.unicode.org/Public/security/latest/confusables.txt 
fails to include the German sharp-S or Eszett letter ( https://en.wikipedia.org/wiki/%C3%9F ) 
as a possible confusable with the Latin capital "B". 
This is a fairly obvious confusable, and the Wikipedia article even mentions 
"Not to be confused with the Latin letter B." at the top of the article. 

Would it be possible to add this to the official Unicode confusables data mapping? 

Many thanks, 
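
To make the request concrete, here is a minimal Python sketch of how one data line of the semicolon-separated format could be parsed. The function name is an assumption for this example, and the sample line is hypothetical: it shows what the requested entry might look like and is not copied from confusables.txt.

```python
def parse_confusables_line(line: str):
    """Parse 'source ; target ; type # comment' into (source, target) strings.
    Returns None for blank lines and comment-only lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
    target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
    return source, target

# Hypothetical entry for the requested mapping (not taken from the data file).
sample = "00DF ;\t0042 ;\tMA\t# hypothetical: LATIN SMALL LETTER SHARP S, LATIN CAPITAL LETTER B"
print(parse_confusables_line(sample))
```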

From: PANDI ID Registry
Sent: Monday, December 14, 2020, 12:22:02 AM PST
Subject: Re: PANDI Inquiries

We would like to know how we can raise the status of the Javanese script from limited use to recommended.
Should we send more evidence that the script is still actively being used by the community?
We need this as soon as possible for our IDN process with ICANN.

Thank you so much.
waiting for your reply

Best regards,
Alicia Nabilla
Business Development
PANDI .id Registry

Date/Time: Tue Dec 15 13:25:44 CST 2020
Name: Zach Lym
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Normalization Generics (NFx, NFKx, NFxy)

I have been tracking down the rationale behind the normalization choices in
filesystems.  One problem area is the misleading use of strict logician
terminology.  Take the definition of Unicode's caseless matching algorithm
[D145]:

> A string X is a canonical caseless match for a string Y if and only if:
> NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))

The W3C Canonical Case Fold Normalization algorithm is defined as being
compatible with [D145], but uses NFC in the last step [w3c-charmod-norm],
leading to an apparent contradiction.  Even though Unicode explains that
"case folding is closed under canonical normalization" it took me a long
time to find that passage and convince myself that the W3C and Unicode
matching algorithms are equivalent.  I am not alone: Linux kernel hackers
couldn't figure it out either [linux-norm]!

>> Is there any case where
>>    NFC(x) == NFC(y) && NFD(x) != NFD(y)   , or
>>    NFC(x) != NFC(y) && NFD(x) == NFD(y)
>
>This is good question. And I think we should get definite answer for it prior inclusion of normalization into kernel.
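
The question quoted above can be spot-checked in Python. NFC and NFD each pick one representative per canonical equivalence class, so the two comparisons should always agree. The sketch below, using the standard `unicodedata` module, tests that on a handful of sample strings chosen for this example. It is a sanity check, not a proof.

```python
import itertools
import unicodedata

# Sample strings chosen for illustration; several are canonically equivalent.
samples = [
    "\u00C5",              # LATIN CAPITAL LETTER A WITH RING ABOVE
    "A\u030A",             # A + COMBINING RING ABOVE
    "\u212B",              # ANGSTROM SIGN
    "a\u030A",             # lowercase a + COMBINING RING ABOVE
    "\u1E9B\u0323",        # LONG S WITH DOT ABOVE + COMBINING DOT BELOW
    "\u017F\u0323\u0307",  # LONG S + DOT BELOW + DOT ABOVE
]

def comparisons_agree(x: str, y: str) -> bool:
    """True if NFC equality and NFD equality give the same answer for x, y."""
    nfc_equal = unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)
    nfd_equal = unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)
    return nfc_equal == nfd_equal

all_agree = all(comparisons_agree(x, y) for x, y in itertools.product(samples, repeat=2))
print(all_agree)
```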

I was originally going to propose additions to D145 textual description,
cross-references to the implementation section, and adding discussion of W3C
charmod-norm.  However, I don't think this would help as the text is already
quite dense and most people will just ignore everything outside the example
anyway [minimalist-manual].

I would instead like to propose normalization form generics for use in pseudo code definitions:

    NFx = NFD|NFC //NFx != NFy
    NFKx = NFKD|NFKC
    NFxy = NFD|NFC|NFKD|NFKC
    
Freestanding `X`/`Y` variables should probably be replaced to
disambiguate them from the `NFx` nomenclature.  `s1`/`s2` would work but
`foo`/`bar` is less dense:

    NFx(caseFold(NFD(foo))) = NFx(caseFold(NFD(bar)))

`NFx` does not currently appear within the Unicode standard itself, but is
used in the normalization technical note [UAX15].  However, **UAX15 defines
`NFx` twice**, first as NFD|NFC|NFKD|NFKC and later on as NFD|NFC.  I think
the proposed convention gets the most mileage out of the nomenclature and is
how I have seen `NFx` used in the real world [linus].
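
A minimal Python sketch of the `NFx` idea follows, with the outer normalization form as a parameter. The function names are assumptions for this example, and it assumes that Python's `str.casefold()` is an adequate stand-in for toCasefold.

```python
import unicodedata

def canonical_caseless_key(s: str, outer: str = "NFD") -> str:
    """Compute NFx(caseFold(NFD(s))), where NFx is NFD or NFC."""
    if outer not in ("NFD", "NFC"):
        raise ValueError("outer form must be 'NFD' or 'NFC'")
    folded = unicodedata.normalize("NFD", s).casefold()
    return unicodedata.normalize(outer, folded)

def canonical_caseless_match(foo: str, bar: str, outer: str = "NFD") -> bool:
    """Compare two strings using the same outer form on both sides."""
    return canonical_caseless_key(foo, outer) == canonical_caseless_key(bar, outer)

for form in ("NFD", "NFC"):
    print(form, canonical_caseless_match("\u212B", "a\u030A", form))
```

The same form has to be used on both sides of the comparison. The choice between NFD and NFC then changes the keys but not the match result.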

Thank you!
-Zach Lym

[w3c-charmod-norm]: https://w3c.github.io/charmod-norm/#CanonicalFoldNormalizationStep 
[linux-norm]: https://lwn.net/ml/linux-fsdevel/20190206084752.nwjkeiixjks34vao@pali/ 
[minimalist-manual]: https://dl.acm.org/doi/10.1207/s15327051hci0302_2 
[UAX15]: https://unicode.org/reports/tr15/ 
[linus]: https://lore.kernel.org/linux-fsdevel/CAHk-=wiFtZL5rK3T-HQPm0oG4vekDJEKS47P8BbzHSXt_6SHuA@mail.gmail.com/ 

Date/Time: Fri Jan 8 03:52:33 CST 2021
Name: Alicia
Report Type: Error Report
Opt Subject: Javanese Script on table 7, should be on table 5

Dear UNICODE,

We are PANDI, a new association member of Unicode. We registered primarily 
to help Indonesian scripts become usable on digital platforms, starting 
with the Javanese script. Seeing it appear in Table 7 confused us, as it is 
widely used, even on digital platforms. The following websites are offered 
as evidence:

ꦄꦁꦒꦿꦲꦺꦤꦶꦱ꧀ꦩꦼ.id
ꦱꦌ.id
ꦗꦮ.id
ꦎꦗ꦳ꦏ꧀ꦮꦶꦏ꧀.id
ꦱꦶꦤꦲꦸꦗꦮ.id	
ꦗꦒꦢ꧀ꦗꦮ.id	
ꦯꦿꦶꦠꦤ꧀ꦗꦸꦁ.id	
ꦱꦗ.id	
꧖ꦤ꧀ꦢꦃꦮꦶꦢꦾꦱ꧀ꦠꦶ.id	
ꦒꦼꦒꦸꦫꦶꦠ꧀ꦠꦤ꧀.id	
ꦤꦮꦏ꧀ꦱꦫ.id	
ꦱꦸꦮꦸꦁ.id	
ꦒꦺꦁꦏꦺꦴꦧꦿ.id  	
ꦮꦺꦴꦁꦠꦹꦫ꧀ꦪꦾꦤ꧀.id	
ꦥꦮ꧀ꦮꦂꦯꦴꦱ꧀ꦠꦿ.id	
ꦨꦮꦟꦯꦴꦱ꧀ꦠꦿ.id	
ꦱꦼꦒꦗꦧꦸꦁ.id	
ꦩꦠꦼꦩꦠꦶꦏꦏꦸ.id	
ꦱꦼꦂꦧꦱꦼꦂꦧꦶꦗꦮ.id	
ꦥꦺꦴꦗꦺꦴꦏ꧀ꦫꦗ.id	
ꦄꦩꦫꦱꦸꦂꦪꦩꦤ꧀ꦝꦭ.id	
ꦤꦪꦏ.id	
ꦱ꧀ꦮꦫꦏ꧀ꦱꦫ.id	
ꦢꦾꦃꦝꦶꦝꦶꦤ꧀.id	
ꦥꦚ꧀ꦗꦼꦧꦂꦱꦼꦩꦔꦠ꧀.id	
ꦮꦶꦢꦾꦏ꧀ꦱꦫ.id	
ꦗꦮꦲꦺꦴꦏꦺ.id	
ꦥꦿꦲꦱꦶꦠ.id	
ꦮꦤꦸꦃꦗꦒꦢ꧀ꦗꦮ.id	
ꦲꦤꦕꦫꦏ.id	
ꦄꦟ꧀ꦤꦏꦟ꧀ꦛꦶ.id	
ꦲꦏ꧀ꦱꦫꦥꦺꦤꦶ.id	
ꦥꦫꦩꦠꦠ꧀ꦮ.id	
ꦲꦪꦸꦄꦏ꧀ꦱꦫ.id	
ꦕꦕꦫꦏꦤ꧀‌ꦫꦶꦧꦵꦤ꧀.id	
ꦩꦼꦤꦶꦁ.id	
ꦲꦢꦶꦥꦿꦩꦟ.id	
ꦔ꧀ꦭꦼꦱ꧀ꦠꦫꦶꦄꦏ꧀ꦱ‌ꦫꦗꦮ.id	
ꦏꦸꦤ꧀ꦝꦏꦧꦸꦢꦪꦤ꧀.id	
ꦠꦽꦔ꧀ꦒꦶꦤ꧀ꦤꦱ꧀.id	
ꦥ꦳ꦺꦧꦿꦶꦩꦸꦲ꧀ꦤꦱ꧀.id	
ꦄꦢꦶꦱꦤ꧀ꦠꦫ.id	
ꦮꦼꦤꦶꦁꦤꦶꦱ꧀ꦮꦫ.id	
ꦕꦫꦏ.id	
ꦥꦸꦱ꧀ꦠꦏ.id	

In addition, https://www.unicode.org/reports/tr31/ lists Javanese among the 
'Limited Use Scripts' (Table 7), when it should be in Table 5 
(Recommended Scripts) based on the ISO 10646 evidence. If there is more 
information we can provide as evidence, please mention it in your answer.

Best Regards,
Alicia Nabilla
PANDI .id-Registry
Icon Business Park, LT1-LT2 Cisauk, BSD, Tangerang, Indonesia.

Feedback routed to Emoji SC for evaluation

Date/Time: Sun Oct 25 20:48:14 CDT 2020
Name: Charlotte Buff
Report Type: Feedback on an Encoding Proposal
Opt Subject: Regarding the proposed gender variants of U+1F930 PREGNANT WOMAN

The pipeline was recently updated to include two new emoji candidates that
function as gender variants of U+1F930 🤰 PREGNANT WOMAN. Bizarrely, however,
these two candidates violate a number of well‐established emoji conventions
for no apparent reason:

• The two emoji are proposed as atomic characters, even though they are
merely gender variants of an already existing emoji, which have
universally and consistently been encoded as ZWJ sequences for the past
four years.

• The male and neutral forms are proposed as new additions with the existing
de‐facto female form remaining unchanged, but so far the practice has
always been to redefine the old emoji character as neutral and define new
male and female variants thereof, even in cases where the base character’s
name strongly implies a certain gender (cf. U+1F473 👳 MAN WITH TURBAN,
U+1F46F 👯 WOMAN WITH BUNNY EARS, and several other examples).

• The proposed names of the new characters, “man with swollen belly” and
“person with swollen belly”, are completely semantically detached from the
meaning of U+1F930, which is never the case for emoji that form a gender
triplet. Being pregnant and having a swollen belly are not synonymous; one
cannot reasonably be used as a substitute for the other. While it is true
that U+1F930 is sometimes humorously used to convey a general concept of
bloat, this has no bearing on its actual semantics as a Unicode character.
U+1F930 was encoded for a very particular purpose – to represent pregnancy
and parenthood – and retroactively changing its official meaning to
encompass any stomach bloat would be both disrespectful to expecting
parents and damaging to existing data.

I propose that the following steps be taken:

• Remove the provisional characters *U+1FAC3 MAN WITH SWOLLEN BELLY and 
*U+1FAC4 PERSON WITH SWOLLEN BELLY from the pipeline.

• For Emoji 14.0, add two new ZWJ sequences (and their accompanying 
Fitzpatrick‐type variants) as candidates:

	◦ <U+1F930, U+200D, U+2642, U+FE0F> 🤰‍♂️ “Pregnant Man”
	◦ <U+1F930, U+200D, U+2640, U+FE0F> 🤰‍♀️ “Pregnant Woman”
	
• For Emoji 14.0, change the CLDR short name of U+1F930 to “Pregnant Person”.
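
For illustration, the proposed sequences can be assembled as follows. This is a Python sketch under the assumption that they would follow the existing pattern of base, optional skin tone modifier, ZWJ, gender sign, and U+FE0F. The helper names are made up for this example, and these sequences are proposals in this feedback, not existing emoji.

```python
ZWJ = "\u200D"               # ZERO WIDTH JOINER
VS16 = "\uFE0F"              # VARIATION SELECTOR-16
PREGNANT = "\U0001F930"      # PREGNANT WOMAN
MALE_SIGN = "\u2642"
FEMALE_SIGN = "\u2640"
MEDIUM_SKIN = "\U0001F3FD"   # EMOJI MODIFIER FITZPATRICK TYPE-4

def gendered_sequence(base: str, gender_sign: str, skin_tone: str = "") -> str:
    """Build base [+ skin tone] + ZWJ + gender sign + VS16."""
    return base + skin_tone + ZWJ + gender_sign + VS16

def code_points(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

print(code_points(gendered_sequence(PREGNANT, MALE_SIGN)))
print(code_points(gendered_sequence(PREGNANT, FEMALE_SIGN, MEDIUM_SKIN)))
```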

If the UTC deems a generic “person with swollen belly” emoji that has no
direct connection to pregnancy necessary, then such character must be
encoded separately with its own gender variant ZWJ sequences as is always
done. Repurposing U+1F930 for this would be nonsensical and arbitrary.
Compare this case to the addition of the new bottle‐feeding emoji from this
year’s release, which left the existing U+1F931 🤱 BREAST-FEEDING untouched
rather than altering its established meaning.

Feedback routed to Editorial Committee for evaluation

Date/Time: Tue Nov 24 21:08:58 CST 2020
Name: Karl Williamson
Report Type: Error Report
Opt Subject: Errors in tr18

I notice that it still says in section 0
"There are three fundamental levels of Unicode support that can be 
offered by regular expression engines:"

The third level has been removed, and is not included in the list of 
two that immediately follows that line.

I tried perl's implementation of TUS 13.0 out on the pattern in section 2.6

\p{name=/VARIA(TION|NT)/}

Perl gave more results than you show, which makes me wonder about your implementation.

The ones it found missing from yours are

 PAU CIN HAU GLOTTAL STOP VARIANT
 CUNEIFORM NUMERIC SIGN FOUR U VARIANT FORM
 CUNEIFORM NUMERIC SIGN FIVE U VARIANT FORM
 CUNEIFORM NUMERIC SIGN SIX U VARIANT FORM
 CUNEIFORM NUMERIC SIGN SEVEN U VARIANT FORM
 CUNEIFORM NUMERIC SIGN EIGHT U VARIANT FORM
 CUNEIFORM NUMERIC SIGN NINE U VARIANT FORM
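
A comparable check can be run outside Perl. The sketch below scans character names with Python's standard `unicodedata` and `re` modules. Its results depend on the Unicode version bundled with the Python build (printed first), so they may differ from a TUS 13.0 run, and `unicodedata.name` does not cover name aliases.

```python
import re
import sys
import unicodedata

PATTERN = re.compile(r"VARIA(TION|NT)")

def matching_names():
    """Yield (code point, name) for every character name matching PATTERN."""
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if PATTERN.search(name):
            yield cp, name

print("Unicode version:", unicodedata.unidata_version)
results = dict(matching_names())
print(len(results), "matching names")
for cp, name in results.items():
    if "VARIANT" in name:
        print(f"U+{cp:04X} {name}")
```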

Date/Time: Mon Dec 21 19:01:07 CST 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Advance width of Lisu tone letters

Chapter 18, in the section on Lisu, says “each tone letter is typeset on an
em-square, including those whose visual appearance consists of two marks.” I
think this is a misunderstanding of L2/08-019, which says “Every simple tone
letter should fit into a single em square.” Fitting within an em square is
different from necessarily having an advance of one em. The Lisu font used
in the text of the proposal and in many of the figures is proportional.
Therefore the core standard should not imply that Lisu letters must be one
em wide.

Date/Time: Wed Dec 23 15:27:24 CST 2020
Name: David Corbett
Report Type: Error Report
Opt Subject: Edge case for Syriac shaping

The Syriac shaping rules S1, S2, and S3 apply to alaph before non-joining characters. 
They should also apply at the end of text.


Other Reports

(None at this time.)