Comments on Public Review Issues

L2/24-063

Comments on Public Review Issues
(January 8 - April 3, 2024)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of April 03, 2024, since the previous cumulative document was issued prior to UTC #178 (January 08, 2024).

Issue	Name	Feedback Link
500	Draft UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet)	(feedback)
499	Proposed Update UAX #41, Common References for Unicode Standard Annexes	(feedback) No feedback at this time
498	Unicode Emoji 16.0 Alpha Repertoire	(feedback)
497	Unicode 16.0 Alpha Review	(feedback)
496	Proposed Update UTS #51, Unicode Emoji	(feedback)
494	Proposed Update UAX #29, Unicode Text Segmentation	(feedback)
492	Proposed Update UTS #39, Unicode Security Mechanisms	(feedback) No feedback at this time
491	Proposed Update UAX #31 Unicode Identifier and Pattern Syntax	(feedback)
490	Proposed Update UAX #14, Unicode Line Breaking Algorithm	(feedback) No feedback at this time
489	Proposed Update UAX #44, Unicode Character Database	(feedback)
488	Proposed Update UTS #10, Unicode Collation Algorithm	(feedback) No feedback at this time
487	Proposed Update UAX #53, Unicode Arabic Mark Rendering	(feedback) No feedback at this time
486	Stabilization of UAX #42, Unicode Character Database in XML (UCDXML)	(feedback)
485	Draft UTR #56, Unicode Cuneiform Sign Lists	(feedback) No feedback at this time
484	Proposed Update UAX #50, Unicode Vertical Text Layout	(feedback) No feedback at this time
483	Proposed Update UAX #38, Unicode Han Database (Unihan)	(feedback)

The links below go to locations in this document for feedback.

Feedback routed to CJK & Unihan Working Group for evaluation [CJK]
Feedback routed to Script Encoding Working Group for evaluation [SAH]
Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]
Feedback routed to Emoji Standard & Research Working Group for evaluation [ESC]
Feedback routed to Editorial Working Group for evaluation [EDC]
Other Reports

Feedback routed to CJK & Unihan Working Group for evaluation [CJK]

Date/Time: Fri Jan 12 18:59:25 CST 2024
ReportID: ID20240112185925
Name: Paul Masson
Report Type: Error Report
Opt Subject: kMandarin for U+26760 𦝠 and U+21FEA 𡿪

Each of these characters has two different pronunciations, neither of which
is listed in the database. It is unclear to me what the definitive
pronunciation should be for each, but there should at least be some entry
in the database.

Date/Time: Thu Feb 01 12:35:49 CST 2024
ReportID: ID20240201123549
Name: Lee Collins
Report Type: Error Report
Opt Subject: Unihan_Readings.txt

Definition of U+4AB3 is grammatically incorrect.

Current (v 15.0):

U+4AB3	kDefinition	slanted face causing by the paralyzed of the facial nerve

Better:

U+4AB3	kDefinition	slanted face caused by paralysis of the facial nerve

Date/Time: Fri Mar 08 07:11:14 CST 2024
ReportID: ID20240308071114
Name: Ken Lunde
Report Type: Error Report
Opt Subject: UAX #45

While compiling the candidates for the UTC’s submission for IRG Working Set
2024, which included preparing the metadata, I came across a small number
of attributes that should be changed in the following UAX #45 records:

UTC-00358: First Stroke = 1; Total Strokes = 10
UTC-00373: First Stroke = 3; Total Strokes = 17
UTC-00792: Variant = U+8F66; First Stroke = 0; Total Strokes = 4; UTCDoc (add) = L2/21-149 1
UTC-00910: First Stroke = 5
UTC-01257: IDS = ⿰氵𣫑
UTC-03154: IDS = ⿵輿𰀁
UTC-03247: IDS = ⿱艹𢑑
UTC-03248: UTCDoc = L2/21-149 2
UTC-03254: IDS = ⿺⿺見見⿿⿺見見
UTC-03257: IDS = ⿰汖攵
UTC-03268: Stroke Count = 7; Total Strokes = 15; IDS = ⿰金⿸厂㐱
UTC-03273: IDS = ⿱雨濩
UTC-03277: IDS = ⿱艹𣏹
UTC-03279: IDS = ⿰木扆
UTC-03374: IDS = ⿰木⿱止⿱丷八
UTC-03380: IDS = ⿱艹廣
UTC-03392: IDS = ⿱艹𦘱

For UTC-03268, the attribute changes are related to the need to change the
representative glyph to match new evidence, specifically that the 彡
component should instead be 㐱.

In addition, the IDS for the following UAX #45 ideograph that is in IRG
Working Set 2021 should be changed from ⿳艹大雨 to ⿰艹⿵大雨 to match. See:

https://hc.jsecs.org/irg/ws2021/app/index.php?id=03388 

That is all.

Date/Time: Sat Mar 16 07:50:59 CDT 2024
ReportID: ID20240316075059
Name: Ken Lunde
Report Type: Error Report
Opt Subject: UAX #45 USourceData.txt

Per the following post, 𝕏 user @Kesuuko_0826 indicated that UAX #45 
ideograph UTC-03332 is encoded in the Extension I block as U+2ECAC 𮲬:

https://twitter.com/Kesuuko_0826/status/1768903062461030446 

Its record in the USourceData.txt data file should be changed to the 
following, and U+2ECAC 𮲬 can be horizontally extended to add UTC-03332 
as a new U-source source reference:

UTC-03332;ExtI;U+2ECAC;75.4;;⿰木太;UTCDoc L2/22-206 35;;8;1

That is all.
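
As an illustration of the record proposed above, here is a minimal sketch that splits a USourceData.txt line into named fields. The field names and their order are an assumption based on a reading of UAX #45 and on the attribute names used in these reports ("Total Strokes", "First Stroke"); they are not taken from the data file itself, and `parse_usource_line` is a hypothetical helper.

```python
from typing import NamedTuple


class USourceRecord(NamedTuple):
    # Field names are assumptions for illustration, not official identifiers.
    source_id: str
    status: str
    code_point: str
    radical_stroke: str
    kangxi_position: str
    ids: str
    sources: str
    comments: str
    total_strokes: str
    first_stroke: str


def parse_usource_line(line: str) -> USourceRecord:
    """Split one semicolon-delimited record into its ten fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != len(USourceRecord._fields):
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return USourceRecord(*fields)


# The record proposed in the report for UTC-03332.
record = parse_usource_line(
    "UTC-03332;ExtI;U+2ECAC;75.4;;⿰木太;UTCDoc L2/22-206 35;;8;1"
)
print(record.source_id, record.status, record.code_point, record.ids)
```

Keeping every field as a string avoids guessing at types for fields that may be empty, such as the two blank fields in this record.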

Feedback routed to Script Encoding Working Group for evaluation [SAH]

Date/Time: Tue Feb 06 15:45:04 CST 2024
ReportID: ID20240206154504
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: Feedback on Book Pahlavi proposal

I agree with most of the proposal as it is, just a couple of observations:

1) On the codechart, the primitives do not seem to line up with their names.
This is a simple editorial mistake.

2) Changing the name of the primitives to be BOOK PAHLAVI PRIMITIVE
[specific name], may be a good way to distinguish them from punctuation or
letters.

3) There should be a small discussion on why the combining marks and the
punctuation marks should be disunified from existing characters.

4) I'm personally not convinced the letters that are descending or bellied
forms of other letters are necessary to encode atomically, when they can be
handled by inputting the base letter, followed by a descending tooth/belly
and fusing the characters as expected. Perhaps there is a good reason, so
it would be nice if it was discussed.

5) A previous document mentions that some scribes distinguish between
straight and round bellies, but the new document does not mention this.
This is important since it affects the repertoire or the addition of SVSs.

Date/Time: Thu Feb 22 22:37:20 CST 2024
ReportID: ID20240222223720
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: On the symbol for asteroid Flora

After looking at the attestations given in document L2/23-207 for the symbol
for Asteroid Flora, many of the figures show a glyph much like the one used
for 2698 ⚘. Specifically figures 1, 2, 3, 8, 11, 14, 26, 27, 36, 39, 42, 45
and 54, all show a glyph that is only marginally different from the already
existing character. Figures 37 and 38 are too low quality to determine what
the glyph is like. This leaves only figures 29, 31, 34 and 51 with truly
different glyphs. The author may argue that the difference on my first list
is on the addition of a "planetary cross" on the stem (i.e. a horizontal
stroke).

However, one could use a glyph without the stroke in astronomical contexts
and there would be no confusion about it. This is known because figures
1, 2, 3, 8, 36 and 45 definitely omit the stroke, meaning only figures 11,
14, 26, 27, 39 and 54 include it. I present this as evidence that users
would not mind using the existing character, even if it meant no stroke in
many cases. Indeed, figures 1 and 2 correspond to fonts made recently, so
both old and modern users are fine with the glyph.

The examples in figures 29, 31 and 34 use a symbol of what appears to be the
head of a rose, which would be more distinguishable and a better candidate
to be the glyph used. But this glyph seems to be rare; in fact, figures 29
and 31 are from the same author on different editions of the same work,
meaning only two works of the ones sampled preferred this glyph.

This only leaves figure 51, which is the same glyph we have for 2698, but
there are some petals in the circle and a dot inside (plus the stroke).
However, it's not clear to me that the dictionary symbol wouldn't sometimes
assume this form anyway, making them again unifiable. In any event, there is
only one example of that glyph being used.

In conclusion (based on the evidence they themselves provided), the authors
have not provided compelling evidence to indicate that the existing
character cannot/should not be used to represent asteroid Flora. It should
be removed from the pipeline until better evidence is provided. If
unification is decided, then a note could be added to 2698 "also used to
represent Asteroid Flora".

Date/Time: Thu Feb 22 23:01:43 CST 2024
ReportID: ID20240222230143
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: On the name of one of Harrington's diacritics

In document L2/23-206R, some new combining marks are proposed, and one of
these is called COMBINING FALLING DIAGONAL DIAERESIS. I propose the name
COMBINING DIAERESIS WITH RAISED LEFT DOT, since the word "diagonal" implies
the presence of a line segment. It's simply more natural to think of it as
starting with a diaeresis and raising one of the dots, than to think of a
line segment, lining the dots of the diaeresis with it and removing the
line.

If we were talking about a mark involving more dots, then a better argument
could be made for the proposed name.

Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]

Date/Time: Sun Jan 21 03:42:45 CST 2024
ReportID: ID20240121034245
Name: Jaycee Carter
Report Type: Error Report
Opt Subject: DUCET allkeys.txt

The sorting order of Bopomofo ㄒ (U+3112) and ㄬ (U+312C) is the wrong way
around. ㄒ is equivalent to pinyin "x", while ㄬ is now obsolete in Mandarin,
but was originally included in Bopomofo to represent the initial /ɲ/ in Old
National Pronunciation (老國音). It is no longer usually included in Bopomofo
tables. However, sources which do include ㄬ list it before ㄒ, as part of
the sequence of alveolo-palatal initials:

1. Page 24 of a copy of 《註音漢字》 from 1936, column 4
( https://en.wikipedia.org/w/index.php?title=File:CADAL11100176
注音漢字.djvu&page=24).

2. A table in 《校改國音字典》from 1920
(https://commons.wikimedia.org/wiki/File:《校改國音字典》注音符號.jpg).

3. Two (unfortunately unsourced) copies of what look to be period books on
Wikipedia
(https://en.wikipedia.org/wiki/File:Bopomofo.gif;
https://zh.wikipedia.org/wiki/File:Bopomofo_in_Regular,_Handwritten_Regular_%26_Cursive_formats.jpg).

As such, their primary weights should be switched in the UCA: U+312C should
have the value [.4573.0020.0002], and U+3112 should have the value
[.4574.0020.0002].
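
The effect of the requested swap can be sketched as a comparison on primary weights alone. The proposed weights below are the ones given in the report; the "current" weights are inferred from the statement that the two values should be switched, and `sort_by_primary` is a hypothetical helper, not part of any UCA implementation.

```python
# Primary weights only; secondary and tertiary weights are identical in the report.
CURRENT_PRIMARY = {"\u3112": 0x4573, "\u312C": 0x4574}   # inferred from "switched"
PROPOSED_PRIMARY = {"\u312C": 0x4573, "\u3112": 0x4574}  # values given in the report


def sort_by_primary(chars, weights):
    """Order characters by their primary collation weight."""
    return sorted(chars, key=lambda ch: weights[ch])


letters = ["\u3112", "\u312C"]  # ㄒ and ㄬ
current_order = sort_by_primary(letters, CURRENT_PRIMARY)
proposed_order = sort_by_primary(letters, PROPOSED_PRIMARY)
print("current: ", " < ".join(current_order))
print("proposed:", " < ".join(proposed_order))
```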

Date/Time: Mon Feb 05 04:40:33 CST 2024
ReportID: ID20240205044033
Name: Henri Sivonen
Report Type: Error Report
Opt Subject: UTS 46

UTS #46 section 4 Processing step 1. Map says of "disallowed"
characters: "Leave the code point unchanged in the string. Note: The
Convert/Validate step below checks for disallowed characters, after mapping
and normalization."

Step 2. Normalize then turns compatibility ideographs into their singleton
decompositions.

Therefore, if a label does not start with "xn--", the characters U+2F868,
U+2F874, U+2F91F, U+2F95F, and U+2F9BF will no longer occur in the domain
after Step 2. Normalize.

However, if the label starts with "xn--" and Punycode decoding yields
U+2F868, U+2F874, U+2F91F, U+2F95F, or U+2F9BF, these characters fail
section 4.1 Validity Criteria item 7., which denies characters that don't
have the status "valid" or for Nontransitional Processing "deviation".

It seems to me that these five characters are the only ones for which it
makes a difference not to do "disallowed" processing before normalization
already in 4 Processing step 1. Map.

ICU4C's UTS 46 normalization yields the REPLACEMENT CHARACTER for these five
characters, which is logically equivalent to performing "disallowed"
processing already in 4 Processing step 1. Map.

Please call out the implications for these five characters explicitly so
that it's clear to the reader whether the intent is that these characters
are only prohibited in already-Punycode-form labels but not in Unicode-form
labels (the outcome of the current spec text) or whether the intent is for
these characters to be prohibited both in already-Punycode-form labels and
in Unicode-form labels (as suggested by ICU4C's UTS 46 normalization).

Furthermore, today it appears to be the case that all "disallowed"
characters either decompose to themselves or have a singleton decomposition
in NFD. It would be useful to have a remark about whether this is a
guarantee that is expected to hold for future versions.
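
The normalization behavior described above can be observed with a short sketch using Python's standard unicodedata module. This only demonstrates that NFC replaces each of the five code points with a single different code point; it is not a UTS #46 implementation and does not consult the IDNA mapping table.

```python
import unicodedata

# The five compatibility ideographs named in the report.
CODE_POINTS = [0x2F868, 0x2F874, 0x2F91F, 0x2F95F, 0x2F9BF]


def nfc_of(code_point: int) -> str:
    """Return the NFC form of a single code point."""
    return unicodedata.normalize("NFC", chr(code_point))


results = {cp: nfc_of(cp) for cp in CODE_POINTS}
for cp, normalized in results.items():
    targets = " ".join(f"U+{ord(ch):04X}" for ch in normalized)
    print(f"U+{cp:04X} -> {targets}")
```

Because the original code point no longer appears after normalization, a validity check that runs only after Step 2 never sees it in a Unicode-form label, which is the asymmetry the report asks to have called out.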

Date/Time: Mon Mar 18 08:41:27 CDT 2024
ReportID: ID20240318084127
Contact: hsivonen@mozilla.com
Name: Henri Sivonen
Report Type: Error Report
Opt Subject: UTS 46

https://www.unicode.org/reports/tr46/#ToASCII says "When VerifyDnsLength is
true, the empty root label is disallowed." Yet, it appears that
IdnaTestV2.txt is meant to be run with the flags set to the more
restrictive options but the test input "a.b．c。d｡" and similar inputs
following it are expected to pass without errors despite
(after normalization) ending with the empty root label dot.

It's unclear to me if this is a test suite bug or a spec bug. I observe that
the VerifyDnsLength check in the Rust `idna` crate allows the trailing dot
(agreeing with the test suite but appearing to disagree with the spec).
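
The discrepancy can be sketched as follows. This is a simplified illustration, not the UTS #46 algorithm: it assumes the three full stop variants in the test input all end up as U+002E after mapping and normalization, it counts characters rather than bytes (equivalent only for ASCII labels), and `allow_trailing_root` is a hypothetical switch standing in for the two readings.

```python
# Full stop variants in the test input; assumed to map to U+002E.
DOT_VARIANTS = {"\uFF0E": ".", "\u3002": ".", "\uFF61": "."}


def split_labels(domain: str) -> list:
    """Map dot variants to U+002E and split into labels."""
    mapped = "".join(DOT_VARIANTS.get(ch, ch) for ch in domain)
    return mapped.split(".")


def verify_label_lengths(domain: str, allow_trailing_root: bool) -> bool:
    """Check that every label is 1 to 63 characters long.

    With allow_trailing_root=True, a final empty label (the root) is ignored,
    which matches what the test suite appears to expect. With False, the
    empty root label is an error, which matches the quoted spec text.
    """
    labels = split_labels(domain)
    if allow_trailing_root and len(labels) > 1 and labels[-1] == "":
        labels = labels[:-1]
    return all(1 <= len(label) <= 63 for label in labels)


test_input = "a.b\uFF0Ec\u3002d\uFF61"
labels = split_labels(test_input)
strict = verify_label_lengths(test_input, allow_trailing_root=False)
lenient = verify_label_lengths(test_input, allow_trailing_root=True)
print(labels, strict, lenient)
```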

Date/Time: Tue Apr 02 10:47:44 CDT 2024
ReportID: ID20240402104744
Name: Henri Sivonen
Report Type: Error Report
Opt Subject: UTS 46

When implementing UTS 46, the most time-consuming wrong path was trying to
design data structures for UTS 46 data assuming that the data needs to have
distinct data entries for disallowed_STD3_valid and disallowed_STD3_mapped
before discovering that these can be handled as valid and mapped with an
ASCII deny list applied afterwards.

I suggest refactoring the spec so that:

1) disallowed_STD3_valid and disallowed_STD3_mapped become simply valid and
mapped in the data and the spec says when to apply an ASCII deny list

2) instead of a boolean UseSTD3ASCIIRules the algorithm would take an ASCII
deny list.

UTS 46 itself could define an STD3 ASCII deny list and the WHATWG URL
Standard could use forbidden domain code point
https://url.spec.whatwg.org/#forbidden-domain-code-point as an ASCII deny
list parameter to UTS 46.

It would probably be appropriate to make informative remarks that a) putting
ASCII letters, digits, or hyphen on the deny list would break things and b)
in the validation phase, the ASCII period can be put on the deny list to
handle that validity constraint as part of the ASCII deny list check.
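
A minimal sketch of the suggested shape, assuming mapping has already been applied and the caller supplies the deny list as a parameter. The names `STD3_DENY_LIST` and `label_passes` are hypothetical, and the deny list contents here (all ASCII except letters, digits, hyphen and period) are an assumption about what an STD3 deny list would contain, not text from UTS #46.

```python
import string

# ASCII letters, digits and hyphen must never be denied (remark a).
LDH = set(string.ascii_letters + string.digits + "-")

# Assumed STD3 deny list: every other ASCII code point except the period,
# which is handled separately depending on the phase (remark b).
STD3_DENY_LIST = frozenset(
    chr(cp) for cp in range(0x80) if chr(cp) not in LDH and chr(cp) != "."
)


def label_passes(label: str, deny_list: frozenset,
                 in_validation_phase: bool = False) -> bool:
    """Return True if the label contains no denied ASCII code point.

    In the validation phase the ASCII period is added to the deny list,
    so that constraint is covered by the same check.
    """
    effective = set(deny_list)
    if in_validation_phase:
        effective.add(".")
    return not any(ch in effective for ch in label)


print(label_passes("example", STD3_DENY_LIST))    # no denied code points
print(label_passes("ex_ample", STD3_DENY_LIST))   # underscore is denied
print(label_passes("ex_ample", frozenset()))      # empty deny list
```

Passing an empty deny list plays the role of UseSTD3ASCIIRules=false, and a caller such as a URL parser could pass its own list instead.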

Date/Time: Tue Apr 02 10:50:47 CDT 2024
ReportID: ID20240402105047
Name: Henri Sivonen
Report Type: Error Report
Opt Subject: UTS 46

It seems to me that in practice it should be considered an error for
Punycode decoding not to yield any non-ASCII output (both Firefox and
Safari treat this as an error). However, I don’t see any spec text to that
effect either in UTS 46 itself or in RFC 3492. I suggest adding an item
under “Processing” step 4 ‘If the label starts with “xn--”:’ between
current items 1 and 2: “If the label ends with U+002D HYPHEN-MINUS, record
that there was an error, and continue with the next label.”

This would catch both the case where the hyphen is the last hyphen of “xn--”
and Punycode decoding would have no output at all and the case where there
are no Punycode digits after the delimiter, which means not producing any
non-ASCII output. 

Notably in Firefox and Safari, https://xn--unicode-.org/ is in error and not
equivalent to https://unicode.org/ and https://unicode.org.xn--/ is in
error and not equivalent to https://unicode.org./ .
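
The suggested check is small enough to sketch directly. This assumes labels have already been mapped to lowercase, and the function names are hypothetical; it illustrates the proposed rule only and is not spec text or browser code.

```python
def xn_label_ends_with_hyphen(label: str) -> bool:
    """Proposed rule: an "xn--" label that ends with U+002D is an error.

    This covers both "xn--" itself (the final hyphen belongs to the prefix)
    and labels such as "xn--unicode-" that have no Punycode digits after
    the delimiter.
    """
    return label.startswith("xn--") and label.endswith("-")


def labels_in_error(domain: str) -> list:
    """Return the labels of a domain that the proposed rule would reject."""
    return [label for label in domain.split(".") if xn_label_ends_with_hyphen(label)]


print(labels_in_error("xn--unicode-.org"))
print(labels_in_error("unicode.org.xn--"))
print(labels_in_error("xn--bcher-kva.example"))
```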

Feedback routed to Emoji Standard & Research Working Group for evaluation [ESC]

(None at this time.)

Feedback routed to Editorial Working Group for evaluation [EDC]

Date/Time: Wed Feb 07 01:18:13 CST 2024
ReportID: ID20240207011813
Name: Biswajit Mandal
Report Type: Public Review Issue
Opt Subject:

As per the new code charts of Ol Onal and Gurung Khema, there are two mistakes.
In Gurung Khema, Letter A (U+16100) is a vowel-carrier letter, and in Ol Onal,
Sign Hoddond (U+1E5F0) should come under the Various sign section, not the Digit section.

Other Reports

(None at this time.)
