Comments on Public Review Issues

L2/25-071

(January 3, 2025 - April 2, 2025)

The sections below contain links to permanent feedback documents for the open Public Review Issues, as well as other public feedback received from January 3, 2025 through April 2, 2025, since the previous cumulative document was issued prior to UTC #183 (April 2, 2025).

Issue	Name	Feedback Link
508	Proposed Update UAX #38, Unicode Han Database (Unihan)	(feedback)
509	Proposed Draft UTS #58, Unicode Linkification	(feedback)
510	Proposed Draft UTR #59, East Asian Spacing	(feedback)
511	Proposed Update UTS #10, Unicode Collation Algorithm	(feedback)
512	Proposed Update Unicode Technical Report #33, Unicode Conformance Model	(feedback)
513	Proposed Update UAX #50, Unicode Vertical Text Layout	(feedback)
514	Unicode 17.0 Alpha Review	(feedback)
515	Unicode Emoji 17.0 Alpha Repertoire	(feedback)
516	Proposed Update UAX #44, Unicode Character Database	(feedback)
517	Review of Identifier_Type for existing characters	(feedback)
518	Proposed Update UTS #51, Unicode Emoji	(feedback)
519	Proposed Update UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet)	(feedback)
520	Proposed Draft UAX #60, Data for Non Han Ideographic Scripts	(feedback)

The links below go to locations in this document for feedback.

Feedback routed to CJK & Unihan Working Group for evaluation [CJK]
Feedback routed to Script Encoding Working Group for evaluation [SEW]
Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]
Feedback routed to Emoji Standard & Research Working Group for evaluation [ESC]
Feedback routed to Editorial Working Group for evaluation [EDC]
Other Reports

Feedback routed to CJK & Unihan Working Group for evaluation [CJK]

(None at this time.)

Feedback routed to Script Encoding Working Group for evaluation [SEW]

Date/Time: Tue Jan 28 15:03:39 CST 2025
ReportID: ID20250128150339
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: A correction on the naming of musical symbols


After checking my submission suggesting different names for newly proposed musical symbols, I discovered I blundered and wrote HORIZONTAL when I meant VERTICAL and vice versa in the first three names.
Here I present the corrected names:

MUSICAL SYMBOL FLAT WITH STROKE -> MUSICAL SYMBOL FLAT WITH HORIZONTAL STROKE 
MUSICAL SYMBOL FLAT WITH DOUBLE STROKE -> MUSICAL SYMBOL FLAT WITH DOUBLE HORIZONTAL STROKE
MUSICAL SYMBOL ARABIC THREE QUARTER TONE FLAT -> MUSICAL SYMBOL FLAT WITH DOUBLE VERTICAL STROKE
MUSICAL SYMBOL HALF SHARP WITH STROKE -> MUSICAL SYMBOL HALF SHARP WITH LONG HORIZONTAL STROKE 
MUSICAL SYMBOL SHARP WITH STROKE -> MUSICAL SYMBOL SHARP WITH LONG HORIZONTAL STROKE

Date/Time: Sun Feb 02 16:48:16 CST 2025
ReportID: ID20250202164816
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: On the name of one asteroid symbol


One of the newly proposed symbols had an issue with its name, as there was already a character with its name (PROSERPINA) and it didn't even refer to the asteroid in question. The name they came up
with is the mouthful ASTRONOMICAL SYMBOL FOR ASTEROID PROSERPINA. But this is unnecessary. Simply naming it ASTEROID PROSERPINA should suffice, with a note in 2BD8 that it's not meant to represent
the asteroid and a cross reference to the correct codepoint.

Date/Time: Thu Feb 20 05:19:24 CST 2025
ReportID: ID20250220051924
Name: Charlotte Buff
Report Type: Error Report
Opt Subject: L2/24-234r; L2/25-016


I propose adjusting the names of several provisionally assigned characters to better align them with existing practices and fix minor mistakes.

U+1DF54: LATIN SMALL LETTER BARRED LATIN CHI → LATIN SMALL LETTER BARRED CHI
	The second “LATIN” is superfluous.

U+1DF43: LATIN SMALL CAPITAL BARRED E → LATIN LETTER SMALL CAPITAL BARRED E
	Missing word “LETTER”.

U+1DFCD: LATIN SUPERSCRIPT SMALL LETTER TURNED R WITH MID-HEIGHT LEFT HOOK → MODIFIER LETTER SMALL TURNED R WITH MID-HEIGHT LEFT HOOK
U+1DFCE: LATIN SUPERSCRIPT SMALL LETTER SPLIT O → MODIFIER LETTER SMALL SPLIT O
U+1DFCF: LATIN SUPERSCRIPT SMALL LETTER SPLIT U → MODIFIER LETTER SMALL SPLIT U
	Consistency with other superscript modifier letters.

Date/Time: Sat Feb 22 06:54:49 CST 2025
ReportID: ID20250222065449
Name: Charlotte Buff
Report Type: Error Report
Opt Subject: L2/24-151r


Regarding the provisionally assigned character U+1F1AE TOMOBIKI SYMBOL: Wouldn’t it make more sense for it to be located in the Enclosed 
Ideographic Supplement block rather than Enclosed Alphanumeric Supplement? Given that it is used exclusively in traditional Japanese 
calendars, putting it into a block that is already populated with plenty of characters from Japanese character sets would aid discoverability.

Date/Time: Thu Mar 13 14:09:20 CDT 2025
ReportID: ID20250313140920
Name: Eduardo Marin Silva
Report Type: Public Review Issue
Opt Subject:


These are a few pieces of feedback regarding document L2/25-038.

1) I believe it would be better to encode the vowel letters as precomposed characters as appropriate (following
the Tulu model). Encoding just the ones without decomposition would be fine, but South Asian script users are
used to codepoints for their vowel letters and it would be weird to pivot now. Encoding them as atomic characters
would imply adding entries to DoNotEmit, which was always a contrived but necessary solution, so the Tulu model
has fewer downsides.

2) Some figures like 13, 17, 23 and 38 have portions of text inside a double-walled enclosure. I don't believe
they should simply be omitted, so there should be a way to render them. I suggest two bracket-like characters
to indicate the start and end of the enclosure. Rendering engines could connect the lines under both characters
or simply render the end pieces as fallback.

On a related note, I noticed that some text is written in red ink. It's unclear why this is done, but most
likely mark-up would be enough to capture it.

3) Figure 27 shows a longer than usual danda. If this character is used in the same document as the regular-sized
danda and appears in different documents, then that would be evidence for disunification. I would call it SIRMAURI
LONG DANDA.

4) Figure 30 depicts section marks but with an added glyph above. I'm assuming this is the Ekam sign. It's worth
considering if one needs to encode SIRMAURI SECTION SIGN WITH EKAM ABOVE (and the double version), or if some other
mechanism would be appropriate.

5) In section 4 of the document a weird glyph is mentioned in the section for "i" and the section for "e" as a
form of the letter "e". Its shape is very different from the other shapes for "e", which makes me believe it
could be encoded as a third alternate version of the letter.

Date/Time: Thu Mar 13 14:32:20 CDT 2025
ReportID: ID20250313143220
Name: Eduardo Marin Silva
Report Type: Public Review Issue
Opt Subject:


These are a few pieces of feedback regarding provisionally assigned characters as depicted in 
https://www.unicode.org/wg2/docs/n5291-Post17Codechart.pdf

1) In the Arabic Extended-C block I would change the name of the proposed characters from ARABIC CROWN LETTER
[letter name] to ARABIC LETTER CROWN [letter name] and the combining sign to ARABIC COMBINING CROWN.

2) In the Kana Extended-A block I noticed there are notes explaining the meaning of the digraphs. I have nothing 
against it, I just would like if there were similar notes in the Hiragana and Katakana blocks when it comes to 
their digraphs.

3) The Tomobiki symbol was placed in the Enclosed Alphanumeric Supplement block, which I find odd since the shape
inside is not meant to be any sort of letter or digit, while the related character BLACK CIRCLE WITH WHITE VERTICAL
BAR is in the Geometric Shapes Extended block. I believe it would be better if we were to move the TOMOBIKI SYMBOL
into the Geometric Shapes Extended block. I would place it at 1F7DA and move the other symbol to 1F7DB. It could
also just be placed in 1F7DB to avoid moving the other symbol, but that symbol is meant for the sixth day while
the Tomobiki is meant for the second day, so it makes sense if the Tomobiki goes before.

Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]

Date/Time: Wed Jan 20, 2025
ReportID: ID20250129082145
Name: Owen Shepherd
Report Type: Feedback
Opt Subject: UTC Doc: Add `-` to PROP_VALUE syntax in TS18

The syntax for unicode property specifications in regular expressions, according
to TS18, looks like this:

    CHARACTER_CLASS
        := '\' [pP] '{' PROP_SPEC '}'
        := '[:' COMPLEMENT? PROP_SPEC ':]'
    PROP_SPEC := PROP_NAME (RELATION PROP_VALUE)?
    PROP_NAME := ID_CHAR+
    ID_CHAR := [A-Za-z0-9\ \-_]
    RELATION := "=" | "≠" | "!="
    PROP_VALUE := LITERAL*
    LITERAL
        := ESCAPE (SYNTAX_CHAR | SPECIAL_CHAR)
        := NON_SYNTAX_CHAR
        := HEX
        := "\\N{" ID_CHAR+ "}"
    ESCAPE := "\\"
    SYNTAX_CHAR := [\- \[ \] \{ \} / \\ \^ |]
    SPECIAL_CHAR := [abcefnrtu]
    NON_SYNTAX_CHAR := [^SYNTAX_CHAR]
    HEX
        := "\\u" HEX_CHAR{4}
        := "\\u{" SP? CODEPOINT (SP CODEPOINT)* SP? "}"
    HEX_CHAR := [0-9A-Fa-f]
    CODEPOINT := "10" HEX_CHAR{4} | HEX_CHAR{1,5}
    SP := " "+
    COMPLEMENT := "^"

Narrowing in on the specification of PROP_VALUE, we see that it’s comprised
of 'LITERAL*'.

None of the productions of 'LITERAL' admit the ‘-’ character. This means
that regex syntax such as '\p{name=ZERO WIDTH NO-BREAK SPACE}', and
'\p{name=surrogate-D800}', (both used as examples in this document), are
invalid.

I propose altering the production of 'PROP_VALUE' to:

    PROP_VALUE := (LITERAL | '-')*
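
The effect of the proposed change can be illustrated with a small Python sketch. It is only an approximation of the grammar quoted above, not an implementation of TS18: it assumes that the spaces inside the bracketed SYNTAX_CHAR set are insignificant, that NON_SYNTAX_CHAR means any single character other than a SYNTAX_CHAR, and that SP? behaves like zero or more spaces. All Python names below are hypothetical and chosen for this illustration.

```python
import re

# SYNTAX_CHAR from the quoted grammar, written as the inside of a regex
# character class: - [ ] { } / \ ^ |   (assumption: spaces are insignificant)
SYNTAX_CHAR = r"\-\[\]{}/\\^|"

HEX_CHAR = "[0-9A-Fa-f]"
CODEPOINT = "(?:10" + HEX_CHAR + "{4}|" + HEX_CHAR + "{1,5})"

# HEX := "\u" HEX_CHAR{4}  |  "\u{" SP? CODEPOINT (SP CODEPOINT)* SP? "}"
HEX = (r"\\u" + HEX_CHAR + "{4}"
       + r"|\\u\{ *" + CODEPOINT + "(?: +" + CODEPOINT + ")* *" + r"\}")

# "\N{" ID_CHAR+ "}"
NAMED = r"\\N\{[A-Za-z0-9 \-_]+\}"

# ESCAPE (SYNTAX_CHAR | SPECIAL_CHAR)
ESCAPED = r"\\[" + SYNTAX_CHAR + "abcefnrtu]"

# Assumption: any single character that is not a SYNTAX_CHAR.
NON_SYNTAX_CHAR = "[^" + SYNTAX_CHAR + "]"

LITERAL = "(?:" + "|".join([HEX, NAMED, ESCAPED, NON_SYNTAX_CHAR]) + ")"

# PROP_VALUE := LITERAL*              (as quoted)
CURRENT_PROP_VALUE = re.compile("(?:" + LITERAL + ")*")
# PROP_VALUE := (LITERAL | '-')*      (as proposed)
PROPOSED_PROP_VALUE = re.compile("(?:" + LITERAL + "|-)*")


def accepts(pattern, value):
    """Return True if the whole value is derivable from the given pattern."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ["ZERO WIDTH NO-BREAK SPACE", "surrogate-D800", "ZERO WIDTH SPACE"]:
        print(value,
              "| current:", accepts(CURRENT_PROP_VALUE, value),
              "| proposed:", accepts(PROPOSED_PROP_VALUE, value))
```

Under these assumptions, the sketch rejects both example values from the report with the quoted production and accepts them with the proposed one. A value with the hyphen escaped, such as NO\-BREAK, is accepted either way, because ESCAPE followed by a SYNTAX_CHAR is already a LITERAL.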

Date/Time: Tue Feb 25 04:41:03 CST 2025
ReportID: ID20250225044103
Name: Ole Begemann
Report Type: Error Report
Opt Subject: DoNotEmit.txt in the UCD


The UCD file DoNotEmit.txt at <https://www.unicode.org/Public/UCD/latest/ucd/DoNotEmit.txt> contains a spelling error in the initial comment block ("sequeences"):

# Preferred_Spelling:
#    Miscellaneous characters and sequeences for which the Unicode Standard
#    specifies a preferred spelling.

The error is still present in the current draft at <https://www.unicode.org/Public/draft/ucd/DoNotEmit.txt> (as of 2025-02-25).

Is this something that can be fixed? Thanks!

Date/Time: Thu Mar 20 08:32:35 CDT 2025
ReportID: ID20250320083235
Name: Dieter Niebel
Report Type: Website Problem
Opt Subject: CaseFolding Example


The Introduction to https://unicode.org/Public/UCD/latest/ucd/CaseFolding.txt reads:

#...
# The data supports both implementations that require simple case foldings
# (where string lengths don't change), and implementations that allow full case folding
# (where string lengths may grow). Note that where they can be supported, the
# full case foldings are superior: for example, they allow "MASSE" and "Maße" to match.
#...

A match of "MASSE" and "Maße" may not be desirable, since "Masse" (English: "mass") is distinct from "Maße" (English: "measures").
To save the example, a pair such as "FUSS" and "Fuß" could be used.

Regards
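
The behavior discussed in this report can be observed with a short Python sketch. It relies on Python's built-in str.casefold(), which applies Unicode full case folding; the helper names are hypothetical. Python has no built-in simple case folding, so str.lower() is used here only as a rough length-preserving stand-in for these particular strings, not as an equivalent.

```python
def full_fold_match(a, b):
    """Compare two strings after full case folding (str.casefold)."""
    return a.casefold() == b.casefold()


def lowercase_match(a, b):
    """Length-preserving comparison for these examples.

    Assumption: str.lower() is only a stand-in; it is not the same
    operation as the simple case folding defined in CaseFolding.txt.
    """
    return a.lower() == b.lower()


if __name__ == "__main__":
    pairs = [("MASSE", "Maße"), ("Masse", "Maße"), ("FUSS", "Fuß")]
    for a, b in pairs:
        print(a, b,
              "| full folding:", full_fold_match(a, b),
              "| lowercase only:", lowercase_match(a, b))
```

All three pairs match under full case folding, because "ß" folds to "ss". That includes "Masse" and "Maße", the two distinct words the report points out, while "FUSS" and "Fuß" are spellings of the same word.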

Feedback routed to Emoji Standard & Research Working Group for evaluation [ESC]

(None at this time.)

Feedback routed to Editorial Working Group for evaluation [EDC]

Date/Time: Wed Jan 29 08:21:45 CST 2025
ReportID: ID20250129082145
Name: Hu Xiangyou
Report Type: Error Report
Opt Subject: Core Spec


In Chapter 2 General Structure, in the Core Spec, Unicode Version 16.0, the "Figure 2-5. Writing Direction and Numbers" 
is missing Example ⑤, which should be "1123ページをみてください。" in horizontal Japanese.

Date/Time: Mon Mar 17 18:00:23 CDT 2025
ReportID: ID20250317180023
Name: Tatsunori Uchino
Report Type: Website Problem
Opt Subject: Glossary lacks the entry of "surrogate code unit"


https://www.unicode.org/glossary/
The Glossary has entries for "high-surrogate code unit" and "low-surrogate code unit", but lacks an entry for "surrogate code unit".
I want to link to the description of "surrogate code unit" from my specifications website, but I cannot. I mention "isolated
surrogate code units" there. It would be appreciated if you would add the term "isolated surrogate code unit" to the Glossary, because
long-established major languages (C/C++ wchar_t in Windows, Java and other JVM-based languages, C# and other .NET-based languages,
and JavaScript/TypeScript) have adopted UTF-16 as their character representation.
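
For readers unfamiliar with the terms in this report, the following Python sketch shows one way to find isolated surrogate code units in a sequence of UTF-16 code units. It uses the standard surrogate ranges (high: D800..DBFF, low: DC00..DFFF) and treats a surrogate as isolated when it is not part of a high-then-low pair. The function names are hypothetical and the sketch is illustrative only, not a proposed glossary definition.

```python
def is_high_surrogate(unit):
    """True for a high-surrogate code unit (D800..DBFF)."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    """True for a low-surrogate code unit (DC00..DFFF)."""
    return 0xDC00 <= unit <= 0xDFFF


def isolated_surrogates(units):
    """Return the indexes of surrogate code units that are not part of a
    well-formed pair (a high surrogate immediately followed by a low one)."""
    isolated = []
    i = 0
    while i < len(units):
        unit = units[i]
        if (is_high_surrogate(unit)
                and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            i += 2  # well-formed surrogate pair, skip both units
            continue
        if is_high_surrogate(unit) or is_low_surrogate(unit):
            isolated.append(i)
        i += 1
    return isolated


if __name__ == "__main__":
    print(isolated_surrogates([0xD83D, 0xDE00]))          # a well-formed pair
    print(isolated_surrogates([0x0041, 0xD800, 0x0042]))  # lone high surrogate
    print(isolated_surrogates([0xDC00, 0xD800]))          # low before high
```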

Other Reports

(None at this time.)
