L2/25-005

Comments on Public Review Issues
(October 25, 2024 - January 2, 2025)

The sections below contain links to permanent feedback documents for the open Public Review Issues, as well as other public feedback received from October 25, 2024 through January 2, 2025. This covers the period since the previous cumulative document, which was issued prior to UTC #182 (October 24, 2024).

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of January 2, 2025.

Issue  Name                                                                         Feedback Link
508    Proposed Update UAX #38, Unicode Han Database (Unihan)                       (feedback)
509    Proposed Draft UTS #58, Unicode Linkification                                (feedback)
510    Proposed Draft UTR #59, East Asian Spacing                                   (feedback)
511    Proposed Update UTS #10, Unicode Collation Algorithm                         (feedback)
512    Proposed Update Unicode Technical Report #33, Unicode Conformance Model      (feedback)

The links below go to locations in this document for feedback.

Feedback routed to CJK & Unihan Working Group for evaluation [CJK]
Feedback routed to Script Encoding Working Group for evaluation [SEW]
Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]
Feedback routed to Emoji Standard & Research Working Group for evaluation [ESC]
Feedback routed to Editorial Working Group for evaluation [EDC]
Other Reports

 


Feedback routed to CJK & Unihan Working Group for evaluation [CJK]

Date/Time: Thu Oct 31 10:11:14 CDT 2024
ReportID: ID20241031101114
Name: Andrew West
Report Type: Error Report
Opt Subject: Unihan_Variants.txt

In Unihan_Variants.txt there are these two entries:

U+47B6	kSimplifiedVariant	U+2C985
U+2C985	kTraditionalVariant	U+47B6

However, U+47B6 䞶 (⿺走易) does not simplify to U+2C985 𬦅 (⿺走𠃓), which is actually the simplified form of U+27F2E 𧼮 (⿺走昜).

Therefore remove these two entries:
U+47B6	kSimplifiedVariant	U+2C985
U+2C985	kTraditionalVariant	U+47B6

And add these two entries:
U+27F2E	kSimplifiedVariant	U+2C985
U+2C985	kTraditionalVariant	U+27F2E
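
The requested change can be sketched as a small edit on (source, property, target) triples. This is only an illustration under stated assumptions: the `apply_fix` and `is_reciprocal` helpers are hypothetical, the in-memory set stands in for the data file, and the reciprocity check assumes that each simplified-variant link is expected to have a matching traditional-variant link.

```python
# Hypothetical in-memory view of the affected lines of Unihan_Variants.txt.
# Each triple mirrors one tab-separated line: (source, property, target).
REMOVE = {
    ("U+47B6", "kSimplifiedVariant", "U+2C985"),
    ("U+2C985", "kTraditionalVariant", "U+47B6"),
}
ADD = {
    ("U+27F2E", "kSimplifiedVariant", "U+2C985"),
    ("U+2C985", "kTraditionalVariant", "U+27F2E"),
}


def apply_fix(entries):
    """Return a new set with the erroneous pair removed and the corrected pair added."""
    return (set(entries) - REMOVE) | ADD


def is_reciprocal(entries):
    """Check that every variant link has a matching link in the other direction."""
    inverse = {
        "kSimplifiedVariant": "kTraditionalVariant",
        "kTraditionalVariant": "kSimplifiedVariant",
    }
    return all((dst, inverse[prop], src) in entries for src, prop, dst in entries)


fixed = apply_fix(REMOVE)
```

Note that the erroneous pair is itself reciprocal, so a symmetry check alone would not have caught it; the check only confirms that the corrected pair stays consistent.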

In addition the kMandarin and kHanyuPinyin readings in Unihan_Readings.txt are swapped for U+47B6 and U+27F2E:

U+27F2E	kHanyuPinyin	53492.080:tì
U+27F2E	kMandarin	tì

U+47B6	kHanyuPinyin	53494.110:tāng,tàng
U+47B6	kMandarin	tāng

U+27F2E should have the readings tāng and tàng (cf. kFanqie = 吐郎, kJapanese = トウ).
U+47B6 should have the reading tì (cf. kJapanese = テキ チャク).

Date/Time: Wed Nov 20 20:54:41 CST 2024
ReportID: ID20241120205441
Name: Paul Masson
Report Type: Error Report
Opt Subject: Chinese encoding of U+48CB 䣋

The Chinese encoding of this character is currently ⿰釆阝, when it should be ⿰采阝.

Feedback routed to Script Encoding Working Group for evaluation [SEW]

Date/Time: Fri Dec 27 09:21:47 CST 2024
ReportID: ID20241227092147
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: On the naming of some musical symbols


After inspecting the glyphs of certain musical symbols, I want to suggest names that better describe the glyphs of some of them:

MUSICAL SYMBOL FLAT WITH STROKE -> MUSICAL SYMBOL FLAT WITH VERTICAL STROKE
MUSICAL SYMBOL FLAT WITH DOUBLE STROKE -> MUSICAL SYMBOL FLAT WITH DOUBLE VERTICAL STROKE
MUSICAL SYMBOL ARABIC THREE QUARTER TONE FLAT -> MUSICAL SYMBOL FLAT WITH DOUBLE HORIZONTAL STROKE
MUSICAL SYMBOL HALF SHARP WITH STROKE -> MUSICAL SYMBOL HALF SHARP WITH LONG HORIZONTAL STROKE
MUSICAL SYMBOL SHARP WITH STROKE -> MUSICAL SYMBOL SHARP WITH LONG HORIZONTAL STROKE

Date/Time: Fri Dec 27 11:09:09 CST 2024
ReportID: ID20241227110909
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: On the new buzz roll combining character


The character that is provisionally assigned 1D25F follows the same encoding model as 
the tremolos (being applied on top of the stem, which is itself a combining character). But 
since this character is never applied in the absence of a stem, it would make more sense to treat 
it as another kind of stem: MUSICAL SYMBOL COMBINING STEM WITH BUZZ ROLL. I understand that 
this is not the case for the family of tremolos, but the new ones have to follow the same model 
as their predecessors for stability. Frankly, if I had been around I would have insisted on encoding 
them as COMBINING STEM WITH TREMOLO, but it's too late for that.
Also frankly, the current encoding model is not that problematic (mine only saves one code point 
by basically combining two characters), but I'm wondering if all further modified stems will be encoded like this.

Date/Time: Tue Jan 07 18:42:03 CST 2025
ReportID: ID20250107184203
Name: David Corbett
Report Type: Error Report
Opt Subject: L2/25-021


L2/25-021 “Proposal to encode the Devanagari Vowel Sign AAO in Unicode” says the proposed vowel sign cannot be 
represented by <U+093E, U+094B> because that sequence gets dotted circles in existing standard fonts. However, 
dotted circles are not inserted by the fonts: they are inserted by certain shaping engines, and the sequence 
renders fine in HarfBuzz. An alternative to L2/25-021 would be to fix the shaping engines that insert the dotted circles.


Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]

Date/Time: Wed Oct 30 07:39:32 CDT 2024
ReportID: ID20241030073932
Name: Andrew West
Report Type: Error Report
Opt Subject: UTS #10 Unicode Collation Algorithm

UTS #10 Unicode Collation Algorithm defines implicit weights for Tangut ideographs 
and Tangut components (see Table 16 Computing Implicit Weights) with the following 
formulas: 

AAAA = 0xFB00
BBBB = (CP - 0x17000) | 0x8000

This worked OK when there were only a Tangut block and a Tangut Components block, 
but after the addition of the Tangut Supplement block in Unicode 13.0, the above 
formulas result in Tangut ideographs in the Tangut Supplement block sorting 
after all the Tangut components, rather than sorting immediately after the Tangut 
ideographs in the Tangut block, as would be expected by users. The situation will 
be even worse after the addition of the Tangut Components Supplement block in a 
future version of Unicode, when characters in the four Tangut blocks will be 
sorted in the following order:

Tangut (17000..187FF)
Tangut Components (18800..18AFF)
Tangut Supplement (18D00..18D7F)
Tangut Components Supplement (18D80..18DFF)
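
The ordering above can be reproduced with a short sketch of the current formula. The function name and the choice of the first code point of each block as a sample are illustrative assumptions; the block ranges are the ones listed in this report.

```python
def current_tangut_weights(cp):
    """Return (AAAA, BBBB) using the single shared formula quoted above."""
    return 0xFB00, (cp - 0x17000) | 0x8000


# First code point of each block, taken from the ranges listed in the report.
samples = {
    "Tangut": 0x17000,
    "Tangut Components": 0x18800,
    "Tangut Supplement": 0x18D00,
    "Tangut Components Supplement": 0x18D80,
}

# Sort block names by the implicit weight pair of their sample code point.
current_order = sorted(samples, key=lambda name: current_tangut_weights(samples[name]))
```

Because AAAA is identical for all four blocks, the order is decided by BBBB alone, which simply follows code point order.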

The expected default sort order of Tangut ideographs and Tangut components should be:

Tangut (17000..187FF)
Tangut Supplement (18D00..18D7F)
Tangut Components (18800..18AFF)
Tangut Components Supplement (18D80..18DFF)

This could be achieved by separately calculating the implicit weights for 
Tangut ideographs and Tangut components, as below:

Assigned code points in Block=Tangut OR Tangut_Supplement:
AAAA = 0xFB00
BBBB = (CP - 0x17000) | 0x8000

Assigned code points in Block=Tangut_Components OR Tangut_Components_Supplement:
AAAA = 0xFB01
BBBB = (CP - 0x18800) | 0x8000

Assigned code points in Block=Nushu:
AAAA = 0xFB02
BBBB = (CP - 0x1B170) | 0x8000

Assigned code points in Block=Khitan_Small_Script:
AAAA = 0xFB03
BBBB = (CP - 0x18B00) | 0x8000
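
A minimal sketch of the proposed split for the Tangut blocks follows. It approximates "assigned code points in Block=..." by the block ranges listed in this report, which is an assumption, and the function name is illustrative. Nushu and Khitan Small Script would follow the same pattern with their own AAAA values and are omitted here.

```python
def proposed_tangut_weights(cp):
    """Return (AAAA, BBBB) with ideographs and components weighted separately."""
    # Tangut and Tangut Supplement share AAAA = 0xFB00.
    if 0x17000 <= cp <= 0x187FF or 0x18D00 <= cp <= 0x18D7F:
        return 0xFB00, (cp - 0x17000) | 0x8000
    # Tangut Components and Tangut Components Supplement share AAAA = 0xFB01.
    if 0x18800 <= cp <= 0x18AFF or 0x18D80 <= cp <= 0x18DFF:
        return 0xFB01, (cp - 0x18800) | 0x8000
    raise ValueError(f"U+{cp:04X} is not in a Tangut block covered by this sketch")


# First code point of each block, taken from the ranges listed in the report.
samples = {
    "Tangut": 0x17000,
    "Tangut Components": 0x18800,
    "Tangut Supplement": 0x18D00,
    "Tangut Components Supplement": 0x18D80,
}

# Sort block names by the proposed implicit weight pair.
proposed_order = sorted(samples, key=lambda name: proposed_tangut_weights(samples[name]))
```

With a distinct AAAA per group, all ideographs sort before all components regardless of where the supplement blocks sit in code point order.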

Date/Time: Tue Dec 10 01:25:40 CST 2024
ReportID: ID20241210012540
Name: Simon Patrick
Report Type: Error Report
Opt Subject: /Public/draft/UCD/ucd/Blocks.txt

I know that this file is a very early draft for version 17.0 (file is dated 
15 November 2024) but you might like to note that (a) I think the new Sidetic 
block should end at 1095F rather than 1095C and (b) the new Beria Erfe block 
(16EA0..16EDF) is not in its correct place in code point order: it should 
come between Medefaidrin (16E40..16E9F) and Miao (16F00..16F9F).
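
Both observations lend themselves to a mechanical check. The sketch below is a hedged illustration: the helper names are hypothetical, the sample listing is invented to show a misplaced entry (the report does not say where Beria Erfe currently appears in the draft), and the boundary test relies on the convention that blocks end on a code point whose last hexadecimal digit is F.

```python
def parse_blocks(lines):
    """Parse 'START..END; Name' lines into (start, end, name) tuples."""
    blocks = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = line.split(";", 1)
        start, end = (int(part, 16) for part in span.strip().split(".."))
        blocks.append((start, end, name.strip()))
    return blocks


def misplaced(blocks):
    """Names of blocks that start at or before the end of the preceding entry."""
    return [cur[2] for prev, cur in zip(blocks, blocks[1:]) if cur[0] <= prev[1]]


def ends_on_boundary(end):
    """True when the last code point of a block ends in hexadecimal digit F."""
    return end % 16 == 15


# Hypothetical listing with Beria Erfe placed after Miao.
sample = [
    "16E40..16E9F; Medefaidrin",
    "16F00..16F9F; Miao",
    "16EA0..16EDF; Beria Erfe",
]
out_of_order = misplaced(parse_blocks(sample))
```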

Feedback routed to Emoji Standard & Research Working Group for evaluation [ESC]

Date/Time: Sun Dec 22 08:33:16 CST 2024
ReportID: ID20241222083316
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: Feedback on an emoji candidate (L2/24-266r)


Regarding the proposed “Ballet Dancer” sequence: Presumably this is meant to be 
the gender-neutral counterpart to U+1F483 💃 DANCER and U+1F57A 🕺 MAN DANCING that 
has been missing for years, but I strongly feel that it is not suitable for that 
purpose.

Firstly, gender variants in Unicode Emoji do not work like that. For two emoji to 
be gender variants of each other, they actually have to represent the same concept 
with the only difference between them being the appearance of the human person (and 
maybe also the colour of the clothes to make the contrast more noticeable at small 
sizes). 💃 and 🕺 are a long-standing exception to this rule because they predate 
most other gendered emoji and the idea wasn’t yet properly developed at the time, 
but adding a third incongruent variant into the mix is only going to make this 
problem worse.

For all intents and purposes, these three emoji will not be seen as variations on 
a shared base concept like all other gender variants, but as three entirely disconnected
emoji – each representing a different style of dance – that are inexplicably available 
as only a single gender each. This will make end users wonder why disco is only for men, 
flamenco is only for women, and neither men nor women can do ballet in the world of emoji. 
And we know for a fact that differently gendered ballet emoji are desired by users because 
that is precisely what L2/18-133, the original ballet dancer proposal that was resurrected 
for this endeavour, had suggested.

I understand all too well that the UTC is reluctant to approve additional gendered emoji, 
but pretending that ballet is somehow the nonbinary counterpart to flamenco and disco 
dancing cannot be the solution. At this point it would honestly be best to fully decouple 
DANCER and MAN DANCING from each other, explicitly define them to be flamenco and disco 
dancers respectively instead of generic “people dancing”, and then give both of them new 
gender variants to complete the set. I believe the experiment has failed; these two emoji 
look too different from each other to form a pair and users are generally unreceptive to 
significant emoji glyph changes, so bringing their designs closer together is probably
not a viable option.

Either that or add a third gender-neutral dancer that is just as generic as the other two. 
Its outfit could be almost anything as long as its stance is recognisable as dancing, 
which would allow it to cover a lot of different styles instead of being confined to just 
ballet. Besides, didn’t Unicode 14.0 set the precedent that gender variants will no longer 
be handled as ZWJ sequences but rather as atomic characters? U+1FAC5 🫅 PERSON WITH CROWN 
could easily have been a combination of 🧑 and 👑, but was assigned its own code point instead.

Putting gender aside, I also question the necessity of a “Person Dancing Ballet” emoji in 
general. U+1FA70 🩰 BALLET SHOES already exists; what does a pictograph showing a person 
wearing these shoes bring to the table that isn’t covered by the shoes on their own? I 
thought proposed emoji were supposed to “break new ground” and be distinctive. The fact 
that only U+1FA70 and not the human-form ballet sequences made it onto the RGI list in 
the first place shows me that the UTC also didn’t consider them worthwhile. A similar 
fate befell human-form ZWJ sequences incorporating U+1F933 🤳 SELFIE (L2/16-333) and 
U+1FA9E 🪞 MIRROR (L2/19-099), which functionally would have been duplicates because 
they would have represented nothing original compared to just U+1F933 or U+1FA9E in 
isolation. What makes the ballet dancer different in this regard?


Feedback routed to Editorial Working Group for evaluation [EDC]

Date/Time: Thu Nov 07 15:20:49 CST 2024
ReportID: ID20241107152049
Name: Jim DeLaHunt
Report Type: Website Problem
Opt Subject: Unicode standard list of components

Section I, "List of Components", of the Unicode 16.0 page 
<https://www.unicode.org/versions/Unicode16.0.0/#Components> 
has an entry "Core Specification" which links only to the PDF version 
of Unicode 16.0. It lacks a link to the HTML version of the Core 
Specification. It is ironic that the component missing from the list 
of components is the one which is authoritative.

Other Reports

(None at this time.)