Accumulated Feedback on PRI #497

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Fri Feb 09 14:30:55 CST 2024
ReportID: ID20240209143055
Name: Denis Moyogo Jacquerye
Report Type: Public Review Issue
Opt Subject: 497 [EDC]

The glyph of ੴ U+0A74 GURMUKHI EK ONKAR was updated in Unicode 11.0.
See https://www.unicode.org/charts/PDF/Unicode-11.0/U110-0A00.pdf and error report 
"Error with rendering ੴ (U+0A74)" by Harkeerat Toor in http://www.unicode.org/L2/L2016/16123-pubrev.html.

The Unicode Standard 10.0, 11.0 and later versions still have the same text in chapter 12.3 Gurmukhi:

> Other Symbols. The religious symbol khanda sometimes used in Gurmukhi texts is encoded
> at U+262C ADI SHAKTI in the Miscellaneous Symbols block. U+0A74 GURMUKHI
> EK ONKAR, which is also a religious symbol, can have different presentation forms, which
> do not change its meaning. The font used in the code charts shows a highly stylized form; 
> simpler forms look like the digit one, followed by a sign based on ura, along with a long
> upper tail.

The statement "The font used in the code charts shows a highly stylized form" has not been 
true since 11.0.

The last sentence could be changed to:
"The font used in the code charts shows a simpler form that looks like the digit one, followed 
by a sign based on ura, along with a long upper tail; other forms may be highly stylized."

Date/Time: Tue Feb 13 10:18:01 CST 2024
ReportID: ID20240213101801
Name: Max Blechman
Report Type: Public Review Issue
Opt Subject: 497 [SAH]

I am writing today to express my support for certain characters that have
been proposed for addition in the Latin Extended-G block, namely those from
the Initial Teaching Alphabet. There have been multiple proposals to add
the ITA to Unicode, and since it was historically used to publish many
children’s books and is still today used for students who have issues with
traditional English spelling, its addition to Unicode would be extremely
useful for its users, of which there are still many. Thank you for your
time and consideration.

Date/Time: Tue Feb 13 19:17:11 CST 2024
ReportID: ID20240213191711
Name: Eiso Chan
Report Type: Public Review Issue
Opt Subject: 497 [EDC]

This year, Chinese media are using the term “the year of loong” (龙年/龍年) rather than 
“the year of dragon”. See https://english.news.cn/20240210/ce190d57cd8a405db28e034ade839063/c.html
https://news.cgtn.com/news/2024-01-22/Where-did-China-s-mythic-loong-come-from--1qzMho0EXxm/p.html

The term “loong” is increasingly common as a rendering of the Chinese word 龙/龍, whose meaning 
differs from the original meaning of “dragon” in English. It would be better to add the following 
annotation to both U+1F409 🐉 and U+1F432 🐲.

* also used for loong in Chinese

Date/Time: Tue Feb 13 20:19:17 CST 2024
ReportID: ID20240213201917
Name: Bryndan Meyerholt
Report Type: Public Review Issue
Opt Subject: 497 [EDC]

The character OL ONAL SIGN HODDOND should probably go under the Various signs section 
instead of the digits section, as it appears to be used as a sign/diacritic mark rather than 
as a digit in Ol Onal. See also the Wikipedia article on Ol Onal, which includes an image 
captioned Ol Onal Script.

Date/Time: Wed Feb 14 15:25:32 CST 2024
ReportID: ID20240214152532
Name: Norbert Lindenberg
Report Type: Public Review Issue
Opt Subject: 497 [PAG]

The UCD in Unicode 16.0 alpha defines Indic syllabic and positional
categories for the Kirat Rai script. The final proposal for Kirat Rai,
L2/22-043R, does not provide such data.

I don’t think the omission of Indic data in the proposal was an oversight.
The proposal states: “The script does not have the rendering complexity of
traditional Brahmic scripts (no reordering, no combining marks, and no
conjuncts).” This means that a simple visual encoding model, where spacing
characters are encoded in left-to-right order, is sufficient for the script
and is intended. Indic data, which implies a phonetic encoding order,
should not be added.

The phonetic encoding model used for most Brahmic scripts and the visual
encoding model used for most non-Brahmic scripts are fundamentally
incompatible and should never be combined. Even for a very simple script
like Kirat Rai, there’s a slight potential for conflicts between the visual
and phonetic encoding order based on the Brahmic cluster model of the
OpenType Universal Shaping Engine: Any sequence of characters with gc=Lm
(of which Kirat Rai has five) would become part of a single cluster and
would have to be encoded in primarily phonetic order.

Any data for Kirat Rai should be removed from IndicSyllabicCategory.txt and
IndicPositionalCategory.txt.

Date/Time: Wed Feb 14 17:11:11 CST 2024
ReportID: ID20240214171111
Name: Karl Pentzlin
Report Type: Error Report
Opt Subject: UnicodeStandard-15.0.pdf [EDC]

Table 22-4 "Compatibility digits" (p. 862): in the row "Circled digits", the column "Code Range(s)" 
should read "24EA, 2460..2468" instead of "24EA, 2080..2089".
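The two ranges can be cross-checked against character names. The following is a minimal sketch using Python's standard unicodedata module; the helper name names_in_range is illustrative and does not come from the report.

```python
import unicodedata

def names_in_range(first, last):
    """Return the character names for code points first..last, inclusive."""
    return [unicodedata.name(chr(cp)) for cp in range(first, last + 1)]

# Range proposed in the report for the "Circled digits" row.
circled = [unicodedata.name("\u24EA")] + names_in_range(0x2460, 0x2468)
# Range currently printed in the table for that row.
subscripts = names_in_range(0x2080, 0x2089)

print(circled[0], "...", circled[-1])
print(subscripts[0], "...", subscripts[-1])
```

The first line printed names circled digits, while the second names subscript digits, which is the mismatch the report describes.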

Date/Time: Fri Feb 16 13:51:49 CST 2024
ReportID: ID20240216135149
Name: Charlotte Buff
Report Type: Other Document Submission
Opt Subject: Bidi class of Nabla variants [PAG]

I propose changing the Bidi_Class value of the following characters to Other_Neutral from 
their current value Left_to_Right:

	U+1D6C1 MATHEMATICAL BOLD NABLA
	U+1D6FB MATHEMATICAL ITALIC NABLA
	U+1D735	MATHEMATICAL BOLD ITALIC NABLA
	U+1D76F MATHEMATICAL SANS-SERIF BOLD NABLA
	U+1D7A9 MATHEMATICAL SANS-SERIF BOLD ITALIC NABLA

U+2207 NABLA has Bidi_Class=Other_Neutral, so its font variants should share the same property 
value. This is how it already works for U+2202 PARTIAL DIFFERENTIAL and its respective font 
variants, all of which are Other_Neutral.
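A quick way to inspect these values is Python's standard unicodedata module, sketched below. The helper name bidi_report is illustrative. Note that the output reflects whichever Unicode version is bundled with the interpreter, so the values shown for the font variants may differ between Python releases; "ON" is the short alias for Other_Neutral and "L" for Left_To_Right.

```python
import unicodedata

# Base characters cited in the report.
BASES = [0x2207, 0x2202]  # NABLA, PARTIAL DIFFERENTIAL
# Font variants of NABLA listed in the report.
NABLA_VARIANTS = [0x1D6C1, 0x1D6FB, 0x1D735, 0x1D76F, 0x1D7A9]

def bidi_report(code_points):
    """Map each code point (as a U+XXXX label) to its Bidi_Class short alias."""
    return {f"U+{cp:04X}": unicodedata.bidirectional(chr(cp)) for cp in code_points}

print(bidi_report(BASES))
print(bidi_report(NABLA_VARIANTS))
```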

Date/Time: Fri Feb 16 14:01:04 CST 2024
ReportID: ID20240216140104
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 497: Ideographic property value of U+18CFF [PAG]

U+18CFF KHITAN SMALL SCRIPT CHARACTER-18CFF currently has the property Ideographic=No, but 
the value should be Yes, as it is for all other Khitan Small Script characters.

Date/Time: Sat Feb 17 10:54:36 CST 2024
ReportID: ID20240217105436
Name: Judith Chen
Report Type: Public Review Issue
Opt Subject: 497 [CJK]

Considering that the KP-source glyph KP1-3413 of U+4E17 丗, which was added in 
Unicode 15.1, is identical to the G-source glyph GKX-0077.13 of U+2000D 𠀍 rather 
than other representative glyphs of U+4E17 丗, it might be a good idea if Unicode 
moved KP1-3413 from U+4E17 丗 to U+2000D 𠀍.

Date/Time: Sun Feb 18 00:40:09 CST 2024
ReportID: ID20240218004009
Name: Judith Chen
Report Type: Public Review Issue
Opt Subject: 497 [EDC, RMG]

Note: This issue has been fixed in draft as of 2024-02-27.

As the page *Proposed New Characters: The Pipeline* shows, 8 Standardized
Variation Sequences of 4 characters in the block *General Punctuation* have
been accepted for Unicode and appeared in the Unicode 16.0 Alpha Code
Charts. However, this was not reflected in the Unicode 16.0 Delta Code
Charts.

As a comparison, there were several SVSes in the block *CJK Symbols and
Punctuation* and *Halfwidth and Fullwidth Forms* introduced in Unicode
12.0, and the codepoints affected were all listed under the part *Glyph and
Variation Sequence Changes* in the Unicode 12.0 Delta Code Charts.

Therefore, I recommend that Unicode explicitly list all the codepoints
related to newly added SVSes in the Unicode 16.0 Delta Code Charts.

Date/Time: Sun Feb 18 07:23:51 CST 2024
ReportID: ID20240218072351
Name: Judith Chen
Report Type: Public Review Issue
Opt Subject: 497 [CJK]

According to IRG N2276 by Jaemin Chung, there used to be some "pseudo-G8
characters" in URO. The problem was partially solved in Unicode 13.0 by
changing their kIRG_GSource values into self-referring GU sources. However,
there are still two problems related to these "pseudo-G8 characters":

1. Characters like U+8980 覀, U+7CA6 粦, U+4E85 亅, U+5570 啰 should not be
changed to GU sources. Unlike other "pseudo-G8 characters", these
characters do exist in GB 8565.2-88 and are, hence, expected to have a G8
source. The reason why N2276 also mentioned these characters is that their
kIRG_GSource and kGB8 values are wrong.

As Tianheng Shen wrote in IRG N2542, "It seems that these characters do not
have a normal G-source, as if they are not used in the mainland of China."
However, the fact is that U+5570 啰 is listed as a level 1 character in
China's 通用规范汉字表 (Table of General Standard Chinese Characters), which means
that it is a common character in China.

Therefore, I suggest that the kIRG_GSource values of these four characters
be changed as follows, based on N2276:

- U+8980 覀: GU-08980 -> G8-2F7A
- U+7CA6 粦: GU-07CA6 -> G8-2F7B
- U+4E85 亅: GU-04E85 -> G8-2F7C
- U+5570 啰: GU-05570 -> G8-2F7D

2. N2276 also requested to modify the kGB8 values of some characters.
Unfortunately, this was not realised in Unicode 13.0.

I recommend removing the kGB8 values in the range 0883-0894, 1201-1294, and
1351-1394 from the Unihan database and correcting the kGB8 values of the
following five characters, based on N2276:

- U+9B25 鬥: 0893 -> 1589
- U+8980 覀: 1589 -> 1590
- U+7CA6 粦: 1590 -> 1591
- U+4E85 亅: 1591 -> 1592
- U+5570 啰: 1592 -> 1593

Date/Time: Wed Feb 21 20:00:45 CST 2024
ReportID: ID20240221200045
Name: fantasai
Report Type: Error Report
Opt Subject: Unicode 15.1 U+1F??? [EDC, ESC]

When reviewing some tests, I was told that Unicode and the ESC intend that
the family sequences constructed from gendered people symbols should be
deprecated and rendered equivalently to the new gender-neutral
sequences, _with the intent that users no longer perceive any differences
among these encoding sequences_.

If that is the expectation, then the UTC should

a) document this intent and their equivalence in Chapter 22 (Symbols), not just 
in dated memos from ESC to UTC

b) capture this canonicalization in mapping tables as appropriate

If some implementations treat the gendered forms as distinct and others don't, 
this can create interop problems. And if users are intended to not perceive any 
differences among these sequences, then they shouldn't encounter any during search, 
collation, etc. either.
~fantasai


Date/Time: Fri Feb 23 14:41:37 CST 2024
ReportID: ID20240223144137
Name: Diggory Hardy
Report Type: Error Report
Opt Subject: TR9 [PAG]

In TR9, version 15.1.0, section 3.3.3 -
http://www.unicode.org/reports/tr9/#Preparations_for_Implicit_Processing 

It is implied that rules X1-X8 assign embedding levels to characters based
only on the paragraph level and explicit formatting tokens, but that these
levels will soon be adjusted based on characters' "implicit bidirectional
types".

X9 does not mention adjusting characters' levels.

X10, point 1 does not either. It does however imply that level runs should
already have been calculated, and thus that character embedding levels
should already have been adjusted.

Furthermore, I do not see any explanation of the calculation of embedding
levels, only examples. Is it possible that this part of the specification
got lost in a re-organisation?

By the way, I do not find the mixture of prose, algorithms and examples used
in this article the easiest to follow, but do not have strong suggestions
(only that specifications usually do not bother discussing optimisations
which may be applied to implementations).

Date/Time: Wed Feb 28 05:27:30 CST 2024
ReportID: ID20240228052730
Name: Aditya Bayu Perdana
Report Type: Error Report
Opt Subject: Unicode Standard version 15.0.0 chapter 17 [EDC,SAH]

Referring to UTN #51, the Balinese script section of Unicode Standard
version 15.0.0 chapter 17 https://www.unicode.org/versions/Unicode15.0.0/ch17.pdf
needs to be updated in some aspects:

[editorial change]

page 716-717. The so-called Sasak characters are relatively recent creations
that have not gained common currency. This should be explicitly mentioned.

page 719-720. The section of musical symbols should refer to UTN#51 for more information

[technical change]

page 717, table 17.3. There’s no reference outside the Unicode Standard and
proposal L2/05-008 for the conjunct forms of the Sasak characters, so it’s
totally unclear where table 17.3 comes from and whether these conjunct
forms were ever used anywhere. The proposal itself says “[The Sasak
characters] conjunct forms remain to be verified”. As far as we know, they
have not been verified in the 19 years since then. The table should be
removed.

Date/Time: Thu Feb 29 10:20:19 CST 2024
ReportID: ID20240229102019
Name: Elliott Hughes
Report Type: Error Report
Opt Subject: Unicode15.0.0/ch18.pdf [EDC]

Table 18-3's Korean column says "ci" rather than the usual "ji" for earth, and 
"swu" rather than the usual "su" for water. It seems odd to use Yale romanization 
here but the modern revised romanization in the algorithm that converts precomposed 
characters to their names.
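The romanization used in algorithmically derived syllable names can be inspected with Python's standard unicodedata module. This is only a sketch: it assumes that 지 (U+C9C0) and 수 (U+C218) are the Hangul spellings of the readings for earth and water; the report itself does not give the Hangul.

```python
import unicodedata

# Assumption (not stated in the report): these are the Hangul spellings
# of the Korean readings for "earth" and "water" in the table.
SYLLABLES = ["\uC9C0", "\uC218"]  # 지, 수

def syllable_names(syllables):
    """Return (U+XXXX, character name) pairs for precomposed Hangul syllables."""
    return [(f"U+{ord(s):04X}", unicodedata.name(s)) for s in syllables]

for code_point, name in syllable_names(SYLLABLES):
    print(code_point, name)
```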

Date/Time: Sun Mar 03 22:12:23 CST 2024
ReportID: ID20240303221223
Contact: eisoch@126.com
Name: Eiso Chan
Report Type: Public Review Issue
Opt Subject: 497 [CJK]

The J-Source glyphs for U+2011E 𠄞 (JMJ-030462), U+2011F 𠄟 (JMJ-030463) and U+20120 𠄠 (JMJ-030464) 
are designed for Sung/Ming style. The Moji Joho Kiban database pages show they are really Han 
characters/Kanji, so the J glyphs should be updated.

Date/Time: Sun Mar 10 04:57:25 CDT 2024
ReportID: ID20240310045725
Name: Michel Mariani
Report Type: Public Review Issue
Opt Subject: 497 [CJK, EDC]

In Version 16.0 ALPHA REVIEW of the Code Charts:
https://www.unicode.org/Public/draft/UCD/charts/CodeCharts.pdf
 
The V-Source glyph (V1-6C40) for U+99D5 駕 appears to be defective
(incomplete horse component 馬); it is probably based on the glyph defined
in the Vietnamese font Nom Na Tong v4.6, which has been corrected since
v4.8 (currently v5.09).

Date/Time: Sun Mar 10 10:42:50 CDT 2024
ReportID: ID20240310104250
Name: Judith Chen
Report Type: Public Review Issue
Opt Subject: 497 [EDC, SAH]

The glyphs of U+1E899 MENDE KIKAKUI SYLLABLE M172 MBOO 𞢙 and U+1E89A MENDE
KIKAKUI SYLLABLE M174 MBO 𞢚 seem to be erroneous.

The block Mende Kikakui was encoded based on the proposal WG2 N4167
(L2/12-023) replacing N4133R (L2/11-301R), N3863 (L2/10-252) and N3757
(L2/10-006). In N3757 and N3863, U+1E899's current glyph 𞢙 was named MENDE
SYLLABLE MBO-2, while U+1E89A's current glyph 𞢚 had the name MENDE SYLLABLE
MBOO-2 — both were consistent with the evidence provided. However, the
glyphs of U+1E899 and U+1E89A have been incorrect since N4133, which could
be a mistake caused by a change in naming principles (N4133 renamed these
characters).

Therefore, I recommend that Unicode swap the glyphs of U+1E899 and U+1E89A to
conform with the original evidence.

That is all.

(Thanks to my friend 黑之圣雷 for pointing this issue out to me)

Date/Time: Wed Mar 27 09:17:53 CDT 2024
ReportID: ID20240327091753
Name: Charles Lawrence Riley
Report Type: Public Review Issue
Opt Subject: 497 [EDC]

I have reviewed the information on Garay as presented in PRI #497, and it 
looks clear and accurate to me.  Thank you for all the work that you have done on this.

Date/Time: Wed Mar 27 15:58:04 CDT 2024
ReportID: ID20240327155804
Name: Andrew West
Report Type: Public Review Issue
Opt Subject: 497 [CJK]

KP1-5653 is mapped to U+720B 爋 (⿰火勳), but its glyph is actually the same 
as U+24455 𤑕 (⿰火⿱⿰㇯熏灬力灬). Therefore KP1-5653 should be moved to U+24455.

Date/Time: Thu Mar 28 01:45:49 CDT 2024
ReportID: ID20240328014549
Name: Judith Chen
Report Type: Public Review Issue
Opt Subject: 497 [CJK]

The KP-source glyph KP1-83F7 of U+96DF 雟, which was added in Unicode 15.1, exactly matches 
all the representative glyphs of U+5DC2 巂, rather than other representative glyphs of U+96DF 雟. 
Therefore, I suggest moving KP1-83F7 from U+96DF 雟 to U+5DC2 巂.

Date/Time: Tue Apr 02 04:43:38 CDT 2024
ReportID: ID20240402044338
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: 497 [EDC]

Within the "Egyptian Hieroglyphs" (13000–1342F) and "Egyptian Hieroglyphs
Extended-A" (13460-143FF) blocks, the colon sign is consistently preceded
by one or sometimes two spaces in comments (starting with an asterisk). In
English, there should be no space before a colon. Here are a few EXAMPLES,
out of a total of 4,212 occurrences:

* classifier sitting : ḥmsꞽ
* logogram (to hide) : ꞽmn
* phonemogram : ḫnms

Two spaces before the colon (all instances):

* classifier rage, fury  : ḳnd
* phonemogram  : ꜥꜣb
* phonemogram  : ḫsf
* phonemogram  : wsr
* phonemogram  : ꜥḥꜥ
* phonemogram  : psḏ
* phonemogram  : rs-wḏꜣ
* phono-repeater  : sḫt
* phonemogram  : mnḫ
* phonemogram  : tꜣ
* classifier astronomical instrument  : mrḫ.t
* phonemogram  : ḫnm
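One possible way to normalize such comments is sketched below in Python. The function name tighten_colon and the choice of regular expression are illustrative assumptions, not part of the report, and the sketch only handles a colon that is followed by a space, as in the examples above.

```python
import re

# One or more spaces that sit immediately before a colon-plus-space.
SPACE_BEFORE_COLON = re.compile(r" +(?=: )")

def tighten_colon(comment):
    """Remove any spaces directly preceding a ': ' separator in a comment line."""
    return SPACE_BEFORE_COLON.sub("", comment)

samples = [
    "* classifier sitting : ḥmsꞽ",      # one space before the colon
    "* classifier rage, fury  : ḳnd",   # two spaces before the colon
]
for line in samples:
    print(tighten_colon(line))
```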

Date/Time: Tue Apr 02 06:06:00 CDT 2024
ReportID: ID20240402060600
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: 497 [EDC]

Below are my findings regarding the presence of surplus spaces within the
Unikemet.txt file; some of these have implications for the NamesList.txt
file.


1/ The value (third field) of the following line begins with a space and contains two consecutive spaces:

U+13CA1	kEH_FVal	 p  & nst (i.e., U+13CA1[tab]kEH_FVal[tab][space]p[space][space]&[space]nst)

Consequently, in the NamesList.txt file:

13CA1	EGYPTIAN HIEROGLYPH-13CA1
	* phonogram :  p  & nst


2/ The values in the following lines each contain two consecutive spaces:

U+13055	kEH_Func	Logogram weaver or  nurse
U+13489	kEH_Func	Classifier  to totter
U+138D0	kEH_Func	Logogram/phonemogram  (whom truth/Maat loves)
U+13B91	kEH_Func	Logogram (to distinguish) and  (beginning, front)
U+13D04	kEH_Func	Classifier divinity  (Nekhbet)

These double spaces are reflected in the NamesList.txt file:

13055	EGYPTIAN HIEROGLYPH B005A
	* logogram weaver or  nurse : ? | mnḫ.t

13489	EGYPTIAN HIEROGLYPH-13489
	* classifier  to totter : mss

138D0	EGYPTIAN HIEROGLYPH-138D0
	* logogram/phonemogram  (whom truth/Maat loves) : mr(.y)-mꜣꜥ.t

13B91	EGYPTIAN HIEROGLYPH-13B91
	* logogram (to distinguish) and  (beginning, front) : ṯnꞽ ḥꜣ.t

13D04	EGYPTIAN HIEROGLYPH-13D04
	* classifier divinity  (Nekhbet) : nḫb.t


3/ In several dozen lines, the values in the third field contain one or two
consecutive spaces, yet with no impact on the NamesList.txt file — here are
a few EXAMPLES:

U+13047	kEH_Desc	Foreign man, with a bushy beard, standing, wearing a long dress, with  the arms hanging at either side of the body.

U+133F8	kEH_Desc	A  geometrical circle

U+136CA	kEH_Desc	The king, seated on heel, both knees down, with a long straight beard, uraeus and coif/long wig, back bend forward, arm forward, hand  at the hight of the  waist, holding a cup or vessel (W10).


4/ 199 lines conclude with one or more consecutive trailing space
characters. Enumerating all of them is impractical; however, here are some
EXAMPLES:

U+1300F	kEH_Func	Classifier rebel/enemy 
U+1316D	kEH_FVal	sꜣ 
U+131CE	kEH_Func	Phonemogram 
U+13229	kEH_FVal	ꜥnḏ.ty 
U+1331F	kEH_Desc	A harpoon-head with two horizontal strokes on top and an angled stroke below a curl as point. 

Consequently, these surplus spaces appear in the NamesList.txt file:

Line 38075: 	* classifier rebel/enemy 
Line 38813: 	* logogram (son) : sꜣ 
Line 39247: 	* logogram (9th nome of UE) : ꜥnḏ.ty 
Line 40532: 	* classifier human being (poor man) : šwꜣ.w 
Line 40536: 	* logogram (vocative interjection) : ꞽ 
Line 40600: 	* logogram (to fraternize) : snsn 
Line 40664: 	* logogram (bowing down) : ḫꜣb/ksw 
Line 41163: 	* classifier rebel/enemy 
Line 41175: 	* logogram (chiefs) : wr 
Line 41179: 	* classifier enemy/rebel (Xerxes) : ḫšryš 
Line 41237: 	* logogram (foreigner) : ḫꜣs.ty 
Line 41304: 	* logogram (Harsomtus) : ḥr-smꜣ-tꜣ.wy 
Line 41306: 	* logogram (Harsomtus) : ḥr-smꜣ-tꜣ.wy 
Line 41308: 	* logogram (to sing) : ḥsꞽ 
Line 42011: 	* logogram (Maat and Amon) : mꜣꜥ.t & ꞽmn 
Line 42044: 	* logogram (to drive away) : sḥrꞽ 
Line 42128: 	* logogram (Re) : rꜥ 
Line 42149: 	* logogram (eye of Horus) : ꞽr.t-ḥr 
Line 42382: 	* logogram (the Nile/the flood) : hꜥpy 
Line 42472: 	* phonemogram (first person for sbk-šmꜥ-nfr) : ꞽ 
Line 42572: 	* logogram/phonemogram (lady) : nb.t 
Line 42587: 	* logogram (rejoicing) : nhm 
Line 42746: 	* logogram (together with \C98 and \C43A, representing the triad of Dendera) : ꞽwn.t 
Line 42802: 	* logogram (hour) : wnw.t 
Line 43639: 	* logogram (given life like Re) : dꞽ-ꜥnḫ-mꞽ-rꜥ 
Line 43693: 	* logogram/Phonemogram (great one (female)) : wr.t 
Line 43708: 	* logogram (Hermopolis magna, 15th nome of UE) : wnw.t 
Line 46015: 	* logogram temple) : gs.w-pr.w 


5/ In a large number of lines (42 instances, excluding the one already
mentioned in point 1), the value (third field) starts with a space
character, which does not affect the NamesList.txt file — here is one
EXAMPLE:

U+131FA	kEH_Desc	 A crescent moon with a part of the moon disc.

Please note that after the tabulation character, there is a space included
as part of the line's value (third field):

U+131FA	kEH_Desc[tab][space]A crescent moon with a part of the moon disc.
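The categories of surplus spaces described in points 1 to 5 could be detected mechanically. The following Python sketch assumes the three tab-separated fields described above (code point, property name, value); the function name whitespace_issues is illustrative, and the sample lines are taken from the examples in this report.

```python
def whitespace_issues(line):
    """List whitespace problems found in the value (third tab-separated field)."""
    fields = line.rstrip("\n").split("\t", 2)
    if len(fields) < 3:
        return []  # comment lines, blank lines, or anything without a value field
    value = fields[2]
    issues = []
    if value.startswith(" "):
        issues.append("leading space")
    if value.endswith(" "):
        issues.append("trailing space")
    # Look for doubled spaces only inside the value, not at its edges.
    if "  " in value.strip(" "):
        issues.append("consecutive spaces")
    return issues

samples = [
    "U+13CA1\tkEH_FVal\t p  & nst",
    "U+13489\tkEH_Func\tClassifier  to totter",
    "U+131CE\tkEH_Func\tPhonemogram ",
    "U+131FA\tkEH_Desc\t A crescent moon with a part of the moon disc.",
]
for line in samples:
    print(line.split("\t")[0], whitespace_issues(line))
```

Running the same function over every line of the file would yield a complete list rather than a selection of examples.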

Date/Time: Wed Apr 03 09:02:27 CDT 2024
ReportID: ID20240403090227
Name: Andrew West
Report Type: Public Review Issue
Opt Subject: 497 [CJK]


Another incorrect KP mapping: KP1-4D4C (⿰木⿰糸䏍) currently maps to U+3BDE (⿰木絹), but should map to U+23693 (⿰木⿰糹䏍).