L2/24-008

Comments on Public Review Issues
(October 11, 2023 - January 8, 2024)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of January 08, 2024, since the previous cumulative document was issued prior to UTC #177 (October 2023).

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of January 02, 2024.

Issue Name Feedback Link
495 Proposed Update UTS #55, Unicode Source Code Handling (feedback) No feedback at this time
494 Proposed Update UAX #29, Unicode Text Segmentation (feedback)
493 Proposed Draft UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet) (feedback) No feedback at this time
492 Proposed Update UTS #39, Unicode Security Mechanisms (feedback) No feedback at this time
491 Proposed Update UAX #31, Unicode Identifier and Pattern Syntax (feedback) No feedback at this time
490 Proposed Update UAX #14, Unicode Line Breaking Algorithm (feedback) No feedback at this time
489 Proposed Update UAX #44, Unicode Character Database (feedback) No feedback at this time
488 Proposed Update UTS #10, Unicode Collation Algorithm (feedback) No feedback at this time
487 Proposed Update UAX #53, Unicode Arabic Mark Rendering (feedback) No feedback at this time
486 Stabilization of UAX #42, Unicode Character Database in XML (UCDXML) (feedback)
485 Draft UTR #56, Unicode Cuneiform Sign Lists (feedback) No feedback at this time
484 Proposed Update UAX #50, Unicode Vertical Text Layout (feedback) No feedback at this time
483 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback)

The links below go to locations in this document for feedback.

Feedback routed to CJK & Unihan Group for evaluation [CJK]
Feedback routed to Script ad hoc for evaluation [SAH]
Feedback routed to Properties & Algorithms Group for evaluation [PAG]
Feedback routed to Emoji SC for evaluation [ESC]
Feedback routed to Editorial Committee for evaluation [EDC]
Other Reports

 


Feedback routed to CJK & Unihan Group for evaluation [CJK]

Date/Time: Mon Nov 06 09:11:54 CST 2023
ReportID: ID20231106091154
Name: Junliang Huang
Report Type: Public Review Issue
Opt Subject: 468

Sorry for commenting on a closed PRI.

UTC-03258 ⿱⿰户己金 is unifiable to U+289D2 𨧒 by UCV #121. 𨧒 is a GHZ character
used for person name 朱在𨧒, which is the same person in 南明史. I suggest
changing its status from FutureWS to Variant.

UTC-03259 ⿱山⿰氵乃 is potentially unifiable to U+23CB8 𣲸.

UTC-03262 ⿳士艹工 is likely an error of 㙓. The 南明史 evidence mentions that
朱朝⿳士艹工 is 崇禎十三年進士. However, 崇禎十三年庚辰科進士三代履歷 22a and 順治河南通志17:87 give 㙓
https://iiif.lib.harvard.edu/manifests/view/drs:48757770$714i. Given that
there are only two stroke differences between ⿳士艹工 and 㙓 and that ⿳士艹工 is not
attested in historical evidence, I suggest changing its status from
FutureWS to Variant.

UTC-03264 ⿱⿰火攵日 is an error of U+24274 𤉴.

The original 南明史 evidence mentions that 朱睿⿱⿰火攵日, 字翰之, is a painter. His
artwork 疏林遠岫 is available on
https://digitalarchive.npm.gov.tw/Painting/Content?pid=5186&Dept=P,
which gives his name ⿱炇𣅀. ⿱炇𣅀 is the preferred form of the current encoded
𤉴(⿱⿰火夂𣄼) shape as ⿱炇𣅀 is also attested in 四聲篇海(成化刊本)12:18a and his friend
周亮工's work 書畫錄 1:11b. The current encoded 𤉴 is a G4K character used in
文淵閣四庫全書·御定佩文齋書畫譜, also for his name. Anyway, given that the only
differences between ⿱⿰火攵日 and 𤉴 are one stroke and 攵/夂, I suggest changing its
status from FutureWS to Variant.

UTC-03265 ⿰氵⿱宀隹 (⿰氵寉) is unifiable to U+3D36 㴶 by UCV #304. I suggest
changing its status from FutureWS to Variant.

UTC-03267 ⿲金⿱火犬頁 is a variant of U+28C25 𨰥. ⿲金⿱火犬頁 is featured in the table
of contents, however, if we turn to page 1464, we will see 南明史 p. 1464
gives 𨰥 instead. So ⿲金⿱火犬頁 is a one-off misprint in TOC. I suggest changing
its status from FutureWS to Variant.

UTC-03276 ⿱⿰氵[H6-03]水 is a variant of U+3D57 㵗. The 南明史 evidence mentions
朱常⿱⿰氵[H6-03]水 is 始興縣令. 乾隆始興縣志8:4 and 道光廣東通志29:23 both give U+3D57 㵗.

UTC-03286 ⿱禾氺 is also registered in U+25578 𥝸 + VS18. I suggest changing its
status from FutureWS to Variant.

That's all.

Date/Time: Thu Nov 09 19:14:06 CST 2023
ReportID: ID20231109191406
Name: Eiso Chan
Report Type: Public Review Issue
Opt Subject: kCantonese for U+3150D

U+3150D 𱔍 is identified as UTC-00420. UAX #45 shows the following information.

UTC-00420;NoAction;;30.11;;⿰口兜;kCowles 4114*kMeyerWempe 3029*kCheungBauerIndex 375.08;;;

I checked kMeyerWempe 3029 and kCheungBauerIndex 375.08. Both show that the
Cantonese reading should be dau3, with the meaning “cave, den, nest”. We could add the
kCantonese property value as below.

U+3150D	kCantonese	dau3

Currently, the only source reference is UK-10751, so the UAX #45 record could be updated as
below if possible.

UTC-00420;ExtH;UTC-00420;30.11;;⿰口兜;kCowles 4114*kMeyerWempe 3029*kCheungBauerIndex 375.08;;;

BTW, the submitted evidence from UK shows the Chinese Hakka-dialect usage.

Date/Time: Fri Nov 10 20:10:43 CST 2023
ReportID: ID20231110201043
Name: Ken Lunde
Report Type: Error Report
Opt Subject: UAX #45 USourceData.txt

The following two UAX #45 ideographs are encoded in Extension H:

UTC-00635;Rejected;;64.3;;⿰扌小;kCheungBauerIndex 406.01;;;3
UTC-00911;NoAction;;32.3;;⿰土干;Adobe-CNS1 C+17331;;6;

The first record should be changed to the following, which also includes a total 
stroke count (6):

UTC-00635;ExtH;U+317EA;64.3;;⿰扌小;kCheungBauerIndex 406.01;;6;3

The second one is a duplicate of UTC-02997, which is encoded at U+31587, so its record 
should be changed to the following, which also includes a first residual stroke (1):

UTC-00911;UTC-02997;;32.3;;⿰土干;Adobe-CNS1 C+17331;;6;1
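
The proposed change to the first record can be checked field by field. The following is a minimal sketch, assuming the ten semicolon-delimited fields of USourceData.txt; the field names used here are illustrative labels chosen for this example, not identifiers defined by UAX #45.

```python
# Illustrative labels for the ten semicolon-delimited fields (assumed names).
FIELDS = [
    "id", "status", "code_point", "radical_stroke", "kangxi_position",
    "ids", "sources", "comments", "total_strokes", "first_residual_stroke",
]


def parse_record(line):
    """Split one USourceData.txt record into a dict keyed by FIELDS."""
    values = line.rstrip("\n").split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


old = parse_record("UTC-00635;Rejected;;64.3;;⿰扌小;kCheungBauerIndex 406.01;;;3")
new = parse_record("UTC-00635;ExtH;U+317EA;64.3;;⿰扌小;kCheungBauerIndex 406.01;;6;3")

# Collect only the fields whose values differ between the two records.
changed = {name: (old[name], new[name]) for name in FIELDS if old[name] != new[name]}

for name, (before, after) in changed.items():
    print(f"{name}: {before!r} -> {after!r}")
```

For this record, the differing fields are the status, the code point, and the total stroke count.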

Date/Time: Sun Nov 12 19:40:53 CST 2023
ReportID: ID20231112194053
Name: Paul Masson
Report Type: Error Report
Opt Subject: kSemanticVariant for U+9452 鑒

This character already has one semantic variant listed in the database. Another
appears to be U+9373 鍳, which lacks one component but has the identical pronunciation
in both Cantonese and Mandarin. Please update the database accordingly.

Date/Time: Sat Nov 18 10:15:13 CST 2023
ReportID: ID20231118101513
Name: Andrew West
Report Type: Error Report
Opt Subject: CJK Unified Ideographs Extension G code chart

It has recently been reported to me that the glyphs for U+30D91
(UK-02133) and U+30D94 (UK-02134) are not ideal as the 3rd and 4th strokes
of the 谷 component on the left side should be joined (as is shown for the
G-source characters U+30D93 and U+30D95). I have already supplied Michel
with an updated font for these two characters, and request that the glyphs
are updated for the Unicode 16.0 code chart.

Date/Time: Fri Nov 24 22:26:09 CST 2023
ReportID: ID20231124222609
Name: K T Shek
Report Type: Error Report
Opt Subject: Unihan_Variants.txt

Unihan_Variants.txt lists U+26552 (𦕒) as a variant of U+8998 (覘) and U+4993 (䦓):
---
U+4993	kSemanticVariant	U+8998<kMeyerWempe U+26552<kMeyerWempe
U+8998	kSemanticVariant	U+4993<kMeyerWempe U+26552<kMeyerWempe
U+26552	kSemanticVariant	U+4993<kMeyerWempe U+8998<kMeyerWempe
---

I believe this is not correct. The correct variant of U+8998 (覘) should be
U+4021 (䀡) instead of U+26552 (𦕒). The error is probably caused by the very
similar printing of the 目 radical and 耳 radical in MeyerWempe (e.g. in
P.384-385 the radical 目 in 瞄 and 渺 is very close to a 耳), which led even the
publisher to misinterpret 䀡 as 𦕒 (P.40, RSIndex P.91).

𦕒 shares no similar meaning to 䦓/覘/䀡.  And according to Taiwan’s variant
dictionary, 䦓, 覘, 䀡 are variants:
https://dict.variants.moe.edu.tw/variants/rbt/word_attribute.rbt?quote_code=QjA0NTg5 

But 𦕒 has no variant.
https://dict.variants.moe.edu.tw/variants/rbt/word_attribute.rbt?quote_code=QzEwNzUz 

I understand that it may not be possible to “fix” the reference in
kMeyerWempe to U+4021 because it clearly states that the radical is 耳. But
as the information provided by the book is in doubt, I still suggest
considering the removal of the kSemanticVariant of U+26552 to avoid confusion
or the spread of incorrect information. Thank you!
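
One reading of the suggested removal can be sketched as follows. This assumes the tab-separated layout of the three records quoted above, and it assumes that "removing the kSemanticVariant of U+26552" means dropping both the U+26552 record and the references to U+26552 in the other two records. The function name `drop_code_point` is hypothetical. The sketch does not add U+4021 as a replacement variant.

```python
RECORDS = [
    "U+4993\tkSemanticVariant\tU+8998<kMeyerWempe U+26552<kMeyerWempe",
    "U+8998\tkSemanticVariant\tU+4993<kMeyerWempe U+26552<kMeyerWempe",
    "U+26552\tkSemanticVariant\tU+4993<kMeyerWempe U+8998<kMeyerWempe",
]


def drop_code_point(records, target):
    """Remove `target` both as a record subject and as a listed variant."""
    kept = []
    for line in records:
        subject, prop, value = line.split("\t")
        if subject == target:
            continue  # drop the record whose subject is the target
        # Each entry looks like "U+XXXX<source"; compare only the code point.
        entries = [e for e in value.split(" ") if e.split("<")[0] != target]
        if entries:
            kept.append("\t".join([subject, prop, " ".join(entries)]))
    return kept


cleaned = drop_code_point(RECORDS, "U+26552")
for line in cleaned:
    print(line)
```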

Date/Time: Mon Dec 18 13:45:40 CST 2023
ReportID: ID20231218134540
Name: Lee Collins
Report Type: Error Report
Opt Subject: Unihan_Readings.txt

Two entries have a wrong transliteration from the kana くりや:

U+5E96	kJapaneseKun	KURYA
U+5EDA	kJapaneseKun	KURYA

KURYA should be KURIYA
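
The difference between the two spellings corresponds to the difference between full-size や and small ゃ. The report does not say how the error arose; the following is only a minimal sketch of that distinction, with a deliberately tiny, hypothetical kana table covering just the characters involved.

```python
# Tiny illustrative tables; a real romanizer would cover the full syllabary.
KANA = {"く": "KU", "り": "RI", "や": "YA"}
SMALL_Y = {"ゃ": "YA", "ゅ": "YU", "ょ": "YO"}


def romanize(kana):
    """Romanize a kana string, contracting syllables before small ゃ/ゅ/ょ."""
    out = []
    for ch in kana:
        if ch in SMALL_Y and out:
            # Contracted sound: the preceding syllable loses its final vowel.
            out[-1] = out[-1][:-1] + SMALL_Y[ch]
        else:
            out.append(KANA[ch])
    return "".join(out)


print(romanize("くりや"))  # full-size や, as in the report
print(romanize("くりゃ"))  # small ゃ, shown only for contrast
```

With full-size や the result is KURIYA; KURYA would only be expected for the small ゃ.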

Date/Time: Sun Jan 07 18:37:44 CST 2024
ReportID: ID20240107183744
Name: Paul Masson
Report Type: Error Report
Opt Subject: kMandarin for U+9244 鉄

This character is a semantic variant of U+9435 鐵. As such it should share 
the same pronunciation for this use, which is tiě, in addition to the existing 
pronunciation.

Feedback routed to Script ad hoc for evaluation [SAH]

(None at this time.)


Feedback routed to Properties & Algorithms Group for evaluation [PAG]

Date/Time: Tue Nov 07 14:09:48 CST 2023
ReportID: ID20231107140948
Name: Joe Hildebrand
Report Type: Error Report
Opt Subject: UAX #14

Summary: LB9 does not make clear that a CM|ZWJ character is treated as if it
does not exist for the purpose of matching subsequent rules

LB9 currently states:

``` 
LB9 Do not break a combining character sequence; treat it as if it has
the line breaking class of the base character in all of the following
rules. Treat ZWJ as if it were CM.

  Treat X (CM | ZWJ)* as if it were X.

where X is any line break class except BK, CR, LF, NL, SP, or ZW.

At any possible break opportunity between CM and a following character, CM
behaves as if it had the type of its base character. Note that despite the
summary title, this rule is not limited to standard combining character
sequences. For the purposes of line breaking, sequences containing most of
the control codes or layout control characters are treated like combining
sequences.

```

When combined with the new rule LB28a:

```
LB28a Do not break inside the orthographic syllables of Brahmic scripts.

  AP × (AK | ◌ | AS)

  (AK | ◌ | AS) × (VF | VI)

  (AK | ◌ | AS) VI × (AK | ◌)

  (AK | ◌ | AS) × (AK | ◌ | AS) VF
```

and the following test from line 10287 of https://www.unicode.org/Public/15.1.0/ucd/auxiliary/LineBreakTest.txt:

``` 
× 1B18 ÷ 1B27 × 1B44 × 200C × 1B2B × 1B38 ÷ 1B31 × 1B44 × 1B1D × 1B36
÷ # × [0.3] BALINESE LETTER CA (AK) ÷ [999.0] BALINESE LETTER PA (AK) ×
[28.12] BALINESE ADEG ADEG (VI) × [9.0] ZERO WIDTH NON-JOINER (CM1_CM) ×
[28.13] BALINESE LETTER MA (AK) × [9.0] BALINESE VOWEL SIGN SUKU(CM1_CM) ÷
[999.0] BALINESE LETTER SA SAPA (AK) × [28.12] BALINESE ADEG ADEG (VI) ×
[28.13] BALINESE LETTER TA LATIK (AK) × [9.0] BALINESE VOWEL SIGN ULU
(CM1_CM) ÷ [0.3]

```

it becomes clear that the 200C in the input (linebreak class CM, affected by
LB9), should not just be treated as if it had the linebreak class VI, but
should not be included at ALL when trying to match LB28a.

When the 200C is treated as VI, the sequence would read: AK VI VI AK, and
would NOT match the third line of LB28a.

When the 200C is ignored entirely, the sequence would read: AK VI AK, and
WOULD match the third line of LB28a, as the test states.

Both of these are potentially-valid readings of the current text in LB9.
Before the addition of LB28a, there were no cases I can think of where the
difference mattered.
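
The two readings can be compared on the class sequence from the test (AK VI CM AK). The following is a simplified sketch, not a line breaking implementation: it ignores the LB9 exclusions for BK, CR, LF, NL, SP, and ZW, it only checks the third line of LB28a, and all function names are hypothetical.

```python
AK_LIKE = {"AK", "DOTTED_CIRCLE", "AS"}  # (AK | ◌ | AS)
ATTACHING = {"CM", "ZWJ"}


def reading_substitute(classes):
    """Reading 1: each CM/ZWJ takes on the class of the preceding base."""
    out = []
    for c in classes:
        out.append(out[-1] if c in ATTACHING and out else c)
    return out


def reading_absorb(classes):
    """Reading 2: each CM/ZWJ is absorbed into its base and disappears."""
    out = []
    for c in classes:
        if c in ATTACHING and out:
            continue
        out.append(c)
    return out


def no_break_before_last(classes):
    """Check '(AK | ◌ | AS) VI × (AK | ◌)' against the end of the sequence."""
    return (
        len(classes) >= 3
        and classes[-3] in AK_LIKE
        and classes[-2] == "VI"
        and classes[-1] in {"AK", "DOTTED_CIRCLE"}
    )


# 1B27 (AK), 1B44 (VI), 200C (CM), 1B2B (AK)
sequence = ["AK", "VI", "CM", "AK"]

print(reading_substitute(sequence), no_break_before_last(reading_substitute(sequence)))
print(reading_absorb(sequence), no_break_before_last(reading_absorb(sequence)))
```

Only the second reading yields AK VI AK and suppresses the break before 1B2B, which is the behavior the test data expects.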

In a future version of the spec, the language in LB9 could be clarified to
make interoperable implementation easier.

Date/Time: Sat Nov 18 10:43:42 CST 2023
ReportID: ID20231118104342
Name: Mikhail Morozov
Report Type: Error Report
Opt Subject: The Unicode Standard, Version 15.1, Bamum Supplement Range: 16800–16A3F

There is a misspelling in the name of the character 𖠋 (U+1680B) BAMUM LETTER
PHASE-A MAEMBGBIEE in https://www.unicode.org/charts/PDF/U16800.pdf.

The proposal for encoding Old Bamum script
(https://www.loc.gov/rr/amed/pdf/proposal-for-encoding-bamum-script.pdf#page=20)
has IPA, English and French transcriptions for the letters, and it seems
that the English transcription should be spelled with one B instead of two,
MAEMGBIEE. The source for the proposal, L'Écriture des Bamum: sa
naissance, son évolution, sa valeur phonétique, son utilisation, by I.
Dugast and M.D.W. Jeffreys
(https://www.calameo.com/read/000061616e47e713325db) also supports this
opinion.

Feedback routed to Emoji SC for evaluation [ESC]

(None at this time.)


Feedback routed to Editorial Committee for evaluation [EDC]

Date/Time: Mon Nov 13 07:42:24 CST 2023
ReportID: ID20231113074224
Name: Jakub Jelinek
Report Type: Error Report
Opt Subject: Unicode15.1.0/ch04.pdf

Note: This has already been taken care of by Editorial Committee, and is in the draft for Unicode 16.0.

I believe that Table 4-8, Name Derivation Rule Prefix Strings, should have a
2EBF0..2EE5D NR2 “CJK UNIFIED IDEOGRAPH-”
line added, corresponding to the CJK Ideograph Extension I, First .. CJK Ideograph Extension I, Last
addition in 15.1.
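
The effect of the missing line can be illustrated with a minimal sketch of NR2, which forms a name by appending the code point in uppercase hexadecimal to the prefix string. Only the range named in this report is included; the table name `NR2_RANGES` and the function `derived_name` are hypothetical.

```python
# Only the Extension I range from this report; other NR2 ranges are omitted.
NR2_RANGES = [
    ((0x2EBF0, 0x2EE5D), "CJK UNIFIED IDEOGRAPH-"),
]


def derived_name(code_point):
    """Return the NR2-derived name, or None if no listed range applies."""
    for (first, last), prefix in NR2_RANGES:
        if first <= code_point <= last:
            # NR2: prefix followed by the code point in uppercase hex.
            return prefix + format(code_point, "04X")
    return None


print(derived_name(0x2EBF0))
print(derived_name(0x2EE5E))  # just past the end of the range
```

Without the table line, an implementation driven by Table 4-8 alone would have no prefix for code points in this range.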

Date/Time: Sun Jan 07 09:40:18 CST 2024
ReportID: ID20240107094018
Name: Charlotte Buff
Report Type: Other Document Submission
Opt Subject: L2/23-193r3: Name of one chemical arrow

U+1F8D6 was accepted for a future update under the name LONG RIGHTWARDS
ARROW WITH THROUGH X (cf. 177-C33). The “WITH” is erroneous; the character
should be called LONG RIGHTWARDS ARROW THROUGH X. This would also
synchronise the name with the existing U+2947 RIGHTWARDS ARROW THROUGH X.

Other Reports

(None at this time.)