L2/23-233

Comments on Public Review Issues
(July 4 - October 11, 2023)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of October 26, 2023, since the previous cumulative document was issued prior to UTC #176 (July 2023).

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of October 26, 2023.

Issue  Name                                                     Feedback Link
483    Proposed Update UAX #38, Unicode Han Database (Unihan)   (feedback)
482    Proposed Draft UTR #56, Unicode Cuneiform Sign Lists     (feedback) No feedback at this time
479    Proposed Update UTS #53, Unicode Arabic Mark Rendering   (feedback) No feedback at this time

The links below go to locations in this document for feedback.

Feedback routed to CJK & Unihan Group for evaluation [CJK]
Feedback routed to Script ad hoc for evaluation [SAH]
Feedback routed to Properties & Algorithms Group for evaluation [PAG]
Feedback routed to Emoji SC for evaluation [ESC]
Feedback routed to Editorial Committee for evaluation [EDC]
Other Reports

 


Feedback routed to CJK & Unihan Group for evaluation [CJK]

Date/Time: Thu Aug 10 17:21:54 CDT 2023
ReportID: ID20230810172154
Name: Ken Lunde
Report Type: Error Report
Opt Subject: Proposed kRSUnicode property value changes and additions

The following two ideographs are structurally the same as U+9F52 齒, but are missing two strokes:

U+2398A 𣦊
U+2EBBD 𮮽

Their kRSUnicode property values are currently as follows:

U+2398A kRSUnicode 77.9
U+2EBBD kRSUnicode 211.0

I recommend that they be changed to the following, which involves adding a second property value 
and changing the number of residual strokes for Radical #211 from 0 to -2:

U+2398A kRSUnicode 77.9 211.-2
U+2EBBD kRSUnicode 211.-2 77.9
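
As a rough illustration of what the proposed values would mean for consumers of the data, the following Python sketch parses kRSUnicode values that carry a negative residual stroke count. The pattern and the function name parse_krsunicode are assumptions made for this example, modelled only on the values shown above; they are not quoted from UAX #38.

```python
import re

# Assumed shape of one radical-stroke item: radical number, optional
# apostrophes marking a simplified radical, a dot, and a residual
# stroke count that may be negative.
RS_ITEM = re.compile(r"^(\d{1,3})('*)\.(-?\d{1,2})$")


def parse_krsunicode(value):
    """Return a list of (radical, marks, residual_strokes) tuples."""
    items = []
    for item in value.split():
        match = RS_ITEM.match(item)
        if match is None:
            raise ValueError(f"unrecognized radical-stroke item: {item!r}")
        items.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return items


# Values proposed in this report.
PROPOSED = {
    "U+2398A": "77.9 211.-2",
    "U+2EBBD": "211.-2 77.9",
}

for code_point, value in PROPOSED.items():
    print(code_point, parse_krsunicode(value))
```

A parser that assumes a non-negative stroke count would reject "211.-2", so implementations would need to accept the sign if the change is adopted.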

Date/Time: Sun Jul 30 17:08:21 CDT 2023
ReportID: ID20230730170821
Name: Paul Masson
Report Type: Error Report
Opt Subject: kPhonetic for U+5807

I have been in direct contact with Ken Lunde and we both agree this value should 
be 574 as appears on p.85 of Casey. I am submitting this feedback to provide tracking 
for the change.

Feedback routed to Script ad hoc for evaluation [SAH]

Date/Time: Wed Oct 11 15:27:13 CDT 2023
ReportID: ID20231011152713
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: Suggestion to change the name of certain provisionally assigned characters

These characters are provisionally assigned: LEFTWARDS ARROW FROM DOWNWARDS
ARROW & RIGHTWARDS ARROW FROM DOWNWARDS ARROW. I would agree with this
naming if the downwards arrow portion were smaller than the
leftwards/rightwards portion, but the opposite is true. Consider 21A6 ↦
RIGHTWARDS ARROW FROM BAR, where the "bar" part is smaller; these names make
me think of that character, but with a down arrowhead on the bar. I think more
intuitive names would be DOWNWARDS ARROW WITH BRANCHING LEFTWARDS/RIGHTWARDS
ARROW.

I also disagree with the name BENGALI LETTER ALTERNATE BA. Considering it's
exclusively for Pali/Sanskrit orthographies, a less confusing name could be
BENGALI LETTER PALI-SANSKRIT BA. And the regular ba should have a note
stating (only represents va in Pali or Sanskrit).

I also suggest that the Sidetic block characters be named based on their
sound if they have been deciphered and with the NXX otherwise. This would
declutter the chart by not having a bunch of informative aliases. Formal
aliases can be added if other letters are deciphered after encoding. 

Feedback routed to Properties & Algorithms Group for evaluation [PAG]

Date/Time: Tue Aug 15 12:10:59 CDT 2023
ReportID: ID20230815121059
Name: David Carlisle
Report Type: Error Report
Opt Subject: TR25 UNICODE SUPPORT FOR MATHEMATICS


https://unicode.org/reports/tr25/ 

Some of the Math Classifications in the MathClass-15 data file associated with 
TR25 seem incorrect.
https://www.unicode.org/Public/math/revision-15/MathClassEx-15.txt 

23B0;R;⎰;lmoust;ISOAMSC;;UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION
23B1;R;⎱;rmoust;ISOAMSC;;UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION
27C5;R;⟅;;;;LEFT S-SHAPED BAG DELIMITER
27C6;R;⟆;;;;RIGHT S-SHAPED BAG DELIMITER

These are classified as R (infix relation, TeX \mathrel) when it would seem more 
appropriate to use O and C (\mathopen, \mathclose), which are the assignments currently 
made by LaTeX.
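
The comparison described here can be sketched in a few lines of Python. The field layout is read off the four sample lines above (code point first, class second, name last), and the SUGGESTED table is an assumption for illustration: it pairs the "left" characters with O and the "right" characters with C, which is one reading of the suggestion in this report.

```python
# The four lines quoted in the report, as they appear in MathClassEx-15.txt.
SAMPLE = """\
23B0;R;⎰;lmoust;ISOAMSC;;UPPER LEFT OR LOWER RIGHT CURLY BRACKET SECTION
23B1;R;⎱;rmoust;ISOAMSC;;UPPER RIGHT OR LOWER LEFT CURLY BRACKET SECTION
27C5;R;⟅;;;;LEFT S-SHAPED BAG DELIMITER
27C6;R;⟆;;;;RIGHT S-SHAPED BAG DELIMITER
"""

# Hypothetical table of suggested classes (O = opening, C = closing).
SUGGESTED = {"23B0": "O", "23B1": "C", "27C5": "O", "27C6": "C"}


def parse_line(line):
    """Split one semicolon-separated record into the fields used here."""
    fields = line.split(";")
    return {"cp": fields[0], "math_class": fields[1], "name": fields[-1]}


def class_differences(text, suggested):
    """List (code point, current class, suggested class) where they differ."""
    differences = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        record = parse_line(line)
        wanted = suggested.get(record["cp"])
        if wanted is not None and wanted != record["math_class"]:
            differences.append((record["cp"], record["math_class"], wanted))
    return differences


for cp, current, wanted in class_differences(SAMPLE, SUGGESTED):
    print(f"U+{cp}: {current} -> {wanted}")
```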

I'm doing a systematic comparison with LaTeX unicode-math; there are other differences, 
as detailed in this GitHub issue:

https://github.com/wspr/unicode-math/issues/619#issuecomment-1678594025 

However, in some of these cases we may choose to change the TeX settings or simply 
document the differences. For example, as listed in that issue, TeX traditionally 
makes the daggers U+2020 and U+2021 binary operators (B), not relations (R), which would give 
them more space.

Date/Time: Tue Aug 22 05:57:09 CDT 2023
ReportID: ID20230822055709
Name: Andrew West
Report Type: Error Report
Opt Subject: DerivedNumericValues.txt

For Unicode 15.1, there is a discrepancy between the numeric values for
U+5146 and U+79ED as given in DerivedNumericValues.txt (single value only)
and Unihan and ucd.xml (two values each):

https://www.unicode.org/Public/draft/UCD/ucd/extracted/DerivedNumericValues.txt:
5146          ; 1000000.0 ; ; 1000000 # Lo       CJK UNIFIED IDEOGRAPH-5146
79ED          ; 1000000000.0 ; ; 1000000000 # Lo       CJK UNIFIED IDEOGRAPH-79ED

Unihan_NumericValues.txt:
U+5146	kPrimaryNumeric	1000000 1000000000000
U+79ED	kPrimaryNumeric	1000000000 1000000000000

ucd.nounihan.flat.xml:
      <char cp="5146" age="1.1" na="CJK UNIFIED IDEOGRAPH-#" JSN="" gc="Lo" ccc="0" dt="none" dm="#" nt="Nu" nv="1000000 1000000000000" .../>
      <char cp="79ED" age="1.1" na="CJK UNIFIED IDEOGRAPH-#" JSN="" gc="Lo" ccc="0" dt="none" dm="#" nt="Nu" nv="1000000000 1000000000000" .../>

The derived numeric value should be based on kPrimaryNumeric:

# Derived Property:   Numeric_Value
#  Field 1:
#    The values are based on field 8 of UnicodeData.txt, plus the fields
#    kAccountingNumeric, kOtherNumeric, kPrimaryNumeric in the Unicode Han Database (Unihan).
#    The derivations for these values are as follows.
#      Numeric_Value = the value of kAccountingNumeric, kOtherNumeric, or kPrimaryNumeric, if they exist; otherwise
#      Numeric_Value = the value of field 8, if it exists; otherwise
#      Numeric_Value = NaN

However, the format of the file only allows for a single value.

My personal opinion is that Numeric_Value should always be a single value,
even in cases such as U+5146 and U+79ED where there are alternative
interpretations of the numeric value, otherwise implementations which rely
on UCD data to apply numeric value (e.g. for numeric sorting) will not know
which of the space-separated list of numeric values to apply.

My preferred solution would be:

1. Allow multiple alternative numeric values in the Unihan database only
(i.e. no change to kPrimaryNumeric for U+5146 and U+79ED);

2. Allow only a single numeric value for Numeric_Value in
DerivedNumericValues.txt, selecting the most widely-used modern
interpretation for U+5146 and U+79ED, and modifying accordingly the stated
derivation for the value given in Field 1;

3. Derive the "nv" value in ucd.xml from Numeric_Value in DerivedNumericValues.txt.
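
The three steps above can be sketched as follows. This is only an illustration in Python: the function name derive_numeric_value and the preferred parameter are hypothetical, and the default of taking the first listed value is an assumption chosen because it reproduces the values currently shown in DerivedNumericValues.txt above. It is not a claim about which interpretation is the most widely used.

```python
# kPrimaryNumeric values as quoted from Unihan_NumericValues.txt above.
K_PRIMARY_NUMERIC = {
    "5146": "1000000 1000000000000",
    "79ED": "1000000000 1000000000000",
}


def derive_numeric_value(raw, preferred=None):
    """Reduce a space-separated list of numeric values to a single value.

    If `preferred` is given it must be one of the listed alternatives;
    otherwise the first listed value is used (an assumption of this sketch).
    """
    candidates = [int(token) for token in raw.split()]
    if preferred is None:
        return candidates[0]
    if preferred not in candidates:
        raise ValueError(f"{preferred} is not among {candidates}")
    return preferred


# Step 2: one Numeric_Value per code point; step 3: "nv" would reuse it.
for code_point, raw in K_PRIMARY_NUMERIC.items():
    print(code_point, derive_numeric_value(raw))
```

With a single derived value, an implementation doing numeric sorting never has to choose among alternatives itself; the alternatives remain available in the Unihan data for those who need them.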

Date/Time: Thu Jul 20 19:15:14 CDT 2023
ReportID: ID20230720191514
Name: Leroy D. Geisse V.
Report Type: Website Problem [UCD, Index.txt, 16.0]
Opt Subject: Missing character name variant


I think that this is a minor issue. Regards.

By searching for "cursor" in the Character Name Index (https://www.unicode.org/charts/charindex.html), I found that the variant "down, fast cursor" is missing.

cursor down, fast
cursor left, fast
cursor right, fast
cursor up, fast

fast cursor down
fast cursor left
fast cursor right
fast cursor up

left, fast cursor

right, fast cursor

up, fast cursor
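
The gap in the list above can be checked mechanically. The Python sketch below assumes that index variants are rotations of the words of a character name, a rule inferred only from the entries quoted in this report and not from how the index is actually generated. The character names are likewise assumed from the "fast cursor ..." entries.

```python
# Character names assumed from the index entries quoted in the report.
NAMES = [
    "FAST CURSOR DOWN",
    "FAST CURSOR LEFT",
    "FAST CURSOR RIGHT",
    "FAST CURSOR UP",
]

# Entries found in the index, as listed in the report.
LISTED = {
    "cursor down, fast", "cursor left, fast",
    "cursor right, fast", "cursor up, fast",
    "fast cursor down", "fast cursor left",
    "fast cursor right", "fast cursor up",
    "left, fast cursor", "right, fast cursor", "up, fast cursor",
}


def index_variants(name):
    """Return the name plus each rotation, e.g. 'down, fast cursor'."""
    words = name.lower().split()
    variants = [" ".join(words)]
    for i in range(1, len(words)):
        variants.append(" ".join(words[i:]) + ", " + " ".join(words[:i]))
    return variants


def missing_variants(names, listed):
    """Collect every expected variant that is absent from the listed set."""
    return [v for name in names for v in index_variants(name) if v not in listed]


print(missing_variants(NAMES, LISTED))
```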

Date/Time: Tue Sep 26 05:26:46 CDT 2023
ReportID: ID20230926052646
Name: Henri Sivonen
Report Type: Error Report
Opt Subject: UTS #10

Hi,

https://www.unicode.org/reports/tr10/tr10-49.html#Other_Applications_of_Collation 
has this sentence: “For example, if v and w are treated as identical base
letters in Swedish sorting, then they should also be treated the same for
searching.”

This example has become obsolete. See
https://unicode-org.atlassian.net/browse/CLDR-17050 and links backwards
from there to issues and CLDR changesets concerning both Swedish and
Finnish search collations.

(Perhaps it could be mentioned instead that ä and å are primary-distinct
from a in Swedish.)

Henri Sivonen

Feedback routed to Emoji SC for evaluation [ESC]

(None at this time.)


Feedback routed to Editorial Committee for evaluation [EDC]

Date/Time: Sun Sep 17 02:08:23 CDT 2023
ReportID: ID20230917020823
Name: Lim Hian-tong
Report Type: Error Report
Opt Subject: Chapter 18 of the Unicode Standard, Version 15.0.0

Chapter 18 of the Unicode Standard contains information about dialects of
Chinese that does not reflect the actual situation, as described below.
Please consider rewriting the segments that are apparently incorrect or
inaccurate.

On page 747, it is claimed that speakers of Chinese languages other than
Mandarin learn to read and write Mandarin pronouncing it with the rules of
their own language, which, although still practiced in Hong Kong and Macao,
does not apply to most parts of the Chinese-speaking world. The majority of
Chinese-medium schools in the 21st century teach written Mandarin text with
Mandarin pronunciation exclusively, regardless of students’ dialectal
backgrounds. The situation metaphorized as having Spanish children
pronouncing French text as if it were Spanish hardly happens anymore
outside Hong Kong and Macao.

Another paragraph on page 747 states that modern Chinese languages are
almost never seen in printed form except for Cantonese, which is not the
case. A significant example of various modern Chinese languages in printed
form would be the case in Taiwan where lessons of the Taiwanese, Hakka and
Matsu languages, conducted with standardized writing systems of these
languages, have been made available to every single student for more than
two decades. Similar approaches to teach local languages with orthographies
based on their respective vernacular forms do exist in certain schools in
the PRC as well.

Page 765 describes the use of Bopomofo letters for the phonetic
representation of southern Chinese dialects as “never fully standardized,”
despite the fact that a set of Bopomofo symbols designed for the Matsu
dialect is officially in use on the Matsu islands.

Date/Time: Mon Sep 25 10:25:54 CDT 2023
ReportID: ID20230925102554
Name: Philippe Verdy
Report Type: Error Report
Opt Subject: /charts/PDF/U10600.pdf

Note: This has already been corrected in the 16.0 annotations draft for the names list.

The annotations for two Linear A characters seem to be obviously wrong:

  U+10703 𐜃 LINEAR A SIGN A600 • 10762 𐝢 a802, 10741 𐝁 a702 b
  U+10704 𐜄 LINEAR A SIGN A601 • 10762 𐝢 a802, 10748 𐝈 a709 l

The "base" character described should be U+10764 𐝤 linear A sign A804, 
instead of U+10762 𐝢 linear A sign A802:

  U+10703 𐜃 LINEAR A SIGN A600 • 10764 𐝤 a804, 10741 𐝁 a702 b
  U+10704 𐜄 LINEAR A SIGN A601 • 10764 𐝤 a804, 10748 𐝈 a709 l

This bug occurs in the two charts:

* /charts/PDF/U10600.pdf
* /charts/fr/PDF/U10600.pdf

and is also present in the NamesList.txt file in the UCD containing 
annotations, used to automatically generate these charts:

* /Public/UCD/latest/ucd/NamesList.txt

  10703	LINEAR A SIGN A600
 	* 10762 a802, 10741 a702 b
  10704	LINEAR A SIGN A601
 	* 10762 a802, 10748 a709 l

which should be:

  10703	LINEAR A SIGN A600
 	* 10764 a804, 10741 a702 b
  10704	LINEAR A SIGN A601
 	* 10764 a804, 10748 a709 l
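
A correction of this kind could be applied with a small script. The Python sketch below is an assumption about how one might patch the file, not a description of the tooling used for the charts; it assumes that character lines start with a code point followed by a tab and that annotation lines start with a tab and an asterisk. The function name fix_cross_reference is hypothetical.

```python
import re

# Assumed layout: "10703<TAB>LINEAR A SIGN A600" starts a character entry.
NAME_LINE = re.compile(r"^([0-9A-F]{4,6})\t")


def fix_cross_reference(text, targets, old, new):
    """Replace `old` with `new` in '*' annotations of the target entries only."""
    fixed = []
    current = None
    for line in text.splitlines():
        match = NAME_LINE.match(line)
        if match:
            current = match.group(1)  # remember which entry we are inside
        elif current in targets and line.startswith("\t* "):
            line = line.replace(old, new)
        fixed.append(line)
    return "\n".join(fixed)


SAMPLE = (
    "10703\tLINEAR A SIGN A600\n"
    "\t* 10762 a802, 10741 a702 b\n"
    "10704\tLINEAR A SIGN A601\n"
    "\t* 10762 a802, 10748 a709 l\n"
)

print(fix_cross_reference(SAMPLE, {"10703", "10704"}, "10762 a802", "10764 a804"))
```

Restricting the replacement to the two target entries avoids touching any other annotation that legitimately refers to 10762.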

Date/Time: Mon Oct 16 12:01:45 CDT 2023
ReportID: ID20231016120145
Name: Denny Vrandečić
Report Type: Error Report
Opt Subject: Unicode Standard 15.0

Page 10, Section 2.1 (Architectural Context), Subsection "Text Elements,
Characters, and Text Processes", contains the following sentence:

"For example, in traditional German orthography, the letter combination “ck”
is a text element for the process of hyphenation (where it appears
as “k-k”), but not for the process of sorting."

"traditional German orthography" seems confusing here, given that this has
not been the case since the 1996 orthographic reform, i.e. more than a
quarter century ago. I would suggest changing the term to
either "pre-1996 German orthography" (to be explicit) or "(old / former /
previous / dated) German orthography" (to avoid the potentially loaded
term "traditional").

Other Reports

(None at this time.)