Comments on Public Review Issues

L2/20-104

Comments on Public Review Issues
(January 10 - April 20, 2020)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of April 20, 2020, since the previous cumulative document was issued prior to UTC #162 (January 2020).

Issue	Name	Feedback Link
408	QID Emoji	(feedback)
404	Proposed Update UTS #18, Unicode Regular Expressions	(feedback)

Feedback routed to Unihan ad hoc for evaluation

Date/Time: Mon Feb 17 07:38:13 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Radical for U+80E7 胧

The current radical for U+80E7 胧 is #130, but its G-Source reference value
is G0-6B4A. That means it is the simplified variant form of U+6727 朧, whose
G-Source reference value is G1-6B4A and whose radical is #74, not U+268AB.
Would it be better to update the radical for U+80E7 胧 to #74?

Date/Time: Mon Feb 17 07:38:54 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Radical for U+4E80 亀

The current radical for U+4E80 亀 is #5, and in ISO/IEC DIS 10646-1.2:1992,
the references are T:E-396C and J:0-3535, and there is also the EACC source,
2D632D, in Unicode 1.0. It is a variant of U+9F9C 龜. Could we add #213 as
a second radical for U+4E80 亀? Cf.
https://mojikiban.ipa.go.jp/search/detail/MJ006424  

Note that the radical for U+2A6C9 is #213, and that U+2A6C9 and U+4E80 亀 are similar.

Date/Time: Mon Feb 17 11:34:50 CST 2020
Name: Lee Collins
Report Type: Public Review Issue
Opt Subject: Wrong Japanese reading in Unihan

U+5807 shows the Kun reading "Sumire". This is no doubt a mistake for the
similar but different character U+83EB. The respective DKWZ codes are 05212
and 31207. U+5807 should have the Kun readings NEBATUTI, NURU, WAZUKA, etc.
The On readings are KIN and GON.

Date/Time: Wed Feb 19 02:51:51 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Corrections for UAX #45

In the current USourceData file, the data for UTC-00550 is shown as below.

UTC-00550;UTC-02637;;30.11;0206.201;⿰口笪;kCheungBauerIndex 375.06;

However, UTC-02637 has been updated to UK-02637, so Field 1 should be
updated to UK-02637 correspondingly as below.

UTC-00550;UK-02637;;30.11;0206.201;⿰口笪;kCheungBauerIndex 375.06;
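
The correction above amounts to rewriting one semicolon-delimited field of the record. A minimal sketch in Python, assuming fields are counted from zero so that "Field 1" is the second value on the line; the helper name `replace_field` is hypothetical and not part of any Unicode tooling:

```python
def replace_field(record: str, index: int, new_value: str) -> str:
    """Return a copy of a semicolon-delimited record with one field replaced."""
    fields = record.split(";")
    fields[index] = new_value
    return ";".join(fields)


# The record as quoted from the current USourceData file.
current = "UTC-00550;UTC-02637;;30.11;0206.201;⿰口笪;kCheungBauerIndex 375.06;"

# Field 1 (zero-based) is the only value that changes.
corrected = replace_field(current, 1, "UK-02637")
print(corrected)
```

The trailing semicolon survives because splitting yields an empty final field that is joined back unchanged.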

Date/Time: Wed Feb 19 05:38:12 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Corrections for UAX #45

The data for UTC-00749 should be updated as below.

UTC-00749;UK-02870;;157.9;1230.291;⿰⻊某;kCheungBauerIndex 463.06;

UTC-00749 and UK-02870 are duplicates.

Date/Time: Fri Mar 6 05:52:34 CST 2020
Name: Eiso Chan
Report Type: Error Report
Opt Subject: IDSes for UTC-00475 and UTC-00476 in UAX #45

The IDSes for UTC-00475 and UTC-00476 should be updated as below:

UTC-00475;V;U+947D;167.15;1326.121;⿰金⿱⿰夬夬貝;kLau 1580;
UTC-00476;U;U+947D;167.17;1327.051;⿰金⿱⿰失失貝;kLau 1581;

Feedback routed to Script ad hoc for evaluation

Date/Time: Tue Jan 28 18:57:28 CST 2020
Name: Sarabveer Singh
Report Type: Feedback on an Encoding Proposal
Opt Subject: Suggestions for Gurmukhi Bindi Before Bihari (L2/18-319, L2/19-167, L2/19-283)

Singh in L2/18-319 and L2/19-167 wishes for the Unicode specification to add
support for GURMUKHI SIGN BINDI and GURMUKHI TIPPI to display before
GURMUKHI VOWEL SIGN II. As noted in L2/19-047, this combination is most
likely to be a stylistic difference.

However, this combination should be supported as a stylistic option in
Unicode fonts. In testing, I have found that only the "liga" and/or "rlig"
OpenType lookups display this stylistic combination. This is an unsupported
method and does not work universally on different systems. In my experience,
the ligature displays correctly in the major web browsers (Google Chrome,
Mozilla Firefox, Apple Safari), but it does not display correctly in
Microsoft's software (Office, Edge, Internet Explorer).

I request that an OpenType Ligature Lookup Table be recommended to implement
this stylistic combination in Unicode fonts, such as the "abvf" Lookup
Table.

Date/Time: Sat Feb 1 17:56:25 CST 2020
Name: Doug Ewell
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/20-061, Final Proposal to encode Western Cham in the UCS

L2/20-061 proposes, among other characters, a group of eight characters for
Western Cham lunar month names (ARABIC SYMBOL ONE DOT LUNAR MONTH through
ARABIC SYMBOL SEVEN DOTS LUNAR MONTH), to be placed in the Arabic
Mathematical Alphabetic Symbols block at code points U+1EEF8 through
U+1EEFF.

The Arabic Mathematical Alphabetic Symbols block was intended for stylistic
variations of existing Arabic letters, to be used in special mathematical
contexts. It is analogous to the Mathematical Alphanumeric Symbols block for
existing Latin and Greek letters and digits. It is not intended for encoding
of new “normal” characters. The proposed characters are “special” in that
they are used only in Western Cham and only for lunar month names, but they
are not “mathematical”; they are not used to represent variables, constants,
sets, etc. in mathematical expressions.

Both the text and the proposed Unicode properties show that the proposed
characters are not stylistic variations of existing Arabic letters, and do
not follow the pattern of other characters in this block:

1EEF8;ARABIC SYMBOL ONE DOT LUNAR MONTH;So;0;ON;;;;;N;;;;;
cf.
1EE00;ARABIC MATHEMATICAL ALEF;Lo;0;AL;<font> 0627;;;;N;;;;;

They are “symbols” (So), not “letters” (AL), and are not <font>
varieties of existing letters.
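
The contrast between the two records can be read off mechanically. The sketch below assumes the standard UnicodeData.txt field order (field 2 is General_Category, field 4 is Bidi_Class, field 5 is the decomposition mapping); the function name `summarize` is illustrative only:

```python
def summarize(line: str) -> dict:
    """Pick out the fields of a UnicodeData.txt-style record that matter here."""
    fields = line.split(";")
    return {
        "code": fields[0],
        "name": fields[1],
        "gc": fields[2],             # General_Category
        "bidi": fields[4],           # Bidi_Class
        "decomposition": fields[5],  # empty when there is no mapping
    }


proposed = summarize("1EEF8;ARABIC SYMBOL ONE DOT LUNAR MONTH;So;0;ON;;;;;N;;;;;")
existing = summarize("1EE00;ARABIC MATHEMATICAL ALEF;Lo;0;AL;<font> 0627;;;;N;;;;;")

for entry in (proposed, existing):
    is_font_variant = entry["decomposition"].startswith("<font>")
    label = "font variant" if is_font_variant else "no <font> mapping"
    print(entry["code"], entry["gc"], entry["bidi"], label)
```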

In the revision history, it was noted that these characters were moved in
Revision 3 (November 2019) from the proposed Western Cham block to this
block. Item 6 in the section “Repertoire” includes an inadvertent lingering
reference to ARABIC SYMBOL SEVEN DOTS LUNAR MONTH being encoded at U+1E26F.

I recommend moving these eight symbols back into the proposed Western Cham
block, as they were before Revision 3. I have no objection at all to
encoding these symbols, only to this particular proposed location.

Date/Time: Sun Feb 16 15:29:46 CST 2020
Name: Arnim Sauerbier
Report Type: Feedback on an Encoding Proposal
Opt Subject: Symbols for Legacy Computing 1FB3C...


The current proposal for "Symbols for Legacy Computing" unfortunately lacks
two-character full-triangles.

Recreating the 'square font' diagonal triangles for legacy computing on
modern systems requires double-width triangles.  The extant proposal only
allows creation of triple-width triangles, forming a non-square aspect
ratio.

The following additions would allow drawing roughly 1:1 bilateral triangles
out of two adjacent characters.  

A two-glyph triangle on bottom left, consisting of:
LOWER LEFT BLOCK DIAGONAL CENTER LEFT TO LOWER RIGHT
LOWER LEFT BLOCK DIAGONAL UPPER LEFT TO CENTER RIGHT

A two-glyph triangle on bottom right, consisting of:
LOWER RIGHT BLOCK DIAGONAL BOTTOM LEFT TO CENTER RIGHT
LOWER RIGHT BLOCK DIAGONAL CENTER LEFT TO UPPER RIGHT

A two-glyph triangle on upper left, consisting of:
UPPER LEFT BLOCK DIAGONAL LOWER LEFT TO CENTER RIGHT
UPPER LEFT BLOCK DIAGONAL CENTER LEFT TO UPPER RIGHT

A two-glyph triangle on upper right, consisting of:
UPPER RIGHT BLOCK DIAGONAL UPPER LEFT TO CENTER RIGHT
UPPER RIGHT BLOCK DIAGONAL CENTER LEFT TO LOWER RIGHT

These new codepoints are necessary to recreate Legacy Computing Graphics.

I have been using PETSCII and other legacy-computing graphics since 1981,
and am happy to answer any questions you may have.

Thank you for your consideration in preserving legacy computer art.

Arnim

Date/Time: Sun Feb 16 16:07:01 CST 2020
Name: Jean Larapiere
Report Type: Error Report
Opt Subject: Problem with "Symbols for Legacy Computing"

The current proposal for "Symbols for Legacy Computing" unfortunately lacks
two-character full-triangles.

Recreating the 'square font' diagonal triangles for legacy computing on
modern systems requires double-width triangles.  The extant proposal only
allows creation of triple-width triangles, forming a non-square aspect
ratio.

The following additions would allow drawing roughly 1:1 bilateral triangles
out of two adjacent characters.  

A two-glyph triangle on bottom left, consisting of:
LOWER LEFT BLOCK DIAGONAL CENTER LEFT TO LOWER RIGHT
LOWER LEFT BLOCK DIAGONAL UPPER LEFT TO CENTER RIGHT

A two-glyph triangle on bottom right, consisting of:
LOWER RIGHT BLOCK DIAGONAL BOTTOM LEFT TO CENTER RIGHT
LOWER RIGHT BLOCK DIAGONAL CENTER LEFT TO UPPER RIGHT

A two-glyph triangle on upper left, consisting of:
UPPER LEFT BLOCK DIAGONAL LOWER LEFT TO CENTER RIGHT
UPPER LEFT BLOCK DIAGONAL CENTER LEFT TO UPPER RIGHT

A two-glyph triangle on upper right, consisting of:
UPPER RIGHT BLOCK DIAGONAL UPPER LEFT TO CENTER RIGHT
UPPER RIGHT BLOCK DIAGONAL CENTER LEFT TO LOWER RIGHT

These new codepoints are necessary to recreate Legacy Computing text-graphics.

Thank you,

Date/Time: Thu Feb 20 12:45:44 CST 2020
Name: Markus Scherer
Report Type: Error Report
Opt Subject: review Script of U+16FE3 OLD CHINESE ITERATION MARK

For consideration by script ad hoc & UTC

Unicode 13 adds U+16FF0/1 Vietnamese reading marks with sc=Hani (and gc=Mc).

In the same block is U+16FE3 OLD CHINESE ITERATION MARK with sc=Zyyy (and gc=Lm).

In discussion, Ken W. said that this one
"patterns similarly to the modern iteration mark 3005. That one *is* sc=Han."
and
"So it would seem reasonable to me (in a *future* version) to ask for 16FE3 to change to sc=Han"

Please review.

Date/Time: Wed Apr 8 00:37:28 CDT 2020
Name: Barun Kumar Sahu
Report Type: Error Report
Opt Subject: "Devanagari sign avagraha" followed by "Devanagari sign anusvara" or "Devanagari sign candrabindu"

Ideally, "Devanagari sign avagraha" (U+093D) can be followed by "Devanagari sign 
anusvara" (U+0902) or "Devanagari sign candrabindu" (U+0901).  However, some 
word-processors do not accept this combination.

My question is: Can "Devanagari sign avagraha" be followed by "Devanagari sign 
anusvara" or "Devanagari sign candrabindu" as per the Unicode Standard?  
(I think it should be allowed.  For example, we should be able to write कऽंप or कऽँप.)

Feedback routed to ucd-dev ad hoc for evaluation

Date/Time: Thu Jan 9 21:21:51 CST 2020
Name: Alex Henrie
Report Type: Other Question, Problem, or Feedback
Opt Subject: Lack of precomposed capital Greek letters complicates lowercasing, uppercasing, and normalizing Greek text

Unicode defines precomposed characters for various lowercase Greek letters
with diacritics, but not for their uppercase forms.[1] As a result, Greek
text encoded in NFC may no longer be NFC-normalized after changing case. For
example, if "Ρ̓ᾶρος" (the name of an ancient Greek hero) is converted to
lowercase, its first character changes from 03A1 0313 to 03C1 0313 and must
be normalized again to get to 1FE4. The lack of capital characters creates
other complications as well, such as breaking any uppercasing or lowercasing
algorithm that does not allow changing the length of the string.
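
The renormalization step described above can be reproduced with Python's standard `unicodedata` module. This is a minimal sketch covering only the first character of the example; the helper name `code_points` is illustrative:

```python
import unicodedata


def code_points(text: str) -> str:
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# U+03A1 GREEK CAPITAL LETTER RHO + U+0313 COMBINING COMMA ABOVE.
upper = "\u03a1\u0313"

# The capital sequence is already in NFC: no precomposed capital exists.
upper_is_nfc = unicodedata.normalize("NFC", upper) == upper

# Lowercasing keeps the combining mark, so the result is no longer NFC.
lower = upper.lower()
renormalized = unicodedata.normalize("NFC", lower)

print("upper:        ", code_points(upper), "NFC" if upper_is_nfc else "not NFC")
print("lowercased:   ", code_points(lower))
print("renormalized: ", code_points(renormalized))
print("length changed:", len(lower) != len(renormalized))
```

Running it shows the lowercased sequence collapsing to a single precomposed code point, which is the length change that fixed-length casing algorithms cannot accommodate.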

Would you please reconsider including these characters in the standard so
that Greek NFC text does not need to be renormalized after lowercasing? Or
at least add a note about this problem to the Greek Language FAQ?[2]

[1] https://www.opoudjis.net/unicode/unicode_gaps.html#gaps 
[2] https://www.unicode.org/faq/greek.html

Date/Time: Wed Feb 19 16:07:42 CST 2020
Name: Karl Williamson
Report Type: Error Report
Opt Subject: Request uniform version syntax

This isn't an error, but it is an annoyance that the data files you furnish
have at least three different syntaxes for specifying the versions they
apply to:

Files in the UCD have the version embedded in the first line of the file

Files in the security subdirectory have a separate line like 'Version:
13.0.0'

And EmojiData.txt has a line 'Version: 13.0'.

There really is no need to have disparate syntaxes, and it means code
reading them has to have extra intelligence.
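
As an illustration of the "extra intelligence" involved, the sketch below handles the three styles with two regular expressions. The sample lines are assumptions modeled on the description above, not verbatim copies of any data file, and `parse_version` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Style 1: version embedded in a file name on the first line, e.g. "Name-13.0.0.txt".
_FILENAME = re.compile(r"-(\d+)\.(\d+)\.(\d+)\.txt\b")
# Styles 2 and 3: a "Version:" line with either three or two components.
_VERSION = re.compile(r"\bVersion:\s*(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, update), or None if no version is recognized."""
    match = _FILENAME.search(line) or _VERSION.search(line)
    if match is None:
        return None
    major, minor, update = match.groups()
    # A missing third component is treated as 0 (an assumption of this sketch).
    return int(major), int(minor), int(update or 0)


samples = [
    "# LineBreak-13.0.0.txt",  # assumed shape of a UCD first line
    "# Version: 13.0.0",       # assumed shape of a security data line
    "# Version: 13.0",         # assumed shape of the emoji data line
]
for sample in samples:
    print(sample, "->", parse_version(sample))
```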

Date/Time: Tue Mar 3 16:17:10 CST 2020
Name: Daniel Bünzli
Report Type: Error Report
Opt Subject: UAX #14 for 13.0.0: LB27's first line is obsolete

Hello, 

I think (more precisely my compiler thinks [1]) the first line of LB27 is
already handled by the new LB22 rule and can be removed. 

Best, 

Daniel

[1]
File "uuseg_line_break.ml", line 206, characters 38-40:

206 |   | (* LB27 *)  _, (JL|JV|JT|H2|H3), (IN|PO) -> no_boundary s
                                            ^^
Warning 12: this sub-pattern is unused.

[Filed by Rick on behalf of user, per KW. We can delete this if original poster submits it.]

Date/Time: Sun Mar 8 10:50:59 CDT 2020
Name: Zack Newman
Report Type: Error Report
Opt Subject: Mistake in section 6.2 of UAX #29

I'm unsure if this is a mistake in sections 3.1.1 and 4.1.1 or section 6.2,
but 6.2 incorrectly states "ignoring Extend is sufficient to disallow
breaking within a grapheme cluster". The sequence of Unicode scalar values
(U+0600, U+0020) is considered a single grapheme cluster due to rule GB9b,
but the sequence is parsed into two words according to 4.1.1. While it would
be ideal to not have sequences of Unicode scalar values that can be parsed
into more words than grapheme clusters, I think it's OK for that property to
not hold as long as there are no incorrect claims that it does hold, as
there currently are in section 6.2.

Date/Time: Wed Apr 1 17:29:56 CDT 2020
Name: Elika J. Etemad
Report Type: Error Report
Opt Subject: Zero Width Space vs Arabic shaping: non-interop

There was some discussion in the W3C, triggered by some new test cases,
about whether ZWSP should break Arabic shaping, given spaces generally break
shaping. We found that Unicode clearly defines it as not breaking shaping,
but also found that Unicode's behavior does not seem to be widely
implemented, see [1].

The question to the UTC is, therefore, should ZWSP continue to be defined as
transparent with respect to shaping, or should its definition be adjusted to
match what appears to be the current implementation reality?

[1] https://github.com/w3c/csswg-drafts/issues/3861#issuecomment-529348086 

[Fwiw, a number of participants in the discussion initially expected that
ZWSP would break shaping, just like all the other "space" characters. So
given that expectation plus the state of implementations, it might actually
make sense to spec this behavior and introduce a new character, if needed,
for an explicit break opportunity that does not break shaping.]

Feedback routed to Emoji SC for evaluation

(None at this time.)

Feedback routed to Editorial Committee for evaluation

Date/Time: Thu Apr 16 00:50:43 CDT 2020
Name: Bogdan
Report Type: Error Report
Opt Subject:

Found an error in UnicodeStandard-13.0.pdf, on page 642 (16.1 Thai):
"""
In particular, when used as a consonant diacritic, U+0331 combining macron 
below can occur with vowel signs U+0338 THAI CHARACTER SARA U or U+0339 THAI 
CHARACTER SARA UU. 
"""
There are two typos: the code for THAI CHARACTER SARA U should be U+0E38
(instead of U+0338), and the code for THAI CHARACTER SARA UU should be
U+0E39 (instead of U+0339).
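
The reported code points can be checked against character names with Python's standard `unicodedata` module. A minimal sketch; the names printed come from whichever Unicode version the local Python build ships:

```python
import unicodedata

# Each pair is (code point as printed in the PDF, code point suggested in the report).
pairs = [("\u0338", "\u0e38"), ("\u0339", "\u0e39")]

for printed, suggested in pairs:
    print(f"U+{ord(printed):04X} {unicodedata.name(printed)}")
    print(f"U+{ord(suggested):04X} {unicodedata.name(suggested)}")
```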

Other Reports

(None at this time.)
