Comments on Public Review Issues

L2/23-078

(January 5 - April 5, 2023)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of April 04, 2023, since the previous cumulative document was issued prior to UTC #174 (January 20, 2023).

Issue	Name	Feedback Link
473	Unicode 15.1.0 Alpha Review	(feedback) Note: has feedback, PRI is now closed
472	Line breaking at orthographic syllable boundaries	(feedback)
471	Proposed Update UTS #51, Unicode Emoji	(feedback)
470	Proposed Update UAX #24, Unicode Script Property	(feedback) No feedback at this time
469	Proposed Update UAX #29, Unicode Text Segmentation	(feedback)
468	Proposed Update UAX #45, U-source Ideographs	(feedback)
467	Proposed Update UAX #38, Unicode Han Database (Unihan)	(feedback)
465	Proposed Update UAX #44, Unicode Character Database	(feedback)
464	Proposed Update UAX #41, Common References for Unicode Standard Annexes	(feedback) No feedback at this time
463	Proposed Update UTS #39, Unicode Security Mechanisms	(feedback)
462	Proposed Update UAX #31, Unicode Identifiers and Syntax	(feedback)
461	Proposed Update UAX #14, Unicode Line Breaking Algorithm	(feedback) No feedback at this time
460	Proposed Update UAX #9, Unicode Bidirectional Algorithm	(feedback)

The links below go to locations in this document for feedback.

Feedback routed to CJK & Unihan Group for evaluation [CJK]
Feedback routed to Script ad hoc for evaluation [SAH]
Feedback routed to Properties & Algorithms Group for evaluation [PAG]
Feedback routed to Emoji SC for evaluation [ESC]
Feedback routed to Editorial Committee for evaluation [EDC]
Other Reports

Feedback routed to CJK & Unihan Group for evaluation [CJK]

Date/Time: Thu Jan 19 15:09:19 CST 2023
ReportID: ID20230119150919
Name: Lee Collins
Report Type: Error Report
Opt Subject: Unihan_Readings.txt


Found these in v. 15; they've been there for a while.

U+6AAC	kDefinition	type of locust oracacia
>> type of locust or acacia
U+45E3	kDefinition	insect of mulberry, insects that damage to the melons
>> insect that lives in mulberry trees, insect that damages melons

Date/Time: Sun Mar 26 20:10:57 CDT 2023
ReportID: ID20230326201057
Name: Eiso Chan
Report Type: Error Report
Opt Subject: KP-Sources for U+6138, U+246D8 and U+29A8E

IRGN2608 was discussed at IRG #60, but the issues have not been resolved
yet. https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg60/IRGN2608_KP1disunify.pdf
TCA, ROK and Mr. Henry Chan provided their feedback comments.

In IRGN2605, the discussion record is shown as below.
https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg60/IRGN2605MiscEditorialReport.pdf 

“Due to the lack of contact to DPRK, the editors believed it is
 inappropriate to disunify KP1 source characters at present.”

Under the current UCV rules, these cases are misunifications, so it would
be better to consider adding them to the errata list in a coming version.

Feedback routed to Script ad hoc for evaluation [SAH]

Date/Time: Sun Mar 26 14:01:05 CDT 2023
ReportID: ID20230326140105
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: Feedback on Bima characters (L2/23-070)

In document https://www.unicode.org/L2/L2023/23070-bima-script.pdf, a few new characters
are requested for the Buginese script to support the Bima orthography. I would just like to make a few suggestions.

  1. Disunify the flower shaped end of section character: It is clear that
  this punctuation mark is shaped very differently than the existing end of
  section already encoded, with both signs having very different epigraphic
  derivations. It is quite likely that users won't see these characters as
  interchangeable and would like to use a particular glyph without
  switching font or even want to use them concurrently and/or outside a
  Bima context. Therefore they should be considered distinct. I recommend
  using the codepoint 1A1D, with the name BUGINESE FLEURON, or BUGINESE
  SIGN FLEURON to parallel the similar character 10AF1 𐫱 MANICHAEAN
  PUNCTUATION FLEURON. A note could be added saying "used as an end of
  section sign in Bima texts". Along with the Pallawa and the existing end
  of section, they would be placed under the header "Punctuation". If the
  codepoint above needs to be occupied, I would recommend placing the
  Reduplication sign or the Gemination sign, since they are the most likely
  to see use beyond a Bima context. But it can also be left vacant for a
  future addition.

  2. Disunify the killer above and below: While the author states that these
  are identical in meaning and use, the difference in placement may be very
  relevant for rendering the text. Not only would epigraphers like to
  represent the text as it was written, regardless of semantic distinction,
  but it is also generally very problematic for fonts and rendering engines
  to handle a combining mark that doesn't have a definite position relative
  to the base letter. It can result in glitchy rendering, particularly when it
  has to interact with other marks above or below. One could say that a vowel
  silencer would never be placed in the same letter as a vowel sign or a
  gemination sign, but if the Buginese script increases in popularity
  (as the author presumably wants it to), then more additions are likely,
  including ones that may be placed with the vowel silencer. The two glyph
  variants of the sign don't need to be disunified, they can be handled at
  the font level without issue; but the choice of one glyph over another in
  the code chart needs some justification.

  3. The name "killer" needs to be reconsidered. While it is
  unlikely that people would take offense at it, it is suboptimal as a
  description. Better alternatives may be "Virama" or "Vowel Silencer".

  4. Add a note under the Pallawa sign stating "a glyph variant with two
  dots occurs" or "may have three or two dots". These glyph variants are
  similar enough that they don't merit disunification.

  5. Another modification would be to the note under 1A10: instead of
  spelling it /h/, the note would read "used in Bima for ha".

In summary my most important suggestions are for the disunification of two
more characters (the fleuron sign and the two positional versions of the
vowel silencer), as well as adding a note under the Pallawa sign and
reconsidering the name of "killer".

Date/Time: Sun Mar 26 14:34:34 CDT 2023
ReportID: ID20230326143434
Name: Eduardo Marín Silva
Report Type: Other Document Submission
Opt Subject: Request to revise the glyph for the character 111CF (SHARADA INVERTED CANDRABINDU)

In document L2/17-428
(http://www.unicode.org/L2/L2017/17428-sharada-inv-candrabindu.pdf) a
proposal to add the Inverted Candrabindu for Sharada was made and
eventually accepted. Unlike other Indic scripts, the regular Candrabindu
for the Sharada script (11180) has an upper arc going above the dot, rather
than a lower arc below the dot. In effect the Candrabindu is inverted with
respect to the usual configuration. But it was brought to the attention of
Unicode that a "regular" Candrabindu (inverted in the context of Sharada)
was used alongside the "inverted" one (regular in the context of Sharada),
with different uses.

However, the author decided to use a glyph with an arc that is shorter than
that of its already existing dual. But an examination of the manuscript
attestations in the proposal document reveals that this isn't a consistent
practice, due to scribal idiosyncrasies: while figures 1, 2, 3 and 6 show
the preferred glyph, figures 4 and 5 show an almost flat stroke instead of
an arc, and figure 7 looks like an arc with an extra vertical stroke rather
than a dot (contrasting with the regular one, which does use a dot).

While the current glyph is fine, it makes one wonder if this discrepancy is
important, and if making a font that just rotates the glyph of 11180 is
acceptable. This situation is different than the Devanagari script, where
the regular (0901) and the inverted version (0900) of the Candrabindu are
mere rotations of each other.

I suggest contacting Anshuman Pandey and asking for his expert opinion, because
if a glyph change is not warranted, an annotation noting the glyphic
variants applicable only to the inverted version would be useful.

Feedback routed to Properties & Algorithms Group for evaluation [PAG]

Date/Time: Mon Jan 23 04:59:25 CST 2023
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: UTS46

Chromium will ship Nontransitional Processing soon:
https://chromestatus.com/feature/5105856067141632. That covers all browser
engines. I suggest taking that opportunity to simplify this document and
its test suite and declare the transition period for which this conditional
existed to be over.

Date/Time: Mon Jan 23 05:11:04 CST 2023
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: UTS46

Steps don't always consider that domain labels can be empty, e.g., when
CheckBidi is true the first subrule of "The Bidi Rule" inspects the first
character of a label. I think that might also apply to CheckJoiners and
potentially other steps. (I initially thought the problem here was
VerifyDnsLength not being considered, but that check happens much later on
in the processing model so it's something more fundamental.)
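
The empty-label case can be illustrated with a minimal Python sketch. The helper name `first_bidi_class` is hypothetical and not taken from UTS #46; it only shows that a check on a label's first character needs an explicit guard, because splitting a domain on U+002E can yield empty labels.

```python
import unicodedata
from typing import Optional

def first_bidi_class(label: str) -> Optional[str]:
    """Return the Bidi_Class of the first character, or None for an empty label.

    Hypothetical helper: a naive `label[0]` would raise IndexError here.
    """
    if not label:
        return None
    return unicodedata.bidirectional(label[0])

# Consecutive or trailing dots produce empty labels.
labels = "a..b.".split(".")
classes = [first_bidi_class(label) for label in labels]
print(labels, classes)
```

How an empty label should then be treated (skipped, rejected, or passed through) is a decision for the specification; the sketch deliberately leaves it open.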

Date/Time: Mon Jan 23 05:13:16 CST 2023
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: UTS46

Please change U+2260 (≠), U+226E (≮), and U+226F (≯) from
disallowed_STD3_valid to valid.

These code points are not decomposed so they can never conflict with
=, <, and >. And they are not inherently more confusing than any of
the other allowed code points, which include hieroglyphics and emoji. These
code points also work as-is in all browser engines (while < and > are
forbidden) and on balance preference ought to be given to retaining
compatibility so end users are not prevented from visiting websites or
seeing subresources that might use these code points in their domain for
one reason or another.

For further background and discussion please see
https://github.com/whatwg/url/issues/733.

Thank you!
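
One reading of "not decomposed" is that these code points are left unchanged by NFC, the normalization form applied during UTS #46 processing. The Python sketch below checks that reading; `nfc_stable` is a hypothetical helper name. NFD does split each code point into a base character plus U+0338, so the check is specific to NFC.

```python
import unicodedata

def nfc_stable(ch: str) -> bool:
    """True if NFC leaves the character unchanged (hypothetical helper)."""
    return unicodedata.normalize("NFC", ch) == ch

for ch in ("\u2260", "\u226e", "\u226f"):
    # Show the NFD form only for contrast with the NFC result.
    nfd = [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]
    print(f"U+{ord(ch):04X} nfc_stable={nfc_stable(ch)} NFD={nfd}")
```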

Date/Time: Mon Jan 23 06:35:46 CST 2023
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: IdnaTestV2.txt

I have worked on importing IdnaTestV2.txt into web-platform-tests, the test
framework used by all web browsers. The goal was to meet the requirements
of the domain to ASCII algorithm specified at
https://url.spec.whatwg.org/#idna with beStrict initialized to false.

As such, I attempted to filter out ToASCII statuses for UseSTD3ASCIIRules,
CheckHyphens, and VerifyDnsLength, hoping that any statuses that are left
would indicate a failure requirement.

You can find my work at
https://github.com/web-platform-tests/wpt/pull/38080.

I ran into the following issues. Most of them relate to status annotation.
IPv4 address confusion was the one issue that did not relate to statuses.

* VerifyDnsLength is not P4, but rather A4_1 and A4_2.

* Tests that use trailing ASCII digit labels (or such a label followed by a
  dot) are not useful for browsers, as that will trigger the IPv4 parser,
  which will then usually return failure as the input was not actually an
  IPv4 address string. This is a problem for a number of the A4_1 and A4_2
  tests, and also for a large number of tests later on, such as
  ToASCII("xn--gl0as212a.8.") or ToASCII("1.27"). I wrote a filter to exclude
  them, but it would be better if they were adjusted slightly (e.g., made
  to contain one non-EN code point) so what they aim to test can also be
  tested in browsers. (Note that the IPv4 parser runs after domain to
  ASCII, but the web platform doesn't provide a way to invoke domain to
  ASCII on its own and probably never will.)

* The test for ToASCII("$") is marked P1 and V6, not U1. This also affects
  numerous tests with <, >, and =. If they continue to have multiple
  statuses that will also make it impossible to filter them in an automated
  fashion. (This also applies to non-ASCII UseSTD3ASCIIRules code points,
  but I filed a separate request to remove those.)

* NV8 is not used as a status.

* A3 and X3 do not appear to be used as a status. (These are catered for by
  P4 presumably.)

* CheckBidi is not V8. V8 does not appear to be used. You'd have to filter
  out all B1-6 statuses instead.
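
The kind of filter mentioned for trailing ASCII digit labels might look like the following Python sketch. It is an illustration, not the filter from the linked pull request: the function name `ends_in_ascii_digit_label` is hypothetical, and it covers only the case described above (a last label made solely of ASCII digits, optionally followed by one dot). The IPv4 detection in the URL Standard may treat further forms as numbers.

```python
def ends_in_ascii_digit_label(domain: str) -> bool:
    """Return True if the last label consists only of ASCII digits.

    A single trailing dot is ignored, matching "such a label followed
    by a dot". Hypothetical helper for excluding tests from a test list.
    """
    labels = domain.split(".")
    if labels and labels[-1] == "":
        labels.pop()  # ignore one trailing dot
    if not labels:
        return False
    last = labels[-1]
    # Compare against the ASCII range explicitly; str.isdigit() would
    # also accept non-ASCII digits.
    return last != "" and all("0" <= c <= "9" for c in last)

tests = ["xn--gl0as212a.8.", "1.27", "example.com", "a.."]
kept = [t for t in tests if not ends_in_ascii_digit_label(t)]
print(kept)
```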

Date/Time: Mon Feb 13 07:41:09 CST 2023
ReportID: ID20230213074109
Name: Anne van Kesteren
Report Type: Error Report
Opt Subject: UTS 46


An issue reported against the URL Standard indicated that the current
CheckBidi handling from UTS 46 is rather strict:
https://github.com/whatwg/url/issues/543. Namely, domains containing
RTL-labels cannot have labels consisting solely of ASCII digits preceding
them (such labels are invalid per The Bidi Rule subrule 1). This ends up
rejecting a number of domains in the wild and also seems unnecessarily
restrictive for RTL users.

In that issue I worked with Harald Alvestrand (one of the editors of RFC
5893: Right-to-Left Scripts for Internationalized Domain Names for
Applications (IDNA)) on a specific set of changes for UTS 46 that would
remedy this issue, while still imposing the majority of Bidi-related
requirements present in UTS 46 today.

The proposed changes are:

1. Remove step 8 of https://unicode.org/reports/tr46/#Validity_Criteria as
Validity Criteria only operates on a single label. (Although it somehow
claims to have knowledge about the domain_name string as well...)

2. Add a new step 5 to https://unicode.org/reports/tr46/#Processing .
(Note that due to step 4 we will have U-labels.)

The new step 5 would read as follows:

* If CheckBidi, and the domain_name string is a Bidi domain name, record
  there was an error if neither of the following conditions is true:

   * All labels in the domain_name string satisfy the 6 subrules of The Bidi
     Rule of RFC 5893, Section 2.

   * RTL labels in the domain_name string are immediately followed by an LDH
     label whose first code point is not of class EN and all labels in the
     domain_name string are either LDH labels or satisfy the 6 subrules of
     The Bidi Rule of RFC 5893, Section 2.
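
The proposed step can be sketched in Python as below. This is one interpretation under stated assumptions, not text from UTS #46: all function names are hypothetical, empty labels are skipped, an "RTL label" is taken (per RFC 5893) to be a label containing an R, AL, or AN character, and the second condition is read as requiring every RTL label to have a following label.

```python
import string
import unicodedata

RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LDH_CHARS = set(string.ascii_letters + string.digits + "-")

def bidi_classes(label):
    return [unicodedata.bidirectional(c) for c in label]

def is_rtl_label(label):
    return any(c in ("R", "AL", "AN") for c in bidi_classes(label))

def is_ldh_label(label):
    # Assumption: letters, digits, hyphen; no leading or trailing hyphen.
    return (bool(label) and all(c in LDH_CHARS for c in label)
            and label[0] != "-" and label[-1] != "-")

def satisfies_bidi_rule(label):
    """The six subrules of The Bidi Rule, RFC 5893, Section 2."""
    cls = bidi_classes(label)
    if not cls or cls[0] not in ("L", "R", "AL"):        # subrule 1
        return False
    rtl = cls[0] in ("R", "AL")
    if any(c not in (RTL_ALLOWED if rtl else LTR_ALLOWED) for c in cls):
        return False                                     # subrules 2 and 5
    core = list(cls)
    while core and core[-1] == "NSM":                    # trailing NSM allowed
        core.pop()
    if core[-1] not in (("R", "AL", "EN", "AN") if rtl else ("L", "EN")):
        return False                                     # subrules 3 and 6
    if rtl and "EN" in cls and "AN" in cls:              # subrule 4
        return False
    return True

def check_bidi_error(labels):
    """Return True if the proposed step 5 would record an error."""
    labels = [l for l in labels if l]  # assumption: skip empty labels
    if not any(is_rtl_label(l) for l in labels):
        return False                   # not a Bidi domain name
    if all(satisfies_bidi_rule(l) for l in labels):
        return False                   # first condition holds
    all_ok = all(is_ldh_label(l) or satisfies_bidi_rule(l) for l in labels)
    followed = all(
        i + 1 < len(labels)
        and is_ldh_label(labels[i + 1])
        and unicodedata.bidirectional(labels[i + 1][0]) != "EN"
        for i, l in enumerate(labels) if is_rtl_label(l)
    )
    return not (all_ok and followed)   # second condition

rtl = "\u05d0\u05d1\u05d2"             # three Hebrew letters
print(check_bidi_error(["1", rtl, "com"]), check_bidi_error([rtl, "1", "com"]))
```

Under this reading, an all-digit label before an RTL label is accepted when the RTL label is followed by an LDH label such as "com", while an all-digit label directly after an RTL label is still rejected.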

Thank you for your consideration. This is probably the final IDNA-related
issue from the URL Standard. Once all of them have been resolved I’ll work
with browser implementers to ensure the changes (if any) get implemented so
we can finally declare victory on IDNA interoperability.

Feedback routed to Emoji SC for evaluation [ESC]

(None at this time.)

Feedback routed to Editorial Committee for evaluation [EDC]

Date/Time: Mon Feb 06 11:22:42 CST 2023
ReportID: ID20230206112242
Name: Danny Anderson
Report Type: Membership Inquiry
Opt Subject: Cross-Reference Addition

Could there be a cross-reference mention of "⪇" (0x2A87) in the Unicode Code Charts 
under the "≨" (0x2268) character? I know that 0x2A87 cross-references 0x2268, but 
not vice versa.

A similar note applies for "⪈" (0x2A88) and "≩" (0x2269).

Date/Time: Fri Feb 17 18:50:30 CST 2023
ReportID: ID20230217185030
Name: Liang Hai
Report Type: Error Report
Opt Subject: Core Spec

Cell of column “Code Point and Name” / row “Mongolian variation 
selectors” of Table 4-10, Unusual Properties, on page 194 of the 
Core Spec, version 15.0:

> 180B MONGOLIAN FREE VARIATION SELECTOR ONE
> 180C MONGOLIAN FREE VARIATION SELECTOR TWO
> 180D MONGOLIAN FREE VARIATION SELECTOR THREE
> 180E MONGOLIAN VOWEL SEPARATOR

Requests:

1. U+180E MONGOLIAN VOWEL SEPARATOR (aka, MVS; gc = Cf/Format) should be removed, 
as it’s not a variation selector.

2. The recently encoded U+180F MONGOLIAN FREE VARIATION SELECTOR
FOUR (FVS4) should be added.
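
The General_Category difference behind request 1 can be inspected with Python's `unicodedata` module, as in this small sketch. It is only an illustration: the results depend on the Unicode version bundled with the interpreter, so U+180F may be reported as unassigned by older builds, and the Variation_Selector property itself is not exposed by this module.

```python
import unicodedata

# General_Category for U+180B..U+180F, keyed by code point.
categories = {cp: unicodedata.category(chr(cp)) for cp in range(0x180B, 0x1810)}

for cp, gc in categories.items():
    # A default avoids ValueError if the name is missing from an older database.
    name = unicodedata.name(chr(cp), "<not in this Unicode version>")
    print(f"U+{cp:04X} gc={gc} {name}")
```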

Date/Time: Thu Mar 16 07:04:29 CDT 2023
ReportID: ID20230316070429
Name: Simon Cozens
Report Type: Error Report
Opt Subject: Typo in UTS Figure 11.5

Note: This error has already been fixed in the draft.

In the second example of complex hieroglyph formatting, the character sequence 
given is 1314A, 13433...

U+13433 is EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START, but the intended rendering 
(both in the image and in the symbolic column) is for the vertically-paired signs 
to be inserted at TOP END; 13433 should be replaced by 13434.
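
The two control characters involved can be looked up by name with a short Python sketch, assuming an interpreter whose bundled Unicode data includes the Egyptian hieroglyph format controls. The dictionary name `names` is only for illustration.

```python
import unicodedata

# Character names for the two insertion controls discussed above.
names = {
    cp: unicodedata.name(chr(cp), "<not in this Unicode version>")
    for cp in (0x13433, 0x13434)
}

for cp, name in names.items():
    print(f"U+{cp:04X} {name}")
```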

Other Reports

(None at this time.)
