Accumulated Feedback on PRI #494

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Sun Jan 07 09:10:23 CST 2024
ReportID: ID20240107091023
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 494

Table 2 currently states that U+16D6A KIRAT RAI VOWEL SIGN AU
(together with two other characters) will be added to Grapheme_Cluster_Break=V.
However, the character added should be U+16D69 KIRAT RAI VOWEL SIGN O rather
than AU, because AU decomposes into O+E, while AU itself does not appear in the
decomposition of any other character.

Feedback above this line reviewed during UTC #178 in January 2024.

Date/Time: Mon Apr 22 11:41:57 CDT 2024
ReportID: ID20240422114157
Name: Jules Bertholet
Report Type: Error Report
Opt Subject: PropList.txt


UAX 29 (http://unicode.org/reports/tr29/) says the following:

> The default rules have been written so that they can be applied directly 
> to non-NFD text and yield equivalent results [versus applying to NFD text].

In support of this aim, it later says the following about legacy grapheme clusters:

> The continuing characters include nonspacing marks, the Join_Controls 
> (U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER) used in Indic 
> languages, and a few spacing combining marks to ensure canonical equivalence.

However, this property (that grapheme cluster boundaries are preserved under
canonical equivalence) currently does not hold. U+0CC0 KANNADA VOWEL SIGN
II has `Grapheme_Cluster_Break=SpacingMark`, but its NFD decomposition consists
of two characters (U+0CBF KANNADA VOWEL SIGN I and U+0CD5 KANNADA LENGTH MARK),
both of which have `Grapheme_Cluster_Break=Extend`. To correct this error,
U+0CC0 should be given the property `Other_Grapheme_Extend` in
`PropList.txt`.
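
The decomposition cited in this report can be checked with a minimal sketch using Python's standard `unicodedata` module. The helper name `nfd_code_points` is hypothetical and not part of the report. The standard library does not expose `Grapheme_Cluster_Break` values, so the sketch only confirms the NFD decomposition, not the property assignments.

```python
import unicodedata


def nfd_code_points(ch: str) -> list[str]:
    """Return the NFD decomposition of `ch` as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


# U+0CC0 KANNADA VOWEL SIGN II
print(nfd_code_points("\u0CC0"))  # ['U+0CBF', 'U+0CD5']
```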

Date/Time: Mon Apr 22 12:01:50 CDT 2024
ReportID: ID20240422120150
Name: Jules Bertholet
Report Type: Error Report
Opt Subject: PropList.txt

# Amending my previous report

A few moments ago, I submitted an error report about the
`Grapheme_Cluster_Break` property of U+0CC0. I would like to amend that
report to note the following other characters that are also affected:

- U+0CC7
- U+0CC8
- U+0CCA
- U+0CCB
- U+1B3B
- U+1B3D
- U+1B43
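
As a rough aid for reviewing the list above, the following sketch prints each listed character's name and NFD decomposition using Python's `unicodedata` module. The names `AFFECTED` and `describe` are hypothetical, and the output depends on the Unicode version bundled with the Python interpreter. It does not look up `Grapheme_Cluster_Break` values, which would have to be read from the UCD data files.

```python
import unicodedata

# Code points listed in the amended report.
AFFECTED = [0x0CC7, 0x0CC8, 0x0CCA, 0x0CCB, 0x1B3B, 0x1B3D, 0x1B43]


def describe(cp: int) -> str:
    """Format a code point, its name, and its full NFD decomposition."""
    ch = chr(cp)
    parts = " + ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch))
    return f"U+{cp:04X} {unicodedata.name(ch)} -> {parts}"


for cp in AFFECTED:
    print(describe(cp))
```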

Date/Time: Thu May 09 10:36:08 CDT 2024
ReportID: ID20240509103608
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: 494

Action item 179-A113 says to categorize semicolons as Sentence_Break =
SContinue. https://github.com/unicode-org/unicodetools/pull/812 modifies
U+1364 ETHIOPIC SEMICOLON, U+A6F6 BAMUM SEMICOLON, and U+1DA89 SIGNWRITING
SEMICOLON accordingly. Those three scripts also have commas and colons,
which still have Sentence_Break = Other. If those scripts’ semicolons are
recategorized to match ASCII, so should their commas and colons; if there
is not yet any evidence supporting changing their commas and colons, there
probably isn’t any for their semicolons either, so their semicolons should
not be recategorized.

Lisu, Medefaidrin, Mongolian, Newa, and Vai don’t have semicolons, but they
do have commas or colons. Should those be recategorized too? I wouldn’t
assume so just from their character names, but maybe.
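
To enumerate the commas, colons, and semicolons in the three scripts discussed above, one could start from a name-based search such as the sketch below. It assumes that the first word of a character name identifies the script and the last word identifies the punctuation type, which is only a heuristic (as the report itself notes, character names alone are not evidence of usage). The names `SCRIPTS`, `KINDS`, and `punctuation_by_name` are hypothetical, and `Sentence_Break` values are not available from the standard library.

```python
import sys
import unicodedata

SCRIPTS = ("ETHIOPIC", "BAMUM", "SIGNWRITING")
KINDS = ("COMMA", "COLON", "SEMICOLON")


def punctuation_by_name(scripts=SCRIPTS, kinds=KINDS) -> dict[int, str]:
    """Map code points to names where the name starts with a script word
    and ends with one of the punctuation words (a name-based heuristic)."""
    found = {}
    for cp in range(sys.maxunicode + 1):
        # Unnamed or unassigned code points yield the empty default.
        words = unicodedata.name(chr(cp), "").split()
        if len(words) >= 2 and words[0] in scripts and words[-1] in kinds:
            found[cp] = " ".join(words)
    return found


for cp, name in sorted(punctuation_by_name().items()):
    print(f"U+{cp:04X} {name}")
```

Passing other script words (for example `("LISU", "VAI")`) would list candidates for the second question, subject to the same caveat about relying on names.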