Accumulated Feedback on PRI #472

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Thu Feb 16 00:03:46 CST 2023
ReportID: ID20230216000346
Name: Ali Soejono
Report Type: Public Review Issue
Opt Subject: 472

The proposed changes are an improvement and, in my opinion, easier to apply and
easier for users to understand, for both word breaks and sentence breaks.

Date/Time: Thu Feb 16 18:02:14 CST 2023
ReportID: ID20230216180214
Name: ERIEQ DHANAR NUGROHO
Report Type: Public Review Issue
Opt Subject: 472

We need more line break opportunities for Aksara Jawa (carakan), because the
existing line breaking behavior is very awkward for writing long phrases.

Date/Time: Thu Feb 16 18:06:33 CST 2023
ReportID: ID20230216180633
Name: ERIEQ DHANAR NUGROHO
Report Type: Public Review Issue
Opt Subject: 472

We need new line breaking behavior, because the existing line breaking is very awkward.

Date/Time: Fri Feb 17 06:03:58 CST 2023
ReportID: ID20230217060358
Name: Suhadi Jogja
Report Type: Public Review Issue
Opt Subject: 472

The new line breaking is better.

Date/Time: Fri Feb 17 06:19:36 CST 2023
ReportID: ID20230217061936
Name: Iqra Hanacaraka
Report Type: Public Review Issue
Opt Subject: 472

The new line breaking is urgently needed for justified text.

Date/Time: Tue Mar 07 04:09:12 CST 2023
ReportID: ID20230307040912
Name: Robin Leroy
Report Type: Public Review Issue
Opt Subject: 472

The proposal L2/22-080R2 affects a number of Brahmic scripts, for which it
appears to be a clear improvement. However, its effect is not quite
restricted to these scripts, as it changes the line breaking class of the
Common character ◌ (U+25CC DOTTED CIRCLE). This introduces line break
opportunities, for instance, between a letter and a dotted circle, or
between two dotted circles.

See, for instance, a◌̀ and e◌̂◌̣ in the demo:
https://www.unicode.org/review/pri472/background.html?text=a%E2%97%8C%CC%80%0Ae%E2%97%8C%CC%82%E2%97%8C%CC%A3.
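To see exactly which code points the two demo lines contain, the link can be decoded. This is a minimal sketch, assuming (from the shape of the URL, not from any documentation of the demo page) that the sample text is carried, percent-encoded as UTF-8, in the `text` query parameter; the helper names `demo_text` and `describe` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit
import unicodedata

# Demo link from the comment above (trailing sentence period removed).
# Assumption: the sample text lives in the percent-encoded `text` parameter.
DEMO_URL = ("https://www.unicode.org/review/pri472/background.html"
            "?text=a%E2%97%8C%CC%80%0Ae%E2%97%8C%CC%82%E2%97%8C%CC%A3")


def demo_text(url: str) -> str:
    """Return the percent-decoded (UTF-8) value of the `text` parameter."""
    return parse_qs(urlsplit(url).query)["text"][0]


def describe(text: str) -> list[str]:
    """List each code point with its Unicode name ('<control>' if unnamed)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}"
            for ch in text]


for line in describe(demo_text(DEMO_URL)):
    print(line)
```

The first line decodes to a, U+25CC DOTTED CIRCLE, U+0300 COMBINING GRAVE ACCENT; the second to e, U+25CC, U+0302 COMBINING CIRCUMFLEX ACCENT, U+25CC, U+0323 COMBINING DOT BELOW. These are exactly the letter + dotted circle and dotted circle + dotted circle pairs at which the proposal introduces break opportunities.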

Such use of the dotted circle, to depict a sequence of combining marks, is
attested; see the comments in
http://www.unicode.org/Public/UCD/latest/ucd/NormalizationTest.txt. While
the use cases that this change would disrupt may be niche, any use of the
dotted circle is niche, including the one motivating the change.

Refinement of the behaviour of the dotted circle could be relegated to a
dedicated proposal. However, absent that change in line breaking class, the
proposal would degrade the behaviour of sequences (dotted circle, virama)
in the affected scripts.

Instead, replacing AK | AS with AK | AL | AS in the first three sub-rules of
the proposed rule LB28b, while leaving ◌ in class AL, would preserve the
usability of the dotted circle as a placeholder base for

  1. pre-base consonants: AP × (AK | AL | AS) handles AP × ◌;
  2. virama: (AK | AL | AS) × (VF | VI) handles ◌ × (VF | VI);
  3. conjunct consonants: (AK | AL | AS) VI × AK handles ◌ VI × AK.

At the same time, this would not affect the behaviour of class AL except in
degenerate cases (cross-script virama or pre-base consonant usage).

This would not support the usage of the dotted circle itself as a subjoined
consonant to demonstrate the forms of combining marks or conjuncts thereon,
as cited and demonstrated in Section “Enabling the use of dotted circle as
a placeholder for subjoined consonants” of L2/22-080R2: there would be a
break in AK × VI ÷ ◌ × CM.

Changing sub-rule 3 of LB28b to (AK | AL | AS) VI × (AK | AL) may have
unwanted effects in nondegenerate mixed-script cases. Since both the cited
example and the one shown in the proposal itself only subjoin the dotted
circle to another dotted circle, an additional sub-rule

  5. AL VI × AL

would address the use cases shown while not disrupting nondegenerate cases.
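The modified sub-rules above can be sketched as a predicate over a sequence of line breaking classes. This is a hypothetical illustration, not the proposal's or UAX #14's implementation: the function name `lb28b_forbids_break` and the string class labels are invented for the sketch, the dotted circle is assumed to keep class AL, only sub-rules 1 to 3 (as modified) and 5 are modelled, and every other line breaking rule is ignored.

```python
# Hypothetical sketch of LB28b as modified in this comment: (AK | AL | AS)
# replaces (AK | AS) in sub-rules 1-3, plus the suggested sub-rule 5.
# Sub-rule 4 of the proposal is not quoted in the comment and is omitted.
# The dotted circle (U+25CC) is assumed to keep line breaking class AL.

BASES = {"AK", "AL", "AS"}
VIRAMAS = {"VF", "VI"}


def lb28b_forbids_break(classes: list[str], i: int) -> bool:
    """Return True if a sketched sub-rule forbids a break between
    classes[i - 1] and classes[i]."""
    if not 0 < i < len(classes):
        raise IndexError("break position must lie strictly inside the text")
    before, after = classes[i - 1], classes[i]
    # 1. AP × (AK | AL | AS): keep a pre-base consonant with its base.
    if before == "AP" and after in BASES:
        return True
    # 2. (AK | AL | AS) × (VF | VI): no break before a virama.
    if before in BASES and after in VIRAMAS:
        return True
    if i >= 2:
        first = classes[i - 2]
        # 3. (AK | AL | AS) VI × AK: keep conjunct consonants together.
        if first in BASES and before == "VI" and after == "AK":
            return True
        # 5. AL VI × AL: a dotted circle subjoined to another dotted circle.
        if first == "AL" and before == "VI" and after == "AL":
            return True
    # Everything else is left to the other UAX #14 rules (not modelled here).
    return False


# AK × VI ÷ ◌ × CM: none of the sketched sub-rules applies before the ◌.
print(lb28b_forbids_break(["AK", "VI", "AL", "CM"], 2))  # False
# ◌ VI ◌: sub-rule 5 keeps the subjoined dotted circle attached.
print(lb28b_forbids_break(["AL", "VI", "AL"], 2))  # True
```

Restricting sub-rule 5 to AL on both sides, rather than widening sub-rule 3 to (AK | AL), is what keeps it from firing on a script consonant followed by a virama and a Latin letter.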

Date/Time: Sun Mar 26 08:05:42 CDT 2023
ReportID: ID20230326080542
Name: Benny
Report Type: Public Review Issue
Opt Subject: 472

Hi, I'm Benny from Indonesia. I'm an active Javanese Wikisource editor, and I
often proofread Javanese script books and manuscripts. I also created an
online Javanese script transliterator (Nulisa Aksara Jawa) and the Javanese
transliteration method of Wikimedia's Universal Language Selector.

While I was working with Javanese manuscripts (transcribing, proofreading,
validating) that will in turn be used in the Wikisource Loves Manuscripts
project to train the Transkribus AI to OCR Javanese manuscripts, I came
upon the question of how ZWS is handled in Transkribus (or in any future
Javanese script OCR software).

I have written a report on this, using an example from Serat Damarwulan, at
https://phabricator.wikimedia.org/T333088

Answering the questions:

1.    Which of the scripts mentioned above do you use?

Javanese

2.    For this script, is the proposed style of line breaking clearly better than the old one?

Yes, it is

3.    Does the proposed style of line breaking provide good, or at least acceptable behavior for general use?

Yes, it does

4.    Are there cases where the proposed style of line breaking does not work correctly? Please provide example text and describe the expected line breaking behavior.

As far as I have tested, it works fine.

5.    What would be the impact for you if the proposed style were not introduced?

Text written in Javanese script by someone who is not familiar with ZWS/ZWNJ would be displayed as a single long line (overflow) with no line breaks. This would affect web reading (on PC as well as on mobile), and is not natural at all.

6.    May we contact you for follow-up?

Yes