This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Mon Sep 25 15:12:22 CDT 2017
Name: Thomas Milo
Report Type: Public Review Issue
Opt Subject: Proposed Draft UTR #53, Unicode Arabic Mark Ordering Algorithm Now Available for Public Review
Please consider taking into account the established solutions for these sequences as already implemented in www.mushafmuscat.om, which is now available worldwide as the authoritative, Azhar-recommended electronic reference Qur’ān. I don’t expect fundamental disagreements, but the project handles and solves all spelling issues without extending the existing Unicode repertoire for Arabic. However, for one class of characters we improved the typographical behaviour by changing it from overstrike to a new category of contextual behaviour: amphibious. I have reported on Amphibious Characters to the UTC.
Some practical tips:
Clicking the splash screen opens the text. Words can be searched in Manuscript View Mode, which presents the verses separated by flowers, surrounded by navigation and graphic controls. At the top left are the page number, text search and version locator boxes. Chapters can be located with the Wheel at the top left. Historical text layers can be exposed with the Colour Triangle at the bottom left. The dormant miniatures of unpointed characters can be activated with the حسصطعه icon at the bottom left. The chapter headings are in unpointed palaeographic Arabic; a ٮٮٯٮط / تنقيط icon (on all page spreads except the first) provides two optional styles of pointing. Clicking in the margin brings up the Printed Mushaf View Mode, with verses marked by numbers. The Unicode structure can be found by clicking in the text, which brings up the Interactive View Mode. Letter blocks light up on mouse-over (selecting a letter block also activates the WordShaping interface, which provides aesthetic user interaction without touching the Unicode structure).
Single-click selects a letter block.
Double-click selects a word.
Triple-click selects a verse.
CTRL+C (Windows) or CMD+C (Mac) copies the selected Unicode string.
Caveat: we are preparing an update that moves all Qur’ānic stops to word-final position, where they belong.
This change will affect a few words that end in U+06E6 Arabic Small Yeh and U+06E5 Arabic Small Waw.
Some background information:
https://www.egypttoday.com/Article/4/14269/The-world%E2%80%99s-first-e-Quran-is-here
https://oumma.com/premiere-mondiale-coran-numerique-presente-a-mascate-oman/
Presenting the project to the crown prince of Oman, HRH Sayyid Haytham: https://www.youtube.com/watch?v=sHtBL2GvBxE
My speech without voice-over: https://www.youtube.com/watch?v=UpxsWGxgJIo
Please don’t hesitate to ask for more clarification if needed.
Date/Time: Mon Sep 25 19:54:28 CDT 2017
Name: A./
Report Type: Public Review Issue
Opt Subject: PRI 359
1.) Better guidance should be given on when to apply this algorithm. From reading the draft, it is usefully applied as a standard preparatory step before handing text off to a rendering engine, or perhaps also as a standard transformation on input to a rendering engine. This should be stated explicitly.
2.) If there are other situations, operations or processes where transforming Arabic text using this algorithm is seen as useful, these should be stated explicitly.
3.) There are situations and protocols that demand text in a given normalization form. Care should be taken in presenting the new algorithm so that it does not lead users to expect that all Arabic text "ought to be" always in the transformed format.
4.) The stability note before 3.2 could be improved, because the meaning of the word "existing" will change over time. Therefore:
The set of MCM characters is intended to be stable. Characters from Unicode Version XXXX or earlier will not be added to or removed from this set in future updates of this algorithm. Future updates may add characters to the set only if they were encoded in a version after XXXX.
[Each future version of the algorithm then changes XXXX to the latest value. This wording allows the TR to skip any versions of the Unicode Standard that do not contain new combining marks in Arabic.]
5.) In step 2, the specification does not address keeping multiple instances, e.g. multiple MCMs, in relative order when they are moved "to the beginning". The current text could be interpreted as requiring multiple instances of such characters to be inverted in relative order as each is moved "to the beginning". (The issue theoretically also exists for shadda, as it is defined by CCC value, which on the face of it allows the possibility of multiple distinct shadda code points whose internal ordering could again be observable.)
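Point 5 can be made concrete with a minimal Python sketch. The function names and the two-mark MCM subset below are assumptions for illustration only, not taken from the draft. Moving selected marks to the front one at a time (a literal reading of "moved to the beginning") reverses their relative order, while a stable partition preserves it.

```python
import unicodedata

# Illustrative subset only (assumption): two marks treated as MCM for this demo.
MCM = {"\u0654", "\u0655"}  # ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW

def naive_move_to_front(marks, selected):
    """Hypothetical: move each selected mark to the front in turn."""
    moved, rest = [], []
    for ch in marks:
        if ch in selected:
            moved.insert(0, ch)  # lands before marks moved earlier, so order reverses
        else:
            rest.append(ch)
    return moved + rest

def stable_move_to_front(marks, selected):
    """Hypothetical: move selected marks to the front, keeping their relative order."""
    moved = [ch for ch in marks if ch in selected]
    rest = [ch for ch in marks if ch not in selected]
    return moved + rest

# KASRA (ccc=32), HAMZA BELOW (ccc=220), HAMZA ABOVE (ccc=230): canonical order
seq = ["\u0650", "\u0655", "\u0654"]
print([unicodedata.name(c) for c in naive_move_to_front(seq, MCM)])
# ['ARABIC HAMZA ABOVE', 'ARABIC HAMZA BELOW', 'ARABIC KASRA']
print([unicodedata.name(c) for c in stable_move_to_front(seq, MCM)])
# ['ARABIC HAMZA BELOW', 'ARABIC HAMZA ABOVE', 'ARABIC KASRA']
```

The stable variant is what a specification would need to state explicitly if relative order is meant to be preserved.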
Date/Time: Fri Oct 6 05:59:06 CDT 2017
Name: r12a
Report Type: Public Review Issue
Opt Subject: When should UAOA be used?
I'm sending this on behalf of the W3C i18n WG. It relates to UTR #53. I'm hearing through other channels that the algorithm described is intended only to indicate how characters should be temporarily reordered prior to rendering, rather than to describe the order in which code points should be stored. Since most fonts generally produce the behaviour described anyway, it presumably amounts to documenting expectations about font behaviour, rather than specifying a new form of normalisation. However, this is not at all clear from the document, which has caused the W3C WG significant alarm (and wasted discussion cycles). Please update the document to make this clearer. We will hold back the other comments we currently have queued up until we can re-evaluate them in the light of the changes to the document. Btw, understanding of the intended use of UAOA is not helped by the way the document mentions canonically equivalent character sequences, nor by the vague descriptions of when CGJ should be used.
Date/Time: Fri Oct 6 06:05:21 CDT 2017
Name: r12a
Report Type: Public Review Issue
Opt Subject: AMOA rather than UAOA ?
http://www.unicode.org/reports/tr53/
"The Unicode Arabic Mark Ordering Algorithm (UAOA)"
I find it difficult to work out how UAOA should be pronounced, and it is awkward to pronounce either way. I think AMOA (or even UAMOA) would be easier. Please consider that or some other change.
Date/Time: Tue Oct 10 09:40:48 CDT 2017
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #359: U+08D9 ARABIC SMALL LOW NOON WITH KASRA
U+08D9 ARABIC SMALL LOW NOON WITH KASRA has Canonical_Combining_Class=Above when it should have been Below. Could the UAOA reorder it as Below?
Date/Time: Fri Oct 13 16:48:21 CDT 2017
Name: Behnam Esfahbod
Report Type: Public Review Issue
Opt Subject: Feedback on Proposed Draft UTR #53 — Revision 1
Status: Liaison Contribution - W3C i18n WG
# Using UAOA in Text Editing
On Section 5.6 “Other uses for UAOA”, we have:
> UAOA is very useful in implementations of backspacing in cases where there is no external information available about the original order in which the text was entered.
For an average user of modern languages written in the script, reordering the marks entered on a keyboard would be unexpected behavior. In effect, the document suggests that when a user authors a text file with Arabic Marks in a specific order, and the file is closed and reopened, backspace should behave differently than in the previous session. It is also not at all clear whether UAOA would be useful in a text editing scenario. The claim that UAOA is "very useful" needs some evidence, such as an existing implementation or other supporting data. Judging from the language and examples of the document, the usage of the algorithm appears too focused on one application, Quranic text, and the claims relate only to that specific application of the script.
Date/Time: Fri Oct 13 16:59:35 CDT 2017
Name: Behnam Esfahbod
Report Type: Public Review Issue
Opt Subject: Feedback on Proposed Draft UTR #53 — Revision 1
Status: Individual Contribution
The way Unicode Normalization works for Arabic Marks does indeed have its problems, especially in font development and text rendering. The algorithm proposed in this PDUTR is a good way to address some of these problems. But the document needs improvements in a few areas to be clear about what it does, when it should be applied, how it should be used, and what to expect from it.
# 1. Scope of the PDUTR
It looks like the PDUTR is the first UTR focused on details of rendering Unicode text (besides the text of the Unicode Standard itself). Arabic is only one of the scripts that need special attention (possibly reordering of the characters in memory) for rendering. A better approach could be a document (UTR) focused on text rendering, which would contain this algorithm for Arabic script and would collect other best practices over time, for other rendering issues in Arabic script as well as in other scripts.
# 2. Scope of the algorithm
The scope of the algorithm is not clear, either from its title or from its language. The name “Unicode Arabic Mark Ordering Algorithm” suggests that this is expected to be the only way Arabic Marks should be ordered in Unicode. That’s clearly not the case. In fact, the document proposes an algorithm for “reordering” Arabic Marks (not just describing how they should be ordered) to solve a problem in “rendering” the script. The title needs to be clear about this. Maybe “Unicode Arabic Mark Reordering Algorithm for Rendering” (AMRAR)? Similarly, Section 2 “Background” doesn’t clarify the scope of the algorithm and only explains how something does not work for some specific application with the existing normalization methods.
# 3. Consequences of the Algorithm: Normalization
The draft proposal is not clear about the effects of applying the algorithm to text. In particular, for strings X for which this algorithm is useful, we have UAOA(toNFC(X)) ≠ toNFC(UAOA(X)).
So, although the behavior of the algorithm can be stabilized across Unicode versions, how and when it is applied to text matters a great deal, since it changes text in a normalized form into a non-normalized form. Therefore, in terms of normalization, the algorithm cannot be considered stable at all. The document needs to be clear about this, even though it’s obvious from a technical point of view.
# 4. Consequences of the Algorithm: Semantics
With UAOA applied to text during rendering, some strings collapse into a single sequence: there are plenty of strings X and Y where toNFC(X) ≠ toNFC(Y), but UAOA(toNFC(X)) = UAOA(toNFC(Y)). This effectively changes the semantics of existing text encoded in Unicode, since it will render differently afterwards. The document is not clear about this semantic change and only claims to be “correcting” all the problems. The proposal suggests using CGJ to preserve the old semantics when needed. The document needs to be clearer about how to preserve the semantics. In fact, there should be a clear algorithm for converting a string X so that its semantics are preserved when the (rendering) interpretation changes, since for a couple of decades users have been storing text according to the current semantics of the encoding, which has been the only way recommended by Unicode.
# 5. Not enough details in the examples
The examples lack the information an average reader needs to understand the details. To be understood correctly, they need to be accompanied by the encoding of the text they represent, and by an explanation of how the algorithm works on that sequence.
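Points 3 and 4 can be checked with Python's standard `unicodedata` module. The function `reorder_for_rendering` below is a hypothetical, simplified stand-in, not the draft's UAOA: within each run of combining marks (taken here, as an assumption, to be characters with ccc ≠ 0), it stably moves shadda and an illustrative pair of MCMs (hamza above and below) to the front of the run. Under that assumption, the first check shows that the reordering does not commute with NFC, and the second shows two distinct NFC strings collapsing into one reordered string.

```python
import unicodedata

# Hypothetical stand-in, NOT the UTR #53 algorithm: marks moved to the front.
FRONT = {
    "\u0651",  # ARABIC SHADDA (ccc=33)
    "\u0654",  # ARABIC HAMZA ABOVE (ccc=230), illustrative MCM
    "\u0655",  # ARABIC HAMZA BELOW (ccc=220), illustrative MCM
}

def reorder_for_rendering(text: str) -> str:
    out, run = [], []

    def flush():
        # Stable partition of the current mark run: FRONT marks first.
        out.extend(ch for ch in run if ch in FRONT)
        out.extend(ch for ch in run if ch not in FRONT)
        run.clear()

    for ch in text:
        if unicodedata.combining(ch):  # non-starter: part of a mark run
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

# Point 3: reordering does not commute with NFC.
x = "\u0628\u0651\u0650"  # BEH, SHADDA, KASRA
print(reorder_for_rendering(nfc(x)) == nfc(reorder_for_rendering(x)))  # False

# Point 4: distinct NFC strings collapse to the same reordered string.
a = "\u0628\u0653\u0654"  # BEH, MADDAH ABOVE, HAMZA ABOVE (both ccc=230)
b = "\u0628\u0654\u0653"  # BEH, HAMZA ABOVE, MADDAH ABOVE
print(nfc(a) == nfc(b))  # False
print(reorder_for_rendering(nfc(a)) == reorder_for_rendering(nfc(b)))  # True
```

In the first case NFC puts kasra (ccc=32) before shadda (ccc=33), and the reordering puts shadda back in front, so its output is not in NFC. In the second case the two marks share ccc=230, so NFC keeps them distinct, while the reordering makes them identical.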
Feedback above this line was reviewed in the October 2017 UTC meeting.
Date/Time: Wed Jan 10 08:29:55 CST 2018
Name: r12a
Report Type: Error Report
Opt Subject: Use HTML rather than PDF
This is a comment from the W3C i18n WG. http://www.unicode.org/reports/tr53/
When the spec is provided for review in PDF, it isn't possible to:
- link to a specific section in the review report
- copy the text into a report
- search for text in the document when reviewing reported issues.
Could we, in future, please provide HTML-based documents? (It's OK to use images for the examples that are unlikely to be rendered properly for all readers.)
Feedback above this line was reviewed in the January 2018 UTC meeting.