This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback. [Ed: This is the final feedback compilation, after the PRI closed.]
Date/Time: Sat Oct 20 15:27:04 CDT 2012
Contact: eliz@gnu.org
Name: Eli Zaretskii
Report Type: Public Review Issue
Opt Subject: PRI232, UAX#9, tr9-28
The proposed text is not entirely clear on which rules are applied "to each isolating run sequence". From the text in 3.4, one can infer that only W1 through I2 are to be applied to each isolating sequence, whereas L1 through L4 are applied to a complete line, but it would be better to say that explicitly in X10 instead of using the vague and inaccurate "the remaining rules". Another unclear part is how to proceed the level resolution from one isolating sequence to another. Should the rules be applied to the outer-most isolating sequence first, then to the next outer-most, etc. all the way to the inner-most sequence? Should they be applied left to right? Should they be applied in the order of embedding levels? Or does it even matter? N1 says "start-of-level-run (sos)" and the same for eos, which is probably a mistake. Finally, the examples of reordering use none of the new bidirectional controls and do not show the effect of these controls on reordered text. It would be nice if they did, because that would allow the reader to make sure he/she understood the rules, by simulating the algorithm run and comparing the results with the provided ones. Using the new controls in examples will also make it possible to show the differences between applying the rules "one level run at a time" in the previous versions of the algorithm and "one isolating sequence at a time" in the new version.
Date/Time: Thu Nov 1 15:54:44 CDT 2012
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Public Review Issue
Opt Subject: Proposed Update UAX 9
2.4 Explicit Directional Isolates; par 2 (Proofreading Nit sort of)
"In addition to allowing embedding text whose direction is the opposite of its surroundings without unduly affecting its surroundings, one of the isolate codes also offers an extra feature: embedding text while inferring its direction heuristically from its constituent characters."
{ COMMENT: I found the above a little vague [maybe it is me, but I found it vague]; first, I do not like the gerunds "embedding," "inferring;" [however, had you said "text of an embedding direction opposite to that of surrounding text," it would have been o.k. in my opinion as "embedding direction" is an explicitly defined keyword in this document]; also I do not like the last pronoun "its" -- which I infer correctly I am sure as referring to the directionality of the "text;" however in my rewrite I replaced "its" with "that text" so there would be no question; finally the word "unduly" is pretty vague and may need a note; on this see also my reply to Eli Zaretskii below. }
=> "In addition to allowing embedded text whose direction is opposite that of surrounding text but does not unduly[?NOTE] affect surrounding text, one of these isolate codes also offers an extra feature: embedded text whose direction is inferred heuristically from that text's constituent characters."
3.3.2
{ Comment: I do not like the rule numbering here -- particularly the numbers X5a and X5b, but that made inserting new numbers easier; someone I suppose needs to do a search and replace otherwise; likewise I suppose that the section, "Terminating Isolates," precedes "Terminating Embeddings and Overrides" so you can use the numbers 6a, etc; this is fine but not to my taste; wish I could fix this sometime, though I'm busy with my own concerns }
> Accumulated Feedback on PRI #232
> This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
> Date/Time: Sat Oct 20 15:27:04 CDT 2012
> Contact: eliz@gnu.org
> Name: Eli Zaretskii
> Report Type: Public Review Issue
> Opt Subject: PRI232, UAX#9, tr9-28
> . . .
> Another unclear part is how to proceed the level resolution from one isolating sequence to another. Should the rules be applied to the outer-most isolating sequence first, then to the next outer-most, etc. all the way to the inner-most sequence? Should they be applied left to right? Should they be applied in the order of embedding levels? Or does it even matter?
{ MY COMMENT: I do not think it matters but I do need a definition of "unduly" in section 2.4; see above. }
> N1 says "start-of-level-run (sos)" and the same for eos, which is probably a mistake.
{ MY COMMENT: I did not think so; maybe I am confused. }
Best, --C. E. Whitehead cewcathar@hotmail.com
Date/Time: Fri Nov 2 13:00:12 CDT 2012
Contact: behdad@google.com
Name: Behdad Esfahbod
Report Type: Public Review Issue
Opt Subject: Feedback re PRI232 (bidi)
Since we are making such drastic changes to bidi, I suggest we also bump up the 61 limit. I suggest either not specifying a limit, or something like "at least 253" kind of wording. One of my concerns is that if, for example, a web browser ends up using isolates or embedding characters when converting a div to text copied to clipboard, then the deeply nested div structures of today's web sites will make it feasible to reach the current 61 limit in a realistic use case. Not a huge deal, but given the computing resources of this decade, it's just free to bump it up at least.
Date/Time: Fri Nov 2 15:17:53 CDT 2012
Contact: asmus@uniode.org
Name: Asmus
Report Type: Public Review Issue
Opt Subject: PRI 232 feedback
I see in the draft that the following is contemplated: X_Bidi_Class. I think this is problematic. As the list of properties grows, being able to sort related properties next to each other in any list becomes more important. If an extended property needs to be marked with an "X", that should be at the end, not the front, of the property name.
Date/Time: Mon Nov 5 14:44:13 CST 2012
Contact: aharon@google.com
Name: Aharon Lanin
Report Type: Public Review Issue
Opt Subject: Feedback on PRI 232
Some comments on the BPA-related modifications to UAX#9:
1. The very beginning of section 3, which deals with the phases of the algorithm, contains the following passage (emphases mine):
Initialization. A *list of bidirectional character types is initialized*, with one entry for each character in the original text. The value of each entry is the X_Bidi_Class property of the respective character. *After this point, the original characters are no longer referenced until the reordering phase.* A list of embedding levels, with one level per character, is then initialized.
Resolution of the embedding levels. *A series of rules are applied to the lists of embedding levels and bidirectional character types. Each rule is based on the current values of those lists*, and can modify those values. Each rule is applied to each of the values in sequence before continuing to the next rule. The result of this phase is a modified list of embedding levels; the list of bidirectional character types is no longer needed.
I believe that these paragraphs need to be updated. First, they have to mention that the algorithm now looks at character properties besides the bidirectional type, namely (I think) general category, Bidi_Mirrored, and Bidi_Mirroring_Glyph.
Furthermore, the statement that "the original characters are no longer referenced until the reordering phase" at face value now appears to be false, since for punctuation marks A and B to be considered paired, "Bidi_Mirroring_Glyph of A is B" has to hold, and at least naively this seems to reference B, which is the original character. It is possible that referencing the original character for this purpose can be avoided by providing a "cooked" version of Bidi_Mirroring_Glyph, which instead of being Bidi_Mirroring_Glyph[c] is just c for the characters with gc = Pe.
But if we start to talk about cooking, it is also natural to mention that all three values can (I think) be combined into one using the following logic:
!Bidi_Mirrored[c] || Bidi_Mirroring_Glyph[c] == 0 || !(gc[c] == Ps || gc[c] == Pe) ? 0 : gc[c] == Pe ? c : -Bidi_Mirroring_Glyph[c]
The idea is that for paired punctuation, the cooked value is non-zero, where the cooked value of the opening character is negative, and the cooked value of the closing character is the same as that of the opening character, but negated (i.e. positive). If this indeed works, it may be worthwhile to mention it somewhere, perhaps a separate section, since passing four arrays for each character as inputs to the algorithm where a single one used to be sufficient is a significant complication.
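The combined value described above can be sketched as follows. This is only an illustration of the idea, not part of the proposal: the `BIDI_MIRRORING_GLYPH` table is a hypothetical three-pair excerpt, and the helper names `cooked_value` and `is_pair` are invented here.

```python
import unicodedata

# Hypothetical excerpt of Bidi_Mirroring_Glyph. Python's unicodedata module
# does not expose this property, so a real table would be loaded from
# BidiMirroring.txt in the Unicode Character Database.
BIDI_MIRRORING_GLYPH = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def cooked_value(c: str) -> int:
    """Combine Bidi_Mirrored, Bidi_Mirroring_Glyph and gc into one integer.

    0 means "not paired punctuation". An opening mark yields the negated code
    point of its closing partner; a closing mark yields its own code point.
    """
    gc = unicodedata.category(c)
    mirror = BIDI_MIRRORING_GLYPH.get(c)
    if not unicodedata.mirrored(c) or mirror is None or gc not in ("Ps", "Pe"):
        return 0
    return ord(c) if gc == "Pe" else -ord(mirror)


def is_pair(opening: str, closing: str) -> bool:
    """Two marks pair when the opening value is negative and the two values
    cancel out."""
    a, b = cooked_value(opening), cooked_value(closing)
    return a < 0 and a + b == 0
```

With this encoding, a single integer array is enough to test pairing: `is_pair("(", ")")` holds, while `"<"` cooks to 0 because its general category is neither Ps nor Pe.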
2. The current changes have moved the phrase "European and Arabic numbers act as if they were R in terms of their influence on neutrals" from N1 to the introductory paragraph of 3.3.4. I think that it is important to leave this phrase explicitly in N1 and to add it to N0, since it is all too easy to overlook it in the introductory paragraph.
3. I think that the definition of "paired punctuation marks", currently a non-italicized (and thus non-normative?!) afterthought in N0, is sufficiently important to warrant being moved into a separate named definition in 3.1, e.g. BD1a. IMO, it has to include not just the "mirrored pair" requirement, but also the "correctly nested" requirement.
4. I do not understand the exact meaning of a key paragraph (currently in N1, but IMO belonging in the definition of "paired punctuation marks"), namely:
This rule is applied to paired punctuation marks that are correctly nested. When paired punctuation marks are mismatched, pairing occurs between the closest pairable marks in logical order.
I have multiple problems here:
- "correctly nested" is not rigorously defined.
- I have no idea what is meant by "mismatched".
- I may be able to guess what "pairable" means, but am not sure.
- I do not understand how the second sentence fits with the first.
I think that an algorithm has to be spelled out that, given a sequence of characters, identifies the paired punctuation marks (all of which N0 will then force to the same direction).
5. Is the intent of the paragraph cited in 4 above to set the direction of some pairs of characters in a sequence even though not all the mirrored characters in the sequence form correctly nested pairs? If so, this seems to be a very significant departure from previous specifications of the BPA that increases the risk of applying the BPA to characters that were not intended to be pairs.
6. Extra word "section" in list of modifications: "Changes to section Section 3.3.4".
7. The modification bullet item for the BPA should point out that the algorithm now looks at additional character properties. Perhaps something along the lines of "Extension of the algorithm to resolve paired punctuation marks to the same direction. Adds BD1a and N0, using character properties such as the general category, Bidi_Mirrored and Bidi_Mirroring_Glyph to detect paired punctuation."
Date/Time: Tue Nov 6 14:38:25 CST 2012
Contact: matitiahu.allouche@gmail.com
Name: Matitiahu Allouche
Report Type: Public Review Issue
Opt Subject: Comments on PRI #232
The following comments are mostly editorial. They do not question the essence of the substantial changes introduced in this version of the UBA.
1. BD9 (isolating run sequence): I find this definition very confusing. It is based on level runs (defined in BD7) but seems to only consider level changes caused by RLI/LRI/FSI/PDI. The definition says that every level run (except the first and last) must start with PDI and end with RLI/LRI/FSI. However the example shows level runs like text3, text5 (and in fact all the runs except those in the first row of cells) which do not satisfy this condition. The examples after X10 add to my confusion.
2. Related to item 1: the definition of sos and eos refers to isolating run sequences. It seems to me that sos and eos should be defined for level runs, not sequences.
3. In X2, X3, X4, X5, X5a, X5b, we find the phrase "If this new level would be valid". Valid in this case means "less than or equal to 61". I suggest using this more explicit condition.
4. In X2, X3, X4, X5, X5a, X5b, new values are assigned to current embedding level, current isolate status and current override status. I suggest using the verb "set" to specify this change of values (and not "reset" as in the current phrasing, which hints at going back to a previous value). "Reset" should be preferred when describing the action of PDF and PDI on the above variables.
5. In X8, change "pagraph" to "paragraph".
6. The examples after N0 mention N1 and N2, which come afterwards. I suggest moving the examples after N2.
7. In Example 1 after N0, the resolved level of the first space should be 1 (and not 2).
Date/Time: Thu Dec 13 17:39:27 CST 2012
Contact: mgrzegor@poczta.onet.pl
Name: Marcin Grzegorczyk
Report Type: Public Review Issue
Opt Subject: Feedback on PRI #232
In addition to Aharon Lanin’s comments, I would like to point out that the term “external neighbor” in the proposed rule N0 is ambiguous without a definition. It could mean either the adjacent character, or the nearest strong type; and in both cases sos/eos may be included or not.
Also, I am not happy about the idea of having the UBA refer to properties not directly related to bidi (namely, the General Category property). In fact, since the proposed update already adds new bidi classes for isolates, it might add new bidi classes for paired punctuation as well. I believe it would not only allow for more flexibility (e.g. if a need arises to include characters of a different General Category), but also enable expressing rule N0 more clearly.
Below is my list of proposed changes (relative to UAX #9 rev. 28 draft 4) based on this idea.
------
Add two new values to X_Bidi_Class (or Bidi_Class_X as per Asmus’s suggestion): Opening_Punctuation (OP) and Closing_Punctuation (CP).
Assign OP to all characters with General_Category=Open_Punctuation for which Bidi_Mirroring_Glyph is not <none>.
Assign CP to all characters with General_Category=Close_Punctuation for which Bidi_Mirroring_Glyph is not <none>.
In Tables 3 and 4, add the new two classes to the Neutral category.
[Note: I believe that as of 6.2.0 all characters with gc=Ps or gc=Pe have bc=ON.]
Add a new definition:
BD11. Character A forms a mirrored pair with character B if the property Bidi_Mirrored is Yes for both A and B, and Bidi_Mirroring_Glyph of A is B.
Rephrase N0 as follows:
N0. Search backward from each instance of a closing punctuation (CP) until either the first opening punctuation (OP) or sos is found. If an OP is found, and it does not form a mirrored pair with the CP character, change that OP and all OPs preceding it in the isolating run sequence to Other Neutral (ON). [1] If an OP is found, and it forms a mirrored pair with the CP character, then:
- If the text between the OP and the CP contains at least one non-neutral type [2] (L, R, EN or AN) of the same direction as the embedding direction [3], change both the OP and the CP to the strong type (L or R) corresponding to the embedding direction.
- Otherwise, if the text between the OP and the CP contains at least one non-neutral type of the direction opposite to the embedding direction, and at least one of the following conditions is true:
  - the last non-neutral type, if any, preceding the OP [4] is also of the direction opposite to the embedding direction,
  - the first non-neutral type, if any, following the CP is also of the direction opposite to the embedding direction,
  then change both the OP and the CP to the strong type opposite to the embedding direction.
- Otherwise, change both the OP and the CP to ON. [5]
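To make the rephrased rule easier to trace by hand, here is a minimal sketch of it over a single isolating run sequence. It is an illustration under stated assumptions, not a reference implementation: the `MIRROR` table is a hypothetical excerpt of Bidi_Mirroring_Glyph, the names `direction` and `resolve_n0` are invented here, EN and AN are assumed to count as R, and a CP for which no OP is found before sos is simply left unchanged, since the proposed text does not say what happens to it.

```python
# Hypothetical excerpt of the mirrored pairs (opening mark -> closing mark).
MIRROR = {"(": ")", "[": "]", "{": "}"}


def direction(bidi_type):
    """Map a bidi type to 'L' or 'R', or None for anything neutral.
    Assumption: EN and AN act as R for the purposes of this rule."""
    if bidi_type == "L":
        return "L"
    if bidi_type in ("R", "EN", "AN"):
        return "R"
    return None


def resolve_n0(chars, types, embedding_dir):
    """Apply the rephrased N0 to one isolating run sequence.

    chars: characters of the sequence; types: their bidi types, using the
    proposed OP and CP classes; embedding_dir: 'L' or 'R'.
    Returns a new list of types.
    """
    types = list(types)
    opposite = "R" if embedding_dir == "L" else "L"
    for i, t in enumerate(types):
        if t != "CP":
            continue
        # Search backward for the first OP; falling off the start means sos.
        j = i - 1
        while j >= 0 and types[j] != "OP":
            j -= 1
        if j < 0:
            continue  # reached sos: this CP is left as-is (assumption)
        if MIRROR.get(chars[j]) != chars[i]:
            # Mismatch: demote this OP and every OP before it to ON (note [1]).
            for k in range(j + 1):
                if types[k] == "OP":
                    types[k] = "ON"
            continue
        inner = {direction(x) for x in types[j + 1:i]} - {None}
        before = next((direction(x) for x in reversed(types[:j]) if direction(x)), None)
        after = next((direction(x) for x in types[i + 1:] if direction(x)), None)
        if embedding_dir in inner:
            new_type = embedding_dir
        elif opposite in inner and opposite in (before, after):
            new_type = opposite
        else:
            new_type = "ON"  # note [5]: prevents mis-pairing with a later CP
        types[j] = types[i] = new_type
    return types
```

Because closing marks are visited in logical order, an inner pair is always resolved (to L, R or ON) before the backward search for an outer CP passes over it, so resolved marks are never paired a second time.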
Notes:
[1] This means that, if there is any mismatched pair of punctuation marks, the rule will be applied neither to that pair, nor to any enclosing pair. From Aharon Lanin’s comment #5 I understand that to be the original intent of the BPA, the current (ambiguous) wording notwithstanding. If a more complicated algorithm is desired, it would have to be spelled out here.
[2] I prefer “non-neutral” to “strong” here, to remind the reader that EN and AN also have to be taken into account (other weak types and AL having been resolved already).
[3] A check for mixed types seems to be redundant; if there are mixed-direction types, then at least one is in the embedding direction. (This is based on my reading of the current draft; if “mixed strong types” was intended to include mix of e.g. R and AN, then this condition would have to read “… more than one non-neutral type (L, R, EN or AN), or at least one non-neutral type of the same direction as the embedding direction”.)
[4] This is based on the way I understand what “external neighbor” was intended to mean. The wording “if any” indicates that sos/eos are not included (if they are, then every character in an isolating run sequence is preceded and followed by some strong type).
[5] This covers the case when the enclosed text does not contain any strong character; changing both marks to ON prevents mis-pairing the OP with a later CP. Note that the CP does not actually have to be changed to ON, as it makes no difference to further applications of rule N0 or to rules N1 and N2. (However, if a more complicated pairing algorithm is specified, it may become important to change both OP and CP here.)
Note also that the new bidi classes may create additional ‘legacy’ classes of conforming systems (see chapter 4.2), namely those that use Bidi_Class instead of X_Bidi_Class (and thus effectively ignore rule N0).
Date/Time: Sat Dec 22 09:08:55 CST 2012
Contact: verdy_p@wanadoo.fr
Name: Philippe Verdy
Report Type: Public Review Issue
Opt Subject: UAX#9 (UBA) PRI 3.3.4 Resolving Neutral Types and stability
The introduction of isolates in the UBA is a major improvement to the UBA which has the interesting impact of making it possible to define a stability for the bidirectional resolution. However stability is still not guaranteed for the added rule N0 (resolution of neutral types), notably for paired mirrored punctuation. All other rules are using stabilized character properties except there:
(Citation) "Paired punctuation marks are pairs of characters A and B, where A has general category Open_Punctuation (gc=Ps), B has general category Close_Punctuation (gc=Pe), and A and B form a mirrored pair (Bidi_Mirrored=Yes for both, and Bidi_Mirroring_Glyph of A is B)." (End of Citation)
The general categories (Open_Punctuation and Close_Punctuation) are stabilized. However,
- the mirrored property (Bidi_Mirrored) is not stabilized (it should be now);
- the mirrored pairs mappings (Bidi_Mirroring_Glyph) are not stabilized (it should be now).
This will largely improve the performance of the BiDi algorithm implementation for long term, avoiding also future quirks by using the mechanism offered by isolates:
- If there remains any other problem where pairing is expected but will not occur (and will not change due to the stability rule), the use of isolates (FSI/PDI) around the content but within the mirrored pair will resolve these problems. So instead of: <[:Ps:], content, [:Pe:]> we'll still be able to encode: <[:Ps:], FSI, content, PDI, [:Pe:]>
- If there remains any other problem where pairing is not wanted but still occurs (and will not change due to the stability rule), it will still remain possible to isolate each punctuation within its own FSI/PDI isolate: So instead of: <[:Ps:], content, [:Pe:]> we'll still be able to encode: <FSI, [:Ps:], PDI, content, FSI, [:Pe:], PDI> or even (for more quirky cases): <RLI, [:Ps:], PDI, content, LRI, [:Ps:], PDI>
What would be the additional stability rules for paired punctuations used in rule N0?
(1) If a character is not mirrored (Bidi_Mirrored=No) then it MUST NOT be mapped with a non-default paired character for mirroring (Bidi_Mirroring_Glyph=undefined)
(2) If a character has NO defined mapping to a paired character for mirroring (Bidi_Mirroring_Glyph=undefined), then it must NOT be mirrored (Bidi_Mirrored=No)
(3) If a character is mirrored (Bidi_Mirrored=Yes) then it MUST be mapped with a non-default paired character for mirroring (Bidi_Mirroring_Glyph=code point).
(4) If a character is mapped to a non-default paired character for mirroring (Bidi_Mirroring_Glyph=code point), then the first character MUST be mirrored (Bidi_Mirrored=Yes).
(5) If a character A is mapped to a character B for mirroring (Bidi_Mirroring_Glyph=code point B), the character A and B must be distinct and NOT canonically equivalent to A: NFC(A) != NFC(B)
(6) If a character A is mapped to a character B for mirroring (Bidi_Mirroring_Glyph=code point B), then B must be mapped to a character C which is canonically equivalent to A, i.e.: NFC(C) = NFC(A)
(7) If a character A is mapped to a character B for mirroring, and A is a starting punctuation (gc=Ps), then B is an ending punctuation (gc=Pe);
(8) If a character A is mapped to a character B for mirroring, and A is an ending punctuation (gc=Pe), then B is a starting punctuation (gc=Ps).
Note that rules (5) and (6) are stronger than what is given in the citation above for the definition of N0. They imply that A and B are paired (like in the citation) but also force these punctuations to be canonically non-equivalent. But this does not force these characters to be themselves in NFC form: we could still have compatibility punctuations added which will be mirrored like another existing "standard" punctuation, for example for some standardized glyph variants, and pairing will still remain possible with another existing standard punctuation, or with another newer compatibility punctuation.
A hypothetical example would be the standardization of italic and oblique parentheses (which are only differentiated by their forward or backward angle): pairs could be <italic-Ps, italic-Pe>, or <italic-Ps, oblique-Pe>, or <oblique-Ps, oblique-Pe>, or <oblique-Ps, italic-Pe> such that the oblique variants would still be canonically equivalent to italic variants (differentiated more explicitly only by using a variant selector, without knowing then which one is mirrored with the other). But even if this tolerance is kept, the UBA algorithm will still find an appropriate pair, by matching them in the encoded text, simply under their expected transformation to NFC or NFD.
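The proposed stability rules (1) through (8) lend themselves to a mechanical check. The sketch below is one way to express them for a single character; the `MIRRORING_GLYPH` table is a hypothetical excerpt of Bidi_Mirroring_Glyph and the function name `check_pair_rules` is invented for illustration. Rules (1)/(4) and (2)/(3) are contrapositives of each other, so each pair is reported under one number.

```python
import unicodedata

# Hypothetical excerpt of Bidi_Mirroring_Glyph; a real check would load the
# full mapping from the Unicode Character Database.
MIRRORING_GLYPH = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def nfc(s):
    return unicodedata.normalize("NFC", s)


def check_pair_rules(a, table=MIRRORING_GLYPH):
    """Return the numbers of the proposed rules that character `a` violates."""
    violations = []
    mirrored = bool(unicodedata.mirrored(a))
    b = table.get(a)
    if b is None:
        if mirrored:
            violations.append(3)  # rules (2)/(3): mirrored but no mapping
        return violations
    if not mirrored:
        violations.append(4)      # rules (1)/(4): mapping but not mirrored
    if nfc(a) == nfc(b):
        violations.append(5)      # A and B must not be canonically equivalent
    c = table.get(b)
    if c is None or nfc(c) != nfc(a):
        violations.append(6)      # B must map back to an equivalent of A
    gc_a, gc_b = unicodedata.category(a), unicodedata.category(b)
    if gc_a == "Ps" and gc_b != "Pe":
        violations.append(7)
    if gc_a == "Pe" and gc_b != "Ps":
        violations.append(8)
    return violations
```

Running the check over every key of a candidate table would flag any mapping that breaks the proposed guarantees before the table is frozen.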
Date/Time: Tue Jan 8 10:23:20 CST 2013
Contact: roozbeh@google.com
Name: Roozbeh Pournader
Report Type: Public Review Issue
Opt Subject: Bug in definition of end-of-sequences (eos) in UAX#9
The definition of end-of-sequence (eos) in the latest draft of UAX #9 (28-6) is problematic for isolating run sequences ending in an isolate initiator (the result of missing closing PDIs). The current definition can result in leaking the content of the isolate to the outside. Assume the sequence R ON RLI R, which is a left-to-right paragraph. After X9, the levels are going to be 0 0 0 1. The first isolating run sequence is going to be R ON RLI, and the second the final R. The problem is the definition of eos for the first isolating run sequence. Since there is no PDI to close the sequence, the next character, used in definition of eos, is the R which has level 1, so the eos becomes R, instead of L, which it would have been if there was a closing PDI. This would result in the ON and the RLI becoming right-to-left, while if there was a closing PDI, they would have become left-to-right. We need to change the language to have an exception like this: For the definition of eos, if an isolating run sequence ends with an isolate initiator, it will be assumed that there is no character following it in the paragraph.
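The leak described above can be reproduced with a few lines. This is a sketch under the assumption that eos is derived from the higher of two levels, that of the last character of the sequence and that of the character following it in the paragraph (or the paragraph embedding level if there is none), with an odd level giving R and an even level giving L. The function name `eos_type` and the `apply_exception` flag are invented here to compare the current wording with the proposed exception.

```python
ISOLATE_INITIATORS = ("LRI", "RLI", "FSI")


def eos_type(types, levels, last_index, paragraph_level, apply_exception):
    """Return 'L' or 'R' for the eos of a sequence ending at last_index.

    types, levels: per-character bidi types and levels for the paragraph.
    apply_exception: if True, a sequence ending in an isolate initiator is
    treated as if no character followed it in the paragraph.
    """
    nxt = last_index + 1
    ends_with_initiator = types[last_index] in ISOLATE_INITIATORS
    if nxt >= len(levels) or (apply_exception and ends_with_initiator):
        following_level = paragraph_level
    else:
        following_level = levels[nxt]
    higher = max(levels[last_index], following_level)
    return "R" if higher % 2 else "L"


# The example from the report: R ON RLI R in a left-to-right paragraph.
types = ["R", "ON", "RLI", "R"]
levels = [0, 0, 0, 1]
current = eos_type(types, levels, 2, 0, apply_exception=False)
proposed = eos_type(types, levels, 2, 0, apply_exception=True)
```

Here `current` comes out as R, because the unclosed isolate's content at level 1 is consulted, while `proposed` comes out as L, matching what a closing PDI would have produced.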
Date/Time: Thu Jan 17 14:02:18 CST 2013
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Public Review Issue
Opt Subject: Proposed Update UAX #9, Unicode Bidirectional Algorithm
Hi, I am commenting on Marcin Grzegorczyk's comments here; I also have one comment on Philippe Verdy's comments, which follow Marcin's. (Marcin's and Philippe's comments are in black; mine are in purple; quotations from unicode documents are not colored but are set off. Same with my example email.)
Date/Time: Thu Dec 13 17:39:27 CST 2012
Contact: mgrzegor@poczta.onet.pl
Name: Marcin Grzegorczyk
Report Type: Public Review Issue
Opt Subject: Feedback on PRI #232
> In addition to Aharon Lanin’s comments, I would like to point out that the term “external neighbor” in the proposed rule N0 is ambiguous
without a definition. It could mean either the adjacent character, or the nearest strong type; and in both cases sos/eos may be included or not.
> Also, I am not happy about the idea of having the UBA refer to properties not directly related to bidi (namely, the General Category property). In fact, since the proposed update already adds new bidi classes for isolates, it might add new bidi classes for paired punctuation as well. I believe it would not only allow for more flexibility (e.g. if a need arises to include characters of a different General Category), but also enable expressing rule N0 more clearly.
>
> Below is my list of proposed changes (relative to UAX #9 rev. 28 draft 4) based on this idea.
>
>------
>
> Add two new values to X_Bidi_Class (or Bidi_Class_X as per Asmus’s suggestion): Opening_Punctuation (OP) and Closing_Punctuation (CP).
> Assign OP to all characters with General_Category=Open_Punctuation for which Bidi_Mirroring_Glyph is not <none>.
I like this idea, which has been discussed previously.
> Assign CP to all characters with General_Category=Close_Punctuation for which Bidi_Mirroring_Glyph is not <none>.
> In Tables 3 and 4, add the new two classes to the Neutral category.
O.k. so far.
> [Note: I believe that as of 6.2.0 all characters with gc=Ps or gc=Pe have bc=ON.]
This is my understanding, too.
>
> Add a new definition:
> BD11. Character A forms a mirrored pair with character B if the property Bidi_Mirrored is Yes for both A and B, and Bidi_Mirroring_Glyph of A is B.
> Rephrase N0 as follows:
> N0. Search backward from each instance of a closing punctuation (CP) until either the first opening punctuation (OP) or sos is found. If an OP
> is found, and it does not form a mirrored pair with the CP character, change that OP and all OPs preceding it in the isolating run sequence to
> Other Neutral (ON). [1] If an OP is found, and it forms a mirrored pair with the CP character, then:
> If the text between the OP and the CP contains at least one non-neutral type [2] (L, R, EN or AN) of the same direction as the embedding
> direction [3], change both the OP and the CP to the strong type (L or R) corresponding to the embedding direction.
> Otherwise, if the text between the OP and the CP contains at least one non-neutral type of the direction opposite to the embedding
> direction,
> and at least one of the following conditions is true:
> the last non-neutral type, if any, preceding the OP [4] is also of the direction opposite to the embedding direction,
> the first non-neutral type, if any, following the CP is also of the direction opposite to the embedding direction,
> then change both the OP and the CP to the strong type opposite to the embedding direction.
> Otherwise, change both the OP and the CP to ON. [5]
I do think mirrored characters need to be addressed in UAX 9, and so far they are,
" Paired punctuation marks are considered as a pair so that they both resolve to the same direction."
(http://www.unicode.org/reports/tr9/tr9-28.html#Resolving_Neutral_Types)
but I am not completely in agreement with Marcin's algorithm above.
The original algorithm discussed for mirrored pairs (which I like; this algorithm may be found at: http://www.unicode.org/review/pri231/pri231-background.pdf) was, as I understand things (I am quoting here):
"Once the paired punctuation marks have been identified, they should be resolved to the embedding direction except in the following cases which are resolved, based on context, opposite the embedding direction:
"* The directionality of the enclosed content is opposite the embedding direction, and at least one neighbor has a bidi level opposite to the embedding direction O(O)E, E(O)O, or O(O)O.
"* The enclosed content is neutral and both neighbors have a bidi level opposite to the embedding direction O(N)O. Resolving to opposite to the embedding direction is current behavior under the UBA (N1)."
Here the algorithm is again, expressed as a rule:
"*N0. Paired punctuation marks take the embedding direction if the enclosed text contains a strong type of the same direction. Else, if the enclosed text contains a strong type of the opposite direction and at least one external neighbor also has that direction the paired punctuation marks take the direction opposite the embedding direction."
This rule amounts to if any text matches the embedding direction, since "if," "then" is applied in sequence. This is fine IMO. (And, otherwise, if all text inside the mirrored punctuation is neutral I suppose the embedding direction should be taken, I would suppose, not a neutral direction, based on the algorithm given at the url above, which, as I've said, I like.)
However, as far as the bidi parentheses algorithm goes, what about the following symbols formed from various punctuation marks?
(-: , :-)
Would I treat the text between the two happy faces as neutral opening and closing text? These sequences should be somehow excepted, I think.
The above text is a comma separating a happy and a sad face which will all work as neutrals probably. I believe that these characters would be treated as the following sequence (a "/" separates each character): ON/ES/CS/CS/WS/ON/ES/CS/ ). That is, these are all weak or neutrals. So this case might pose no problem.
I suppose we have to resolve the "ES" and "CS" characters first though, which then are resolved to other neutrals so all we have are neutrals, which take the embedding direction, and now of course the parentheses are interpreted as such.
But what about the following text (set off from my comments by asterisks)?
* * *
Salam my friend! KAYFA HALUK? ANAA LHAMDU ULLAA (-: some problems though making my emails work with this new algorithm so ANAA LASTU SA'IYDUN )-: any suggestions?
* * *
Although I would tend to support exempting the happy face sequence from the parentheses algorithm, the happy faces here enclose parenthetical text.
According to the rules Marcin has suggested, but not really to those of the parentheses algorithm, the above "enclosed text" would be treated as RTL and thus some ordering would be reversed though I've not traced it through. Your algorithm treats this as RTL since an R character immediately precedes the parenthetical comment and since there are some R (strong RTL) characters within the parenthetical comment.
The levels are: 0s for the L text
then 1 for text KAYFA HALUK ANAA LHAMDU ULLA
Then we find a mirrored piece of punctuation, and then a bit later a close parentheses (now we have to pop the stack back to the previous, and we find a match, and so have opening and closing punctuation). Whatever algorithm we use for display, I hope these two faces, if they are to be treated as mirrored at all, will display as left-to-right.
One question: what level will the text in parentheses/happy faces be: 1s and 0s still? (or 3s and 2s?) (Sorry for asking this, but would it work better to treat the text inside the mirrored punctuation as a new embedding level? (I'm not a developer but may try to think through this sometime. I don't see how it will improve things to treat this text as a new embedding level)
> Notes:
> [1] This means that, if there is any mismatched pair of punctuation marks, the rule will be applied neither to that pair, nor to any enclosing pair.
> From Aharon Lanin’s comment #5 I understand that to be the original intent of the BPA, the current (ambiguous) wording notwithstanding. If a
> more complicated algorithm is desired, it would have to be spelled out here.
> [2] I prefer ?non-neutral? to ?strong? here, to remind the reader that EN and AN also have to be taken into account (other weak types and AL
> having been resolved already).
> [3] A check for mixed types seems to be redundant; if there are mixed-direction types, then at least one is in the embedding direction. (This is
> based on my reading of the current draft; if ?mixed strong types? was intended to include mix of e.g. R and AN, then this condition would have
> to read ?? more than one non-neutral type (L, R, EN or AN), or at least one non-neutral type of the same direction as the embedding direction?.)
> [4] This is based on the way I understand what ?external neighbor? was intended to mean. The wording ?if any? indicates that sos/eos are not
> included (if they are, then every character in an isolating run sequence is preceded and followed by some strong type).
> [5] This covers the case when the enclosed text does not contain any strong character; changing both marks to ON prevents mis-pairing the OP with a later CP. Note that the CP does not actually have to be changed to ON, as it makes no difference to further applications of rule N0 or to rules N1 and N2. (However, if a more complicated pairing algorithm is specified, it may become important to change both OP and CP here.)
> Note also that the new bidi classes may create additional ?legacy? classes of conforming systems (see chapter 4.2), namely those that use Bidi_Class instead of X_Bidi_Class (and thus effectively ignore rule N0).
One more comment from me at this point: I tend to agree with one of Philippe's comments:
Date/Time: Sat Dec 22 09:08:55 CST 2012
Contact: verdy_p@wanadoo.fr
Name: Philippe Verdy
Report Type: Public Review Issue
Opt Subject: UAX#9 (UBA) PRI 3.3.4 Resolving Neutral Types and stability
> "(5) If a character A is mapped to a character B for mirroring (Bidi_Mirroring_Glyph=code point B), the character A and B must be distinct and NOT canonically equivalent to A: NFC(A) != NFC(B)"
(I may also agree with the comment that follows, [6]; I am sorry, but I need to think that one through.)
Best,
--C. E. Whitehead
cewcathar@hotmail.com
Date/Time: Tue Jan 22 03:22:31 CST 2013
Contact: matitiahu.allouche@gmail.com
Name: Matitiahu Allouche
Report Type: Other Question, Problem, or Feedback
Opt Subject: Comments on UAX #9 proposed version 28
These are my comments on the document at http://www.unicode.org/reports/tr9/tr9-28.html
1) A nit: in section "2 Directional Formatting Characters", in sentence starting with "Even more significantly, the effect that an isolate as a whole", the words "Even more significantly" are not appropriate and should be removed. This sentence is just an explanation of the previous sentence, it is not more significant.
2) In section "2.1 Explicit Directional Embeddings", the sentence "The effect of right-left line direction, for example, can be accomplished by embedding the text with RLE...PDF." mentions PDF although its description comes later. I suggest to add a note in parentheses such as: (PDF will be described in section "2.3 Terminating Explicit Directional Embeddings and Overrides") and making this an HTML link. If this is too heavy, maybe only make the word PDF a link to section 2.3.
3) In sections 2.3 and 2.5, the descriptions of PDF and PDI mention "bidirectional state" 4 times. This term is nowhere explained. Depending on what is included in this bidirectional state, it is likely that PDI really restores the bidirectional state, but it is not so clear that PDF does the same. If it did, there would be no difference between embeddings and isolates. Besides the quest for correct phrasing, I think that detailing what the bidirectional state includes is important for implementers.
4) In section "2.4 Explicit Directional Isolates", we find: "In addition to allowing embedding text whose direction is the opposite of its surroundings...". This suggests that an isolate specifying the same direction as the surroundings makes no difference, which is not true (e.g. if the isolate ends with an opposite-direction word and is followed by a number). I suggest instead "In addition to allowing embedding text whose direction is explicitly specified".
5) In the text following BD7, we find (speaking of directional marks) "The level of the neutral characters can also be changed by inserting appropriate directional marks around neutral characters. These marks have no other effects.".
a. I think that "directional formatting characters" would be more appropriate than "directional marks", since marks tend to be interpreted as LRM or RLM, while the intent here is to also include LRE/RLE/PDF etc...
b. I am not sure that it is correct to write that these marks (or formatting characters) have no other effects. Don't they separate Arabic letters as far as shaping is concerned? This would not make a difference for space though.
6) In BD13, the second paragraph starts with "Equivalently". This is not appropriate since this paragraph starts an explanation completely different from the previous paragraph. I suggest to simply remove this word.
7) In BD13, for extreme clarification, I suggest to replace the sentence
"In the absence of isolate initiators, each isolating run sequence in a paragraph contains exactly one level run."
by
"In the absence of isolate initiators, each isolating run sequence in a paragraph contains exactly one level run and each level run constitutes a separate isolating run sequence."
7 bis) BD13, isolating run sequences: I have a feeling that implementing the algorithm described in the document for computing isolating run sequences would be quite detrimental to performance. On the other hand, I think that there are reasonable chances that the behavior of isolating run sequences can be obtained by adding relevant information to the bidirectional stack. For instance, the stack could remember if and how many neutrals precede an isolate initiator: 0 means that the last strong char before the initiator has the embedding direction; 1 means that the last strong char is of the opposite direction but there are no intervening neutrals; a number n greater than 1 means that the last strong char is of the opposite direction and there are n - 1 intervening neutrals. If the next non-neutral char after the matching isolate terminator is a digit or an opposite direction char and n was >= 1, then the neutrals before the isolate initiator must have their level increased by 1. This is very much implementation-oriented, but maybe it would be appropriate to hint to implementers that there may be more efficient ways to achieve the same result.
8) In Table 3, the links for FSI, RLI and PDI are broken.
9) In section "3.2 Bidirectional Character Types", the sentence "This invariant will be maintained in the future." should be "This invariance will be maintained in the future.".
10) In section "3.3.2 Explicit Levels and Directions", we find "A directional status stack of at most max_depth+1 entries where each entry consists of:".
a. I would rather see "at least" than "at most".
b. Implementers may want to include more items in each entry, thus maybe replace "consists of" by "includes".
11) In the paragraph starting "A counter called the overflow isolate count", remove the extraneous "to" in "to to determine".
12) In the sentence starting with "Note that there is no need", "nothing it all" should be "nothing at all".
13) In the text for X1, I think that the last bullet (starting with "Process each character") should not be an item of X1 or within a section named "Initialization". This sentence is not a step of the initialization but the body of the processing. I suggest to create a new subtitle labeled "Processing".
14) The algorithms in X2 and X3 assume that max_depth is an odd number. BD2 specifies max_depth as 61 and the Review Note there mentions the possibility of an "implementation-defined odd value no smaller than 61". However I suggest to add a note in X2 reminding that max_depth is an odd number.
15) In X6a, we find "Given that the valid isolate count is non-zero, the directional status stack must contain an entry with directional isolate status true before the loop". The expression "before the loop" is not clear. It should be something like "before the entries popped in the loop, if any". I suggest to simply remove the words "before the loop".
16) In X10's paragraph starting with "Apply rules W1-W7", instead of "When applying a rule to an isolating run sequence, the last value of each level run in the isolating run sequence is treated as if it were immediately followed by the first value in the next level run in the sequence, if any." I would prefer "When applying a rule to an isolating run sequence, the last character of each level run in the isolating run sequence is treated as if it were immediately followed by the first character in the next level run in the sequence, if any.".
17) In "3.3.4 Resolving Neutral Types" first paragraph, the links to Table 3 and Table 4 are broken.
18) in "3.3.4 Resolving Neutral Types", examples 1 and 2 mention rules N1 and N2 while these rules appear later. It would be better to postpone the examples until after all the Nx rules.
19) Change the definition of N0 from
"N0. Paired punctuation marks take the embedding direction if the enclosed text contains mixed strong types or a strong type of the embedding direction only. Else, if the enclosed text contains a strong type of the opposite direction only, and at least one external neighbor also has that direction, the paired punctuation marks take the direction opposite the embedding direction."
to (hopefully equivalent but more straightforward)
"N0. Paired punctuation marks take the embedding direction if the enclosed text contains at least one strong type of the embedding direction. Else, if the enclosed text contains at least one strong type of the opposite direction, and at least one external neighbor also has that direction, the paired punctuation marks take the direction opposite the embedding direction."
20) In Example 1 following N0, the resolved level of the first WS should be 1 (NOT 2).
21) The text of N0 mentions "external neighbor". The term neighbor is not clear. Does it mean directly adjacent or does it allow some neutrals between the punctuation mark and the opposite direction neighbor? Should there be a difference between "ARABIC book(s)" and "ARABIC book (s)" (with an added space before the left parenthesis)? The current writing of the rules implies that there will be no pairing for the latter string. Since N0 is applied before N1, the space after "book" is still at the embedding level, so that there is no neighbor of the opposite direction. I am not sure if that was the intent, anyway this point - what is a neighbor - should be clarified.
22) If I understand correctly the rules N0 to N2,
a. in string "smith (fabrikam ARABIC) HEBREW", there will be pairing of parentheses.
b. in string "smith (<LRE>fabrikam<PDF> ARABIC) HEBREW", there will be no pairing since the 2 parentheses belong to different isolating run sequences.
c. in string "smith (<LRI>fabrikam<PDI> ARABIC) HEBREW", there will be pairing because the 2 parentheses belong to the same isolating run sequence.
I find this quite confusing.
SUGGESTION: N0 will act on level runs (and not isolating run sequences as currently specified).
Rationale:
- cases b and c will behave similarly (no automatic pairing).
- if the author is smart enough to use embeddings (which justifies not pairing automatically in case b), he/she is no less smart in case c, so that he/she will take care of the level of parentheses if needed.
- simplified implementation of pairing.
23) It is not clear to me what constitutes correct nesting of punctuation marks.
(In the examples below, the spaces between phrases are there for visual clarity only.)
Example 1: text1 ( text2 [ text3 ] text4 ) text5
Example 2: text1 ( text2 ( text3 ) text4 ) text5
Example 3: text1 ( text2 [ text3 ) text4 ] text5
Example 4: text1 ( text2 [ text3 ) text4
Which pairs are considered correctly nested?
24) The text after N0 defines paired punctuation marks as having general category Ps or Pe and forming a mirrored pair.
From a quick research on UCD, I have found 116 such characters forming 58 pairs.
From looking at the list, I find that all the characters are utterly irrelevant in plain bidi text, except for the regular parentheses, square brackets and curly brackets (all in the ASCII range).
Missing are the less-than and greater-than signs. One important use case for pairing them is presenting XML or HTML source code where tags and attributes are English, attribute values may be anything, and the text between tags may also be of any direction.
SUGGESTION: the pairing characters will only be the 4 pairs < > ( ) [ ] { } in the ASCII range.
25) In HL1, the link to X5c is broken.
26) In the text of HL3, we find: "The behavior must always be defined by reference to what would happen if the equivalent explicit directional formatting characters as defined in the algorithm were inserted into the text. For example, a style sheet or markup can set the embedding level on a span of text." The phrasing is not appropriate. In fact, explicit directional formatting characters do affect the embedding level but they cannot *set* it. I suggest to replace "set" by "modify".
27) I propose to add a new Higher Level Protocol option, as follows:
HL7. Change the value of max_depth
(The explaining text below is adapted from the Review Note appearing after BD2)
Web applications may be constructed with various element types defined to be directional isolates, and it is not unreasonable for an application to do so to ensure proper display. If these elements are nested and generated programmatically, it is possible or even likely that they will exceed the default 61 level depth limit. Such applications should be allowed to use an even higher depth limit if they so wish.
Note: changing max_depth to a *lower* value is also convenient when testing implementations behavior on overflow cases.
28) In section "6.3 Formatting", in the last paragraph we find "(unless, of course, the insert's direction is known)". This text can be removed since the first sentence announces that "the direction of the text to be programmatically inserted is not known".
29) About the review note at the end of this section: I think that this is not the place to add more examples. In a normative document like this one, the role of the examples is to clarify the intent, not to justify it.
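To illustrate item 19: the claim that the rephrased N0 is equivalent to the draft wording can be checked mechanically. The following is a minimal sketch, not text from the draft. The function names, the 'L'/'R' encoding of strong types, and the None result meaning "N0 does not apply" are assumptions made for this example, and "external neighbor" is simply whatever set of types the caller passes in (item 21 asks what the term should mean). Weak types such as EN and AN are not modeled.

```python
from itertools import product

STRONG = ("L", "R")


def opposite(direction):
    """Return the other strong direction."""
    return "R" if direction == "L" else "L"


def n0_draft(embedding, enclosed, neighbors):
    """N0 as worded in the draft: 'mixed' or 'embedding only', else 'opposite only'."""
    has_emb = embedding in enclosed
    has_opp = opposite(embedding) in enclosed
    if (has_emb and has_opp) or (has_emb and not has_opp):
        return embedding
    if has_opp and not has_emb and opposite(embedding) in neighbors:
        return opposite(embedding)
    return None  # brackets are left for the later neutral rules


def n0_rephrased(embedding, enclosed, neighbors):
    """N0 as rephrased in item 19: 'at least one strong type of ...'."""
    if embedding in enclosed:
        return embedding
    if opposite(embedding) in enclosed and opposite(embedding) in neighbors:
        return opposite(embedding)
    return None


def subsets(items):
    """All subsets of a small tuple, as frozensets."""
    return [
        frozenset(x for x, keep in zip(items, mask) if keep)
        for mask in product((False, True), repeat=len(items))
    ]


def wordings_agree():
    """Compare both wordings over every embedding/enclosed/neighbor combination."""
    return all(
        n0_draft(e, enc, nb) == n0_rephrased(e, enc, nb)
        for e in STRONG
        for enc in subsets(STRONG)
        for nb in subsets(STRONG)
    )
```

Under these assumptions `wordings_agree()` covers all 32 combinations (2 embedding directions, 4 possible sets of enclosed strong types, 4 possible sets of neighbor types), which supports the "hopefully equivalent" remark as long as "mixed strong types" means strong types of mixed direction.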
Date/Time: Fri Jan 25 17:19:50 CST 2013
Contact: mgrzegor@poczta.onet.pl
Name: Marcin Grzegorczyk
Report Type: Error Report
Opt Subject: Re: Proposed Update UAX #9, Unicode Bidirectional Algorithm
[I realize it is well past the deadline, but C. E. Whitehead suggested I may submit this anyway.]
This is a slightly edited copy of my reply to C. E. Whitehead:
[...]
> > But what about the following text (set off from my comments by asterisks)?
> >
> > * * *
> >
> > Salam my friend! KAYFA HALUK? ANAA LHAMDU ULLAA (-: some problems
> > though making my emails work with this new algorithm so ANAA LASTU
> > SA'IYDUN )-: any suggestions?
> >
> > * * *
> >
> > Although I would tend to support exempting the happy face sequence from
> > the parentheses algorithm, the happy faces here enclose parenthetical text.
> >
> > According to the rules Marcin has suggested, but not really to those of
> > the parentheses algorithm, the above "enclosed text" would be treated as
> > RTL and thus some ordering would be reversed though I've not traced it
> > through. Your algorithm treats this as RTL since an R character
> > immediately precedes the parenthetical comment and since there are some
> > R (strong RTL) characters within the parenthetical comment.
That is not the case. The text within the parentheses includes both strong LTR and RTL types, so by the first bullet in my proposal of rule N0, the parentheses would take the embedding direction. In the absence of a higher-level protocol, that would be L, and the display would be as it should be.
Date/Time: Sun Jan 27 14:44:33 CST 2013
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Other Question, Problem, or Feedback
Opt Subject: Proposed Update Unicode Standard Annex #9
Sorry this is past the deadline; after discussing this with others, I've changed my comments slightly on how rule N0 should be phrased.
> > Date: Sat, 26 Jan 2013 00:39:16 +0100
> > From: mgrzegor@poczta.onet.pl
> > To: cewcathar@hotmail.com
>> > > http://www.unicode.org/reports/tr9/tr9-28.html#Resolving_Neutral_Types:
>> > > /N0. Paired punctuation marks take the embedding direction if the
>> > > enclosed text contains mixed strong types or a strong type of the
>> > > embedding direction only. Else, if the enclosed text contains a strong
>> > > type of the opposite direction only, and at least one external neighbor
>> > > also has that direction, the paired punctuation marks take the direction
>> > > opposite the embedding direction./
{Marcin wrote}
> > . . . as I pointed out in my original comment, it is actually redundant to
> > say "mixed strong types or a strong type of the embedding direction
> > only": assuming that "mixed strong types" means "strong types of mixed
> > direction" (which seems to have been the intent), it can be rephrased as
> > "at least one strong type of the embedding direction" (as there are only
> > 2 directions possible).
I like Marcin's rephrasing better. Also I'd prefer something like => "Else, if there are no strong types of the embedding direction, but at least one strong type of the opposite direction . . . " (for the "else" part of the rule).
> > IMO, "paired punctuation marks" are well-defined in the draft: "Paired punctuation marks are pairs of characters A and B, where A has general category Open_Punctuation (gc = Ps), B has general category Close_Punctuation (gc = Pe), and A and B form a mirrored pair (Bidi_Mirrored = Yes for both, and Bidi_Mirroring_Glyph of A is B)"
"External neighbor" however could perhaps be better defined as text immediately preceding the "Open_Punctuation" category character of the pair or immediately following the "Close_Punctuation" category character of the pair; there is an interesting thing here because now we might have some LRO or RLO here but these should have already been handled by X4 and X5.
Best,
--C. E. Whitehead
cewcathar@hotmail.com
Date/Time: Mon Jan 28 10:17:30 CST 2013
Contact: aharon@google.com
Name: Aharon Lanin
Report Type: Public Review Issue
Opt Subject: PRI 232
In his feedback, Mati writes:
> 24) [...] Missing are the less-than and greater-than signs. One important use case for pairing them is presenting XML or HTML source code where tags and attributes are English, attribute values may be anything, and the text between tags may also be of any direction.
I agree. Another important use case is email addresses like "John Doe <john@doe.com>", which in RTL comes out with the angle brackets mismatched. While it is true that when used as less than and greater than signs in math expressions, pairing these characters is inappropriate, I think that it would be hard to come up with examples where not only is a less than sign (used as such) followed by a greater than sign, but applying the BPA to them would actually change the display order.
From: "Matitiahu Allouche"
Subject: Comments on UAX #9 version 28 not related to the current PRIs
Date: Fri, 1 Feb 2013 01:58:00 +0200
These are comments on the document at http://www.unicode.org/reports/tr9/tr9-28.html which are not related to the PRIs about isolation or BPA.
a) In section "2.1 Explicit Directional Embeddings", the sentence "The effect of right-left line direction, for example, can be accomplished by embedding the text with RLE...PDF." mentions PDF although its description comes later. I suggest to add a note in parentheses such as: (PDF will be described in section "2.3 Terminating Explicit Directional Embeddings and Overrides") and making this an HTML link. If this is too heavy, maybe only make the word PDF a link to section 2.3.
b) In section "3.2 Bidirectional Character Types", the sentence "This invariant will be maintained in the future." should be "This invariance will be maintained in the future.".
c) In the text for X1, I think that the last bullet (starting with "Process each character") should not be an item of X1 or within a section named "Initialization". This sentence is not a step of the initialization but the body of the processing. I suggest to create a new subtitle labeled "Processing".
Shalom (Regards), Mati
Date/Time: Tue Mar 5 03:28:24 CST 2013
Contact: aharon@google.com
Name: Aharon Lanin
Report Type: Public Review Issue
Opt Subject: PRI #232
Some feedback on UAX#9 revision 28 draft 10.
1. At the very beginning of section 3, Basic Display Algorithm, it is made clear that the algorithm references the characters' bidirectional types, and the bullet entitled "Resolution of the embedding levels" states that "The original characters are referenced in the application of certain rules." That is fine. What's missing is a mention early on that the algorithm also references the characters' Bidi_Paired_Bracket_Type and Bidi_Paired_Bracket properties. (Currently, they are first mentioned only in the middle of section 3.1, Definitions.) This omission can be easily fixed by replacing the sentence "The original characters are referenced in the application of certain rules" with "The original characters and their Bidi_Paired_Bracket_Type and Bidi_Paired_Bracket properties are referenced in the application of certain rules."
2. Pointing out that only characters with bidi class ON ever have Bidi_Paired_Bracket_Type values Open or Close would allow a significant optimization (only looking up the Bidi_Paired_Bracket_Type for ON characters). If this restriction has not been made explicit in the definition of Bidi_Paired_Bracket_Type, it should be added there, and noted in UAX#9.
Date/Time: Tue Apr 23 23:13:25 CDT 2013
Contact: corporate@khwilliamson.com
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: BidiBrackets.txt
It would be easier for implementers of TUS if a single uniform format were adopted, and all new data files conformed to it. And, that format should require a minimum of effort to add to implementations. The format of BidiBrackets.txt, for example, requires one to teach the implementation that column 2 is one property and column 3 is another. That is extra work that could be avoided if the new files came in a format that didn't require it. An existing file with such a syntax is DerivedCoreProperties.txt. That format could easily be adapted for non-binary properties, and many other formats are possible. But my point is that you should publish the files in some such format to make it easier on implementers. We are stuck with the format of already-published files, but we can do better for future files.
Similarly, the now machine-readable @missings lines are inconsistent. In BidiBrackets.txt it is
# @missing: 0000..10FFFF; <none>; n
Compare that to an @missings line in PropertyValueAliases.txt
# @missing: 0000..10FFFF; Bidi_Mirroring_Glyph; <none>
There are three columns in each, but the meanings of column 2 are inconsistent. One is a property name, and one is a property value. And the third column in one gives the default value for the property in column 2. The other gives the default for an unnamed property that has to be taught to the implementation. If new @missings lines followed the syntax from PropertyValueAliases.txt, no teaching would be necessary. In BidiBrackets.txt, there would be two such lines, one for each property. My implementation already deals with the possibility of multiple @missings lines per file, as several existing files have them.
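The inconsistency described above can be made concrete with a short parsing sketch. The regular expression and the helper names below are hypothetical, written only for this illustration; the two sample lines are the ones quoted in the comment. The point is that both lines tokenize identically, yet one of them needs file-specific column names supplied from outside.

```python
import re

# Matches "# @missing: FIRST..LAST; field; field ..." (hypothetical helper pattern).
MISSING_RE = re.compile(r"#\s*@missing:\s*([0-9A-F]+)\.\.([0-9A-F]+)\s*;\s*(.*)$")


def parse_missing(line):
    """Split an @missing line into (first, last, [fields]); None if it does not match."""
    m = MISSING_RE.match(line.strip())
    if m is None:
        return None
    first, last = int(m.group(1), 16), int(m.group(2), 16)
    fields = [f.strip() for f in m.group(3).split(";")]
    return first, last, fields


def defaults_self_describing(fields):
    """PropertyValueAliases.txt style: the line itself names the property."""
    name, value = fields
    return {name: value}


def defaults_positional(fields, column_names):
    """BidiBrackets.txt style: the caller must supply the column names."""
    return dict(zip(column_names, fields))


aliases = parse_missing("# @missing: 0000..10FFFF; Bidi_Mirroring_Glyph; <none>")
brackets = parse_missing("# @missing: 0000..10FFFF; <none>; n")

# No outside knowledge needed for the first line.
print(defaults_self_describing(aliases[2]))

# The second line only makes sense once the implementation has been told
# which property each column holds.
print(defaults_positional(brackets[2],
                          ("Bidi_Paired_Bracket", "Bidi_Paired_Bracket_Type")))
```

If BidiBrackets.txt carried one self-describing line per property, as suggested, `defaults_self_describing` alone would be enough for both files.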
Date/Time: Sun Apr 28 15:03:47 CDT 2013
Contact: mgrzegor@poczta.onet.pl
Name: Marcin Grzegorczyk
Report Type: Public Review Issue
Opt Subject: Some more feedback on PRI #232
Some late but hopefully useful feedback on the latest draft (#10) of UAX #9 rev. 28.
1. Editorial remark: In the definition BD16, the terms “opening paired bracket”, “closing paired bracket” and “isolating run sequence” are italicized every time they appear in the text (in other definitions, terms are italicized only when they are defined). Same for the headings of the examples under the rule N0.
2. I agree with Aharon Lanin that it should be made clear that all characters with Bidi_Paired_Bracket_Type values Open or Close have bidi class ON (the note at the end of rule N0, bullet d implies that, but it should be mentioned explicitly); in fact, I think it ought to be a Unicode Stability Policy.
3. It might be worth mentioning (in the Implementation Notes section, perhaps) that Rule N0 and the associated definition BD16 can be implemented without actually creating a stack or list that BD16 calls for; such an implementation would be slower, but could require less memory, which can be important for embedded systems with limited RAM. One way to implement BD16 with minimum memory requirement might be as follows:
* For each character with Bidi_Paired_Bracket_Type other than None, assign a status, one of: unresolved (initial value), resolved as paired, resolved as unpaired. Note that if such characters are guaranteed to have bc=ON, the Bidi_Paired_Bracket_Type property and the status can be encoded by creating additional, ‘virtual’ bidi classes (which would behave as ON for all the other purposes).
* For each unresolved closing bracket, search backward until either sos or an unresolved opening bracket that forms a bracket pair with the closing bracket is found. In the latter case, resolve both brackets as paired, and if there are any unresolved opening brackets enclosed within the pair, resolve them all as unpaired. [Note: This corresponds to the 5 steps listed in BD16.]
* Once the previous step is complete, for each opening bracket resolved as paired, the matching closing bracket can be found by the following algorithm: Initialize a counter to 1. Scan forward the isolating run sequence, incrementing the counter for each opening bracket resolved as paired, and decrementing it for each closing bracket resolved as paired; the matching closing bracket is the first one that causes the counter to be decremented to 0. (This would work because bracket pairs, as defined by BD16, may be nested, but cannot otherwise overlap.)
* Note that closing brackets do not have to be resolved as unpaired; as long as each is checked only once, those that are not resolved as paired can be left in the unresolved state.
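The steps above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not a reference implementation: the whole string is treated as a single isolating run sequence, only the three ASCII bracket pairs are recognized (a hypothetical stand-in for the Bidi_Paired_Bracket data), and a per-character status list stands in for the "virtual bidi classes" mentioned in the first bullet. All names are invented for the example.

```python
# Hypothetical stand-in for the Bidi_Paired_Bracket data: ASCII pairs only.
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {c: o for o, c in OPEN_TO_CLOSE.items()}

UNRESOLVED, PAIRED, UNPAIRED = 0, 1, 2


def resolve_status(text):
    """Second bullet: resolve each closing bracket by searching backward."""
    status = [UNRESOLVED] * len(text)  # only meaningful at bracket positions
    for i, ch in enumerate(text):
        if ch not in CLOSE_TO_OPEN:
            continue
        wanted = CLOSE_TO_OPEN[ch]
        j = i - 1
        while j >= 0 and not (text[j] == wanted and status[j] == UNRESOLVED):
            j -= 1
        if j < 0:
            continue  # reached sos: the closing bracket stays unresolved
        status[i] = status[j] = PAIRED
        # Unresolved opening brackets enclosed by the new pair become unpaired.
        for k in range(j + 1, i):
            if text[k] in OPEN_TO_CLOSE and status[k] == UNRESOLVED:
                status[k] = UNPAIRED
    return status


def matching_close(text, status, open_index):
    """Third bullet: find the partner of a paired opening bracket with a counter."""
    counter = 1
    for i in range(open_index + 1, len(text)):
        if status[i] != PAIRED:
            continue
        if text[i] in OPEN_TO_CLOSE:
            counter += 1
        else:
            counter -= 1
            if counter == 0:
                return i
    raise ValueError("opening bracket is not resolved as paired")


def bracket_pairs(text):
    """Return (open_index, close_index) tuples in order of the opening bracket."""
    status = resolve_status(text)
    return [
        (i, matching_close(text, status, i))
        for i, ch in enumerate(text)
        if ch in OPEN_TO_CLOSE and status[i] == PAIRED
    ]
```

Applied to the four nesting examples from item 23 of Matitiahu Allouche's comments, this sketch yields two nested pairs for Examples 1 and 2, and pairs only the parentheses in Examples 3 and 4, leaving the square brackets unpaired.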
Date/Time: Tue Apr 30 13:32:09 CDT 2013
Contact: khw@cpan.org
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: 6.3 BidiBrackets.txt
This is an addendum to my earlier comments on this. The bottom line of what I was trying to say is that going forward, each new data file should be in a form that doesn't require manual intervention to specify to an implementor. This could be because the format of the file has each line contain only values for a single property, and includes that property name; or there could be machine-readable comments that describe the format of each entry, so that the file becomes self-describing. Currently, one has to know the file's format in order to interpret the @missings (supposedly) machine-readable line in this file. In the past, I've coped with this by using the @missings lines in PropertyValueAliases.txt, but there is no @missings entry there for Bidi_Paired_Bracket_Type. I presume that is an oversight that will be fixed before final publication. But, I believe all @missings lines should look like those in PropertyValueAliases.txt, with each containing full information, and not depending on the format of the file they are contained in.
Date/Time: Mon May 6 14:32:21 CDT 2013
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: Unicode 6.3 bidi brackets not ready?
The Unicode 6.3 bidi algorithm is expanded to handle "paired brackets". Formally, I think this is incomplete because it is hard to verify that an implementation is correct. In particular, draft UAX #9 says "The other test file, BidiCharacterTest.txt, contains test sequences of explicit code points, including, for example, paired brackets." -- but no such test file has been provided so far (and the beta period is over). Also, AFAIK there is no reference implementation that handles both isolates and paired brackets. BidiTest-6.3.0d3.txt is based on code that handles only the isolates. Also, AFAIK the "reference" so far has been "the Windows implementation", but Roozbeh says that Microsoft has three different implementations (in Windows, IE, and Office) with different behaviors. If this was a library API, we would label it "draft" or "technology preview". Given that there is no such mechanism for preliminary status in a UAX, it seems premature to include "paired brackets" in the Bidi algorithm.
Date/Time: Tue Mar 26 17:55:45 CDT 2013
Contact: kent.karlsson14@telia.com
Name: Kent Karlsson
Report Type: Public Review Issue
Opt Subject: 249 Unicode 6.3 Beta Review
Regarding BidiBrackets.txt:
0029; 0028; c # RIGHT PARENTHESIS
Note: This one is fairly often used unpaired in list item numbering, like "1)".
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
The (open) interval notations ]a,b[, ]a,b], [a,b[ breaks this pairing, so does the (open) interval notations (a,b], [a,b).
---------------
Not sure why these were picked for bidi-brackets...
2E22; 2E23; o # TOP LEFT HALF BRACKET
2E23; 2E22; c # TOP RIGHT HALF BRACKET
2E24; 2E25; o # BOTTOM LEFT HALF BRACKET
2E25; 2E24; c # BOTTOM RIGHT HALF BRACKET
Indeed, I'm not sure they even pair up in any way. I think you instead meant to include these (paired up):
@ Quine corners
@+ These form a set of four quine corners, for quincuncial arrangement. They are also used in upper and lower pairs in mathematic[s], or more rarely in editorial usage as alternatives to half brackets.
231C TOP LEFT CORNER
    x (right angle substitution marker - 2E00)
    x (top left half bracket - 2E22)
231D TOP RIGHT CORNER
231E BOTTOM LEFT CORNER
231F BOTTOM RIGHT CORNER
-------------------
And these four characters should be included as well (paired up):
@ Ceilings and floors
@+ These characters are tall and narrow mathematical delimiters, in contrast to the quine corners or half brackets. They are also distinct from CJK corner brackets, which are wide quotation marks.
2308 LEFT CEILING
    = APL upstile
    x (top left half bracket - 2E22)
    x (left corner bracket - 300C)
2309 RIGHT CEILING
    x (combining annuity symbol - 20E7)
    x (top right half bracket - 2E23)
230A LEFT FLOOR
    = APL downstile
    x (bottom left half bracket - 2E24)
230B RIGHT FLOOR
    x (right corner bracket - 300D)
    x (bottom right half bracket - 2E25)
------------
Date/Time: Fri Apr 26 18:11:18 CDT 2013
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: Unicode 6.3 new property Bidi_Paired_Bracket is redundant
Unicode 6.3 adds the BidiBrackets.txt file with the new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type. Bidi_Paired_Bracket seems redundant because it is trivially derivable as follows, and it seems nonsensical if this were not true at any time:
bpb(c) = if(bpt(c)==None) then <none> else bmg(c)
I suggest dropping Bidi_Paired_Bracket as a separate property. Alternatively, provide the derivation above as an immutable invariant.
In unicore list discussion 2013apr25..26, there was feedback that this might not be true if some "ornate parens" get bpb mappings while they still do not have bmg mappings. In that case, such characters should get bmg mappings as well. It should be allowed for a character to have a bmg mapping even if it is not Bidi_Mirrored. Alternatively, if it is important enough for an existing character to change from bpt=None to a different value, then it should be allowed to gain the Bidi_Mirrored property.
Asmus made further requests to formally change both bpt and bpb into derived properties, and to maintain them as such, so that new characters with relevant combinations of other properties are automatically given bpt/bpb values, unless explicitly overridden -- to avoid forgetting to update these newer properties.
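The proposed derivation is small enough to write out directly. The sketch below is illustrative only: the two dictionaries are tiny hand-written excerpts standing in for the real bpt and bmg data, the value codes "o", "c" and "n" follow the BidiBrackets.txt lines quoted elsewhere in this feedback, and `derived_bpb` is a name invented for the example.

```python
# Illustrative excerpts only, not the full property data.
# bpt: Bidi_Paired_Bracket_Type ("o" = Open, "c" = Close, default "n" = None).
BPT = {"(": "o", ")": "c", "[": "o", "]": "c"}
# bmg: Bidi_Mirroring_Glyph. Angle signs have a mirroring glyph but no bracket type.
BMG = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def derived_bpb(c):
    """bpb(c) = if bpt(c) == None then <none> else bmg(c)."""
    if BPT.get(c, "n") == "n":
        return None  # stands for <none>
    return BMG.get(c)


for ch in "()[]<>a":
    print(repr(ch), "->", repr(derived_bpb(ch)))
```

The "ornate parens" concern raised on the unicore list corresponds to a character that appears in `BPT` but not in `BMG`: for such a character this derivation returns None, which is why the comment proposes giving those characters bmg mappings as well.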
Date/Time: Wed May 8 04:02:15 CDT 2013
Contact: matitiahu.allouche@gmail.com
Name: Matitiahu Allouche
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback on first Review Note in UAX #9 draft 12 Revision 28
I don’t care if the level depth limit is left at 61 (which is enough IMHO) or raised to 127 or any other number, but the limit must be defined and not left to the implementer’s whim. This is why we have a standard. This is what will ensure that all agents present the same display.
Date/Time: Wed May 8 04:05:51 CDT 2013
Contact: matitiahu.allouche@gmail.com
Name: Matitiahu Allouche
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback on second Review Note in UAX #9 draft 12 Revision 28
I favor resolving paired brackets only based on preceding context. Reasoning:
a. I am not quite convinced about the benefits of BPA in any case.
b. When not sure that a change is needed, do not change.
c. In most cases, text in parentheses comes as a complement to what precedes it, not as an announcement of what follows. The example “(s)he” is good, but I cannot see any other convincing use case.
Date/Time: Fri May 10 15:33:32 CDT 2013
Contact: mgrzegor@poczta.onet.pl
Name: Marcin Grzegorczyk
Report Type: Public Review Issue
Opt Subject: Re: Feedback on second Review Note in UAX #9 draft 12 Revision 28
I believe it would be better to resolve paired brackets opposite to the embedding direction based on context symmetrically (i.e., as it is specified in draft #10), on the following grounds: 1. In my opinion, the argument “when not sure that a change is needed, do not change” does not apply here: there already is a change with respect to previous versions of the UBA – brackets that form a pair will resolve to the same direction, the question now is, which would be more often appropriate. 2. While it may be true that in most cases, text in parentheses comes as a complement to what precedes it, it is not necessarily the case for other paired brackets. For instance, e-mail handling software often prefixes subject fields with a bracketed tag, as in “[SPAM] PRIVATE MESSAGE”.