This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Sun Jul 8 11:37:00 CDT 2012
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Public Review Issue
Opt Subject: PRI 231 Bidi Parentheses Algorithm
Sorry I haven't had time to look at this issue in depth (a sick pet comes first, and these days so does my online blog; sorry). My experience at Facebook suggests it's a good idea to have parentheses paired. However, a note of caution: doing so could affect "doodles" that use punctuation characters, such as ): [unhappy face] or (: [happy face]. I do plan to look at the proposal further; there might be a workaround for the doodling issue. --C. E. Whitehead
Date/Time: Wed Jul 11 12:04:05 CDT 2012
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Public Review Issue
Opt Subject: PRI #231 (Bidi Parentheses Algorithm)
First, my apologies for mentioning (: :) :( ): these are not normally paired parentheses and would be ignored by the algorithm! That's great. On the same note, I am not completely sure about { }; please see the discussion of single curly braces in legal documents: http://www.oooforum.org/forum/viewtopic.phtml?t=53089 (obviously, though, in such cases the rule will simply not apply, so I guess such uses are no problem either). Thus, go ahead with all four kinds of brackets: (), [], <>, and {}. (I am not sure about others.)

Second, ** IMO, the algorithm should be part of the Unicode core, because the core's previous way of handling brackets should be improved/corrected in the core itself, even though (IMO again) the current rules HL4 and HL5 are also fine and do enable applications to fix/tweak the basic bidi algorithm. What I mean is that if an app ignores HL4 and HL5 and simply applies the Unicode bidi algorithm, the result could be mismatched brackets in some cases; further, the rules in the core should be fixed in the core (IMO again), so long as they are only fixed for marks that are universal and not language-specific. ** I also agree it's best to locate the matching brackets before applying the rest of the bidi algorithm.

Third, 3.1. Some comments on the algorithm itself: "If an open parenthesis is found, push it onto a stack and continue the scan. If a close parenthesis is found, check if the stack is not empty and the close parenthesis is the other member of the mirrored pair for the character on the top of the stack. If so, pop the stack and continue the scan; else return failure. If the end of the paragraph is reached, return success if the stack is empty; else return failure. Success implies that all open and close parentheses, if any, in a paragraph are matched correctly. Failure implies that there are one or more mismatched paired punctuation marks in a run and therefore the handling under the parenthesis algorithm will not be attempted."
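The stack-based scan quoted from the proposal can be sketched in a few lines of Python. This is an illustrative sketch, not the proposal's own code: the function name `brackets_balanced` and the bracket table `PAIRS` are hypothetical, and the table assumes just the four kinds of brackets discussed in this comment (including <>, which the proposal may not treat as a paired type).

```python
# Hypothetical bracket table: closer -> opener. Assumes only the four kinds
# of brackets discussed in this comment; the proposal's own list may differ.
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}
OPENERS = set(PAIRS.values())


def brackets_balanced(paragraph: str) -> bool:
    """Return True if every bracket in the paragraph is matched correctly.

    Follows the quoted scan: push openers, pop on a matching closer,
    fail on any mismatch, and succeed only if the stack ends up empty.
    """
    stack: list[str] = []
    for ch in paragraph:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must be the mirror of the opener on top of the stack.
            if not stack or stack[-1] != PAIRS[ch]:
                return False
            stack.pop()
    # Any opener left on the stack is unmatched.
    return not stack


print(brackets_balanced("abc (DEF [ghi])"))
print(brackets_balanced("smile (: here"))
```

Under a literal reading of the quoted scan, a single unmatched emoticon such as (: makes the whole paragraph fail, so the parenthesis handling would be skipped for that paragraph rather than the emoticon alone being ignored.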
** The quoted scan is fine, IMO.

"The rationale for following the embedding level in the normal case is that the text segment enclosed by the paired punctuation marks will conform to the progression of other text segments in the writing direction. In the exception cases, the rationale to follow the opposite direction is based on context being established between the enclosed and adjacent segments with the same direction."

** Agreed; yes, the embedding level should be followed in the normal case, albeit for both brackets.

"Other neutral types adjacent to paired punctuation marks are resolved subsequent to resolving the paired punctuation marks themselves, and will therefore be influenced by that resolution."

** Agreed again; so far so good.

"The directionality of the enclosed content is opposite the embedding direction, and at least one neighbor has a bidi level opposite to the embedding direction O(O)E, E(O)O, or O(O)O."

"*N0. Paired punctuation marks take the embedding direction if the enclosed text contains a strong type of the same direction. Else, if the enclosed text contains a strong type of the opposite direction and at least one external neighbor also has that direction the paired punctuation marks take the direction opposite the embedding direction."

** I disagree with the above statements.
R(R)L → R, with ltr embedding: OK.
L(R)R → R, with ltr embedding: No. These brackets should take the ltr directionality; that is, they should take the directionality of the embedding level if text of that same directionality immediately precedes the opening paren (IMO again, but see the examples below).
** The same problem occurs in an rtl embedding environment:
L(L)R → L, with rtl embedding: OK. The text that precedes the parens is ltr, as is the text in the parens, so it is fine to let the directionality differ from the embedding directionality.
R(L)L → L, with rtl embedding: No; this is the same problem as above in the ltr embedding environment!
In both of the cases I object to, the directionality of the embedding is the same as the directionality of the text immediately preceding the parentheses. I think this sets the reader's expectation for the display of the parens! For example (note: as is your convention, upper case letters designate rtl characters/text and lower case letters designate ltr):
TEXT: AS-SAYYAD AL-ALIFBAYT (w3c lead, balad1), abc (w3c lead, country2).
OK, if the embedding directionality of this text is ltr, these parentheses can be displayed as ltr, since there is enough ltr text both inside and outside of them. However, if the embedding is rtl, the rtl text immediately preceding the parentheses makes me expect to see the parentheses displayed as rtl. Thus (sorry, I can't find an rtl comma on my keyboard), I think in this case the embedding directionality should determine the directionality of the whole, maybe even of both parenthetical texts:
=> (country2 ,w3clead abc ,(balad1 ,w3clead) TYABFILA-LA DAYYAS-SA :TXET
(** Thus, I think that, together with the directionality of the embedding level, the text that logically precedes the parens is critical to determining the directionality of the parens.)

3.2. Also, sometimes there is no adjacent text on one side of a pair of brackets or the other. Although parentheses fortunately rarely begin a text block (except in programming, and in my writing), they often end one, followed by a neutral punctuation mark:
L(R)N { ? Have you addressed such cases in your algorithm? I must have missed something. }
* In an ltr text/embedding, the above L(R)N should clearly be ltr. {And R(L)N should be rtl in an rtl text.}
R(R)N
* In an ltr text, the above should run rtl nevertheless! {And L(L)N should run ltr in an rtl embedding!}
These two cases seem to me to be obvious. However, for the next two cases the solution is not so obvious:
L(R)N in an rtl context/embedding
* (Should it remain rtl? I am unsure. Probably.)
R(L)N in an ltr context/embedding
* (Same question: should it remain ltr? Probably.)
The following I am more sure of:
L(RL)N in an rtl context/embedding
* I would make this ltr.
R(RL)N in an ltr context/embedding
* I would make this rtl.

4. Examples (just to give you all an idea of some uses of bidi with parens). (I'm not 100% sure, but I believe that Facebook uses "any rtl" and not "first strong"; thus all text at Facebook with any rtl character is processed as rtl. Or not? Perhaps Facebook's algorithm breaks text at parentheses, but any rtl in or before a parenthesis changes the embedding direction? Oh well. I don't need to discuss a particular application/site.) In any case, below are my examples; I am assuming that the embedding is rtl for all cases.

Example A. Embedding rtl or ltr
Sound transcription in English (EXAMPLEORDEFINITIONINOTHERLANG) Definition in English
* I want the parens displayed ltr, with a separate embedding level for the content inside so that it can run rtl, as of course it will =>
Sound transcription in English (GNALREHTONINOITINIFEDROELPMAXE) Definition in English
* You resolve this as ltr only if the embedding is ltr?

Example B. Embedding rtl but should be ltr
Text EXAMPLE1INOTHERLANGUAGE (Sound transcription in English, Definition in English), EXAMPLE2INOTHERLANGUAGE
** What I want: =>
Text: EGAUGNALREHTONI1ELPMAXE (Sound transcription in English, Definition in English), EGAUGNALREHTONI2ELPMAXE
** That is, I want the parens displayed ltr again because of the opening ltr text.
* Adjacent text is all rtl.
* Content of parens is strong ltr.
* You resolve the parens as ultimately -- it seems, if the embedding is rtl, then you resolve the parens as rtl? rtl will not do for my content, I guess, but particularly not if the embedding is ltr (which I think it should be here, with the ltr text beginning this); but in that case you do resolve the parens as ltr?? (I think I've read you right.)
In any case, I believe I can resolve the above with a line break (if a neutral character at the end, such as a line break, has the correct effect) or a declaration of document directionality.

Example C. Embedding ltr or rtl
English sentence (EXAMPLEINOTHERLANGUAGE, Sound transcription in English) More text
** Again, I want the parens displayed ltr in spite of the EXAMPLEINOTHERLANGUAGE; however, in my case (but not in general), if the parens run rtl, the phonetic transcription simply appears before the Arabic text, and this won't really hurt me.
* Some text in rtl, some ltr, in the parens.
* If the embedding is ltr, then your algorithm makes the parens come out ltr; this case is resolved correctly immediately.
Best, --C.E. Whitehead cewcathar@hotmail.com
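The four cases this comment disputes can be checked against N0 exactly as quoted with a toy Python model. It is a sketch under simplifying assumptions, not the proposal's implementation: each case is a pattern such as "L(R)R", where the single character before "(" and after ")" stands for the nearest strong type outside the brackets, the characters inside stand for the strong types of the enclosed text, and the embedding direction is "L" or "R". The helper names are hypothetical, and the function returns None where the quoted rule does not decide (later rules would then apply).

```python
from typing import Optional, Tuple


def opposite(direction: str) -> str:
    """Return the other bidi direction ("L" <-> "R")."""
    return "R" if direction == "L" else "L"


def split_pattern(pattern: str) -> Tuple[str, str, str]:
    """Split a toy pattern such as "L(R)R" into (before, inside, after)."""
    before, rest = pattern.split("(", 1)
    inside, after = rest.split(")", 1)
    return before, inside, after


def n0_as_quoted(pattern: str, embedding: str) -> Optional[str]:
    """Resolve the bracket pair per the quoted N0; None if N0 does not decide."""
    before, inside, after = split_pattern(pattern)
    opp = opposite(embedding)
    # First clause: the enclosed text has a strong type of the embedding direction.
    if embedding in inside:
        return embedding
    # Second clause: the enclosed text has the opposite type, and so does
    # at least one external neighbor.
    if opp in inside and opp in (before, after):
        return opp
    return None


# The four disputed cases: (pattern, embedding direction).
for case, emb in [("R(R)L", "L"), ("L(R)R", "L"), ("L(L)R", "R"), ("R(L)L", "R")]:
    print(f"{case} with {emb} embedding -> {n0_as_quoted(case, emb)}")
```

In this model the second and fourth cases come out opposite to the embedding even though the text immediately before the opening bracket has the embedding direction, because the neighbor after the closing bracket matches the enclosed text; that is the behavior the comment objects to.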
Date/Time: Wed Jul 18 13:52:15 CDT 2012
Contact: aharon@google.com
Name: Aharon Lanin
Report Type: Public Review Issue
Opt Subject: PRI #231 issue: proposed N0 violates X10 by working outside of a level run
This is a comment on PRI #231 (the BPA). The proposal includes adding the following rule to UAX #9: "*N0. Paired punctuation marks take the embedding direction if the enclosed text contains a strong type of the same direction. Else, if the enclosed text contains a strong type of the opposite direction and at least one external neighbor also has that direction the paired punctuation marks take the direction opposite the embedding direction. [...] This rule is applied to those paired punctuation marks that are correctly nested and occur at the same level without an intervening drop below their level." This rule does not fit into the Unicode Bidirectional Algorithm as currently formulated, for a technical reason: rule X10 (which PRI #231 does not propose to modify) states that "The remaining rules are applied to each run of characters at the same level." The proposed N0 is one of those "remaining rules", but it attempts to apply to pairs of characters that may be separated by a higher embedding level, and thus may be in two different level runs. This violates X10.
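The conflict can be illustrated with a short Python sketch that splits a paragraph into level runs, the unit to which X10 says the remaining rules apply. The per-character levels are assumed to have been computed already by the explicit rules (with the embedding controls themselves removed); the example text and the function names are hypothetical.

```python
from typing import List, Tuple


def level_runs(levels: List[int]) -> List[Tuple[int, int]]:
    """Split per-character levels into maximal same-level runs.

    Each run is returned as a half-open (start, end) index range.
    """
    runs: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(levels) + 1):
        # A run ends at the end of the text or where the level changes.
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i))
            start = i
    return runs


def run_containing(runs: List[Tuple[int, int]], pos: int) -> int:
    """Return the index of the run that contains character position pos."""
    for k, (start, end) in enumerate(runs):
        if start <= pos < end:
            return k
    raise IndexError(pos)


# Hypothetical LTR paragraph "x(ABC)y" in which ABC was wrapped in RLE...PDF,
# so ABC sits at level 1 while the parentheses stay at level 0.
text = "x(ABC)y"
levels = [0, 0, 1, 1, 1, 0, 0]
runs = level_runs(levels)
print(runs)
print(run_containing(runs, text.index("(")), run_containing(runs, text.index(")")))
```

Here the two parentheses are at the same level with no intervening drop below it, so the proposed N0 would pair them, yet they fall in level runs 0 and 2; a rule restricted to one level run at a time, as X10 requires, never sees both.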
Date/Time: Sun Jul 29 16:41:05 CDT 2012
Contact: matitiahu.allouche@gmail.com
Name: Matitiahu Allouche
Report Type: Public Review Issue
Opt Subject: PRI #231 (Bidi Parentheses Algorithm)
Oops! I am past the closing date. I hope the comments below will be considered nevertheless.
1) In the table of section 3.5, the last line of the second column, R(LR)R, should be highlighted, since the UBA will resolve the open paren as LTR and the close paren as RTL.
2) In the same table, the sixth line of the fourth column, L(LR)L, should be highlighted, since the UBA will resolve the open paren as LTR and the close paren as RTL.
3) The same comment applies to the next line, L(LR)R.
4) Line 115 mentions "the directionality of the enclosed content". It is not clear what this directionality is when the content includes mixed LTR and RTL text.
5) For completeness, rule N0 should specify what happens when the enclosed text is all N, even if only to say that the BPA does not affect this case.
6) In section 5, example 6, I don't understand the result of the BPA. From the UBA display, I understand that the LTR text (Microsoft Corp) is at the logical end of the string. In the BPA display it appears at the starting end of the string. I see nothing in the definition of the BPA that should give such a result.
7) Is a solution to the current problem of mismatched parentheses desirable? I am not sure, for the following reasons:
a. The UBA is already quite complex. The BPA would add still more complexity. The proof is that, if my comments 1-3 and 6 above are founded, even the author of the proposal has missed some fine points. And if my comments are not founded, I am myself the one who got confused, despite the fact that I have more experience in bidi matters than the average person.
b. Consider a text editor implementing the UBA and BPA by transforming the logical text to visual display after each keystroke. When an open paren is entered, the BPA will not kick in, since it is not paired. When the closing paren is entered, the BPA will kick in, possibly modifying the display of text around the opening paren, which may be a few lines away from the typing location.
The UBA also modifies the appearance of text already entered, but always in the close neighborhood of the typing location.
8) Does the proposed solution meet expectations in terms of the naturalness of segmentation and directional flow of enclosed units? The proposal assumes that opposite-direction content within parentheses forms a unique directional run with opposite-direction text on either side. When one of the sides has the embedding direction, I don't see why the context should be taken to have the opposite direction rather than the embedding direction. When in doubt, the BPA should not assume the opposite direction for the parentheses. Here is an example. The text in logical order (with upper case representing RTL letters) is "I LIVE IN paris (france)." Assuming an RTL paragraph direction, the UBA will display ".(paris (france NI EVIL I". The BPA will display ".paris (france) NI EVIL I", which is better. However, since the general direction of the text is RTL, I prefer to have it displayed as ".(france) paris NI EVIL I". To get this result, rule N0 can be reformulated as follows: N0. Paired punctuation marks take the opposite direction if the enclosed text contains no strong type of the embedding direction and the external neighbors on both sides have the opposite direction. Else the paired punctuation marks take the embedding direction.
9) Should the BPA be implemented as a new rule affecting the resolution of neutral types in the core UBA (proposed rule N0)? Should the BPA, rather, be a recommended implementation using higher-level protocols? As said above, I am not sure that the BPA has more benefits than problems, but if it is adopted, I think it should be in the core UBA. Leaving it to a higher-level protocol introduces one more degree of uncertainty in the behavior of the presentation system, and this is not something we need.
10) Are stability concerns adequately addressed?
I think that in most cases, reasonable bidi text that looks good with the UBA will look the same with the BPA. However, here is an exception. The logical text is (where ] represents RLE and ^ represents PDF):
I LIVE IN ]paris^(france).
The UBA will display it as:
.(france)paris^] NI EVIL I
The BPA, according to the proposed N0, will assign the opposite direction to the parentheses, thus displaying:
.paris^(france)] NI EVIL I
You can see that the order of "paris" and "france" is reversed. That would not happen with the revised N0 that I suggested in comment 8 above.
11) Are interoperability concerns during the migration period adequately addressed? The document states: "The main stability concern therefore is that text authored using the BPA may display differently when rendered on a system which has not implemented the BPA. In such a case, the reader of that text is no worse off than they would have been prior to the development of the BPA." This is not quite correct. Without the development of the BPA, the document author would have taken measures (like adding control characters) to create a correct display under the UBA. Authors writing on BPA-supporting systems are likely to create text that will be rendered differently on BPA-ignorant systems. However, this is a problem that will always surface when new features are introduced.
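The difference between the proposed N0 and the reformulation in comment 8 can be checked on the "I LIVE IN paris (france)." example with a toy Python model. It is a sketch under simplifying assumptions, with hypothetical function names: a pattern such as "L(L)R" gives the nearest strong type before "(", the strong types inside, and the nearest strong type after ")". For the example in an RTL paragraph, "paris" supplies the L before the bracket and "france" the L inside; the end of the paragraph after the period is assumed to count as the embedding direction R, following the UBA's treatment of the paragraph end (eos).

```python
from typing import Optional, Tuple


def opposite(direction: str) -> str:
    """Return the other bidi direction ("L" <-> "R")."""
    return "R" if direction == "L" else "L"


def split_pattern(pattern: str) -> Tuple[str, str, str]:
    """Split a toy pattern such as "L(L)R" into (before, inside, after)."""
    before, rest = pattern.split("(", 1)
    inside, after = rest.split(")", 1)
    return before, inside, after


def n0_proposed(pattern: str, embedding: str) -> Optional[str]:
    """N0 as proposed in PRI #231; None where it does not decide."""
    before, inside, after = split_pattern(pattern)
    opp = opposite(embedding)
    if embedding in inside:
        return embedding
    # One external neighbor matching the enclosed opposite type is enough.
    if opp in inside and opp in (before, after):
        return opp
    return None


def n0_revised(pattern: str, embedding: str) -> str:
    """N0 as reformulated in comment 8: opposite only if both neighbors are."""
    before, inside, after = split_pattern(pattern)
    opp = opposite(embedding)
    if embedding not in inside and before == opp and after == opp:
        return opp
    return embedding


# "I LIVE IN paris (france)." in an RTL paragraph: L before, L inside,
# and (assumed) the paragraph end as R after the closing bracket.
case = "L(L)R"
print("proposed:", n0_proposed(case, "R"))
print("revised: ", n0_revised(case, "R"))
```

The proposed rule makes the brackets LTR, consistent with the BPA display ".paris (france) NI EVIL I"; the revised rule keeps them in the embedding (RTL) direction, consistent with the preferred ".(france) paris NI EVIL I".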