Accumulated Feedback on PRI #231

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Sun Jul 8 11:37:00 CDT 2012
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Public Review Issue
Opt Subject: PRI 231 Bidi Parentheses Algorithm


Sorry I've not had time to look at this issue in depth (sick pet
comes first; so these days does my online blog; sorry); my experience
at Facebook suggests it's a good idea to have parentheses paired;
however, a note of caution: doing so could affect "doodles" that make
use of punctuation characters such as ): [unhappy face] or (: [happy
face]. I do plan to look at the proposal further; there might be a
"work around" for the doodling issue.  --C. E. Whitehead

Date/Time: Wed Jul 11 12:04:05 CDT 2012
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Public Review Issue
Opt Subject: PRI #231 (Bidi Parentheses Algorithm)


First, my apologies for mentioning (: :) :( ): these are not normally paired parens
and would be ignored by the algorithm!!! That's great!
On the same note, I am not completely sure about { };
please see discussion of single curly braces in legal documents:
http://www.oooforum.org/forum/viewtopic.phtml?t=53089
(Obviously though in such cases this rule will just not be implemented so I guess 
such uses are no problem again!).
Thus go ahead with all four kinds of braces:
 (), [], <>, and {}.
(I am not sure about others).

2nd, 
** IMO, the algorithm should be part of Unicode's core, 
as Unicode core's previous way of handling braces should be improved/corrected in the core,
even though (IMO again) the current rules HL4 and HL5 are also fine,
and do enable applications to fix/tweak the basic bidi algorithm.
What I mean is that if an app ignores HL4 and HL5 and simply applies the Unicode 
bidi algorithm, the result could be mismatched brackets (in some cases); further, the 
rules in the core should be fixed in the core (IMO again) so long as they are only 
fixed for marks that are universal, and not language specific.
** Also I agree it's best to locate the matching brackets before applying the rest 
of the bidi algorithm.

Third, 3.1. Some comments on the algorithm itself:
"If an open parenthesis is found, push it onto a stack and continue the scan. If 
a close parenthesis is found, check if the stack is not empty and the close 
parenthesis is the other member of the mirrored pair for the character on the 
top of the stack. If so, pop the stack and continue the scan; else return failure. 
If the end of the paragraph is reached, return success if the stack is empty; 
else return failure. Success implies that all open and close parentheses, if 
any, in a paragraph are matched correctly. Failure implies that there are one 
or more mismatched paired punctuation marks in a run and therefore the handling 
under the parenthesis algorithm will not be attempted."
** The above is fine, IMO
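
The quoted scan can be sketched in a few lines of Python. This is only an illustration of the description above, not the proposal's reference code; the PAIRS table and the function name brackets_balanced are assumptions made for the example, and a real implementation would presumably take its pairs from Unicode mirroring data.

```python
# Illustrative pair table (an assumption): a real implementation would
# draw on Unicode mirroring data rather than this short hand-written list.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = set(PAIRS.values())


def brackets_balanced(paragraph: str) -> bool:
    """Return True if every bracket in the paragraph is correctly matched."""
    stack = []
    for ch in paragraph:
        if ch in PAIRS:
            # Open bracket: push it and continue the scan.
            stack.append(ch)
        elif ch in CLOSERS:
            # Close bracket: fail if nothing is open or it does not
            # mirror the bracket on top of the stack.
            if not stack or PAIRS[stack[-1]] != ch:
                return False
            stack.pop()
    # End of paragraph: success only if nothing is left open.
    return not stack
```

Read literally, a lone ")" as in ":)" makes this scan return failure, so under the quoted wording the parenthesis handling would simply not be attempted for that text.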
"The rationale for following the embedding level in the normal case is that 
the text segment enclosed by the paired punctuation marks will conform 
to the progression of other text segments in the writing direction. In the 
exception cases, the rationale to follow the opposite direction is based on 
context being established between the enclosed and adjacent segments with 
the same direction."
** Agreed, yes, embedding level should be followed in normal case, albeit 
for both brackets.
"Other neutral types adjacent to paired punctuation marks are resolved subsequent 
to resolving the paired punctuation marks themselves, and will therefore be 
influenced by that resolution."
** Agreed again, yes, so far so good.
"The directionality of the enclosed content is opposite the embedding direction, 
and at least one neighbor has a bidi level opposite to the embedding 
direction O(O)E, E(O)O, or O(O)O."
"*N0. Paired punctuation marks take the embedding direction if the enclosed 
text contains a strong type of the same direction. Else, if the enclosed text 
contains a strong type of the opposite direction and at least one external 
neighbor also has that direction the paired punctuation marks take the direction 
opposite the embedding direction."
** I disagree with the above statements. 

R(R)L -> R -- that is, with embedding ltr -- o.k.
L(R)R -> R -- that is, with embedding ltr -- No; these brackets should 
take the ltr directionality, that is should take the directionality of 
the embedding level if same directionality immediately precedes opening 
paren (IMO again but see examples below).

** Same problem in an rtl embedding environment:
L(L)R -> L with embedding rtl -- O.k. the text that precedes the parens 
is ltr, as is the text in parens, so fine, let the directionality be 
different from that of embedding directionality.
R(L)L -> L with embedding rtl -- No; same problem as above in the ltr 
embedding environment! The directionality of the embedding is the same as 
the directionality of the text immediately preceding the parentheses. I 
think this sets the reader's expectation for the display of the parens!

So for example (note: as is your convention, upper case letters designate 
RTL characters/text and lower case designate ltr):
TEXT: AS-SAYYAD AL-ALIFBAYT (w3c lead, balad1), abc (w3c lead, country2).
O.k., if the embedding directionality of this text is ltr, these parentheses 
can be displayed as ltr since there is enough ltr text both inside and outside of them.
However, if the embedding is rtl, the rtl text immediately preceding the 
parentheses makes me expect to see the parentheses displayed as rtl;
thus (sorry can't find an rtl comma on my keyboard):
I think in this case embedding directionality should determine the directionality 
of the whole maybe even of both parenthetical texts??
=> (country2 ,w3clead abc ,(balad1 ,w3clead) TYABFILA-LA DAYYAS-SA :TXET                                      
(** Thus, I think that, together with the directionality of the embedding level, 
the text that logically precedes the parens is critical to the determination of 
the directionality of the parens.)
3.2. Also, sometimes there is no adjacent text on one side of a set of brackets, 
or on the other: although fortunately parentheses rarely begin a text block 
(except in programming, and in my writing), they often end a text block, followed 
by a neutral punctuation mark:
L(R)N
{ ? Have you addressed such cases in your algorithm? I must have missed something. }
* In an ltr text/embedding the above L(R)N should clearly be ltr. {And R(L)N 
should be rtl in an rtl text.} 
R(R)N 
* In an ltr text the above should run rtl nevertheless! {And L(L)N should run 
ltr in an rtl embedding!}
These two cases above seem to me to be two obvious cases!!
However, for the next two cases the solution is not so obvious:
L(R)N in an rtl context/embedding 
* (should it remain rtl? I am unsure. Probably.)
R(L)N in an ltr context/embedding
* (same question; should it remain ltr? Probably.)

The following I am more sure of:
 L(RL)N in an rtl context/embedding
* I would make this ltr
R(RL)N in an ltr context/embedding
* I would make this rtl

4. Examples (just to give you all an idea of some uses of bidi with parens)
(I'm not 100% sure but believe that
Facebook uses any rtl??? and not first strong;
thus, all text at Facebook with any rtl character is processed then as rtl --
or not?
perhaps though Facebook's algorithm breaks text into parentheses but any rtl 
in or before a parenthesis changes embedding direction??
Oh well.
But I don't need to discuss a particular application/site.)
In any case, below are my examples; I am assuming that the embedding is rtl 
for all cases

Example A. Embedding rtl or ltr
Sound transcription in English (EXAMPLEORDEFINITIONINOTHERLANG) Definition 
in English
* I want parens displayed ltr, with a separate embedding level for content 
inside so that it can run rtl of course as it will
=> Sound transcription in English (GNALREHTONINOITINIFEDROELPMAXE) Definition 
in English
* you resolve  as ltr only if the embedding is ltr?

Example B. Embedding rtl but should be ltr
Text EXAMPLE1INOTHERLANGUAGE (Sound transcription in English, Definition in 
English), EXAMPLE2INOTHERLANGUAGE
** What I want:
=> Text: EGAUGNALREHTONI1ELPMAXE (Sound transcription in English, Definition 
in English), EGAUGNALREHTONI2ELPMAXE
** That is, I want parens displayed ltr again because of the opening ltr text.
* Adjacent text is all rtl
* Content of parens is strong ltr
*  you resolve the parens as ultimately -- it seems, if embedding is rtl then 
you resolve the parens as rtl? ;
rtl will not do for my content I guess but particularly not if the embedding 
is ltr (which I think it should be here with the ltr text beginning this) 
but in that case you do resolve the parens as ltr?? (I think I've read you 
right)!!! 
In any case, I believe I can resolve the above with a line break (if a neutral 
character at the end such as a line break has the correct effect) or a declaration 
of document directionality.

Example C. Embedding ltr or rtl
English sentence (EXAMPLEINOTHERLANGUAGE, Sound transcription in English) 
More text
** Again I want parens displayed ltr in spite of the EXAMPLEINOTHERLANGUAGE 
(however, in my case but not in general, if the parens run rtl the phonetic 
transcription simply appears before the Arabic text and this won't really hurt me)
* Some text in rtl some ltr in parens
 * If embedding is ltr, then your algorithm makes parens come out ltr; this 
case is resolved correctly immediately.

Best,
--C.E. Whitehead
cewcathar@hotmail.com

Date/Time: Wed Jul 18 13:52:15 CDT 2012
Contact: aharon@google.com
Name: Aharon Lanin
Report Type: Public Review Issue
Opt Subject: PRI #231 issue: proposed N0 violates X10 by working outside of a level run


This is a comment on PRI #231 (the BPA). The proposal includes adding the
following rule to UAX #9:

"*N0. Paired punctuation marks take the embedding direction if the enclosed
text contains a strong type of the same direction. Else, if the enclosed text
contains a strong type of the opposite direction and at least one external
neighbor also has that direction the paired punctuation marks take the
direction opposite the embedding direction. [...] This rule is applied to
those paired punctuation marks that are correctly nested and occur at the same
level without an intervening drop below their level."

This rule does not fit into the Unicode Bidirectional Algorithm as currently
formulated for a technical reason: rule X10 (which PRI #231 does not propose
to modify) states that "The remaining rules are applied to each run of
characters at the same level." The proposed N0 is one of those "remaining
rules", but attempts to apply to pairs of characters that may be separated by
a higher embedding level, and thus may be in two different level runs. This
violates X10.
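
The conflict can be illustrated with a small Python sketch. It assumes the resolved embedding levels are already available as a list of integers; the function names level_runs and same_level_run, and the hand-assigned levels in the usage note, are hypothetical and only serve the illustration.

```python
def level_runs(levels):
    """Split a list of embedding levels into (start, end) runs, end exclusive."""
    runs = []
    start = 0
    for i in range(1, len(levels) + 1):
        # Close the current run at the end of the list or when the level changes.
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i))
            start = i
    return runs


def same_level_run(levels, i, j):
    """Return True if positions i and j fall inside the same level run."""
    return any(s <= i < e and s <= j < e for s, e in level_runs(levels))
```

For a five-character string whose middle character sits in a higher embedding, say levels [0, 0, 1, 0, 0], same_level_run(levels, 0, 4) is False: a bracket pair at positions 0 and 4 is at the same level but lies in two different level runs, which is the situation a rule applied per level run cannot see.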

Date/Time: Sun Jul 29 16:41:05 CDT 2012
Contact: matitiahu.allouche@gmail.com
Name: Matitiahu Allouche
Report Type: Public Review Issue
Opt Subject: PRI #231 (Bidi Parentheses Algorithm)


Ooops! I am past the closing date. I hope the comments
below will be considered nevertheless.

1) In the table of section 3.5, the last line of the second column R(LR)R
should be highlighted, since the UBA will resolve the open paren as LTR and
the close paren as RTL.

2) In the same table, the sixth line of the fourth column L(LR)L should be
highlighted, since the UBA will resolve the open paren as LTR and the close paren as RTL.

3) Same comment for the next line L(LR)R.

4) Line 115 mentions "the directionality of the enclosed content". It is not clear
what this directionality is when the content includes mixed LTR and RTL text.

5) For completeness, rule N0 should specify what happens when the enclosed text
is all N, even if to say that the BPA does not affect this case.

6) In section 5, example 6, I don't understand the result of the BPA. From the 
UBA display, I understand that the LTR text (Microsoft Corp) is at the logical 
end of the string. In the BPA display it appears on the starting end of the 
string. I see nothing in the definition of the BPA which should give such a result.

7) Is a solution to the current problem of mismatched parentheses desirable?
I am not sure, because of the following reasons:
a.	The UBA is already quite complex. The BPA would add still more complexity. 
Proof is that, if my comments 1-3 and 6 above are founded, even the author of the 
proposal has missed some fine points. And if my comments are not founded, I am 
myself the one who got confused, despite the fact that I have more experience in 
bidi matters than the average person.
b.	Consider a text editor implementing the UBA and BPA by transforming the 
logical text to visual display after each keystroke. When entering an open paren,
the BPA will not kick in, since it is not paired. When entering the closing paren, 
the BPA will kick in, possibly modifying the display of text around the opening
paren, which may be a few lines away from the typing location.
The UBA also has effects of modifying the appearance of text already entered, 
but it is always in the close neighborhood of the typing location.
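
The keystroke effect in point b can be made concrete with a short Python sketch. It is only an illustration under assumptions: the PAIRS table and the function matched_pairs are hypothetical, and the sketch covers pair detection only, not reordering.

```python
# Illustrative pair table (an assumption, not the proposal's actual data).
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = set(PAIRS.values())


def matched_pairs(text):
    """Return (open_index, close_index) pairs, or [] if any bracket is unmatched."""
    stack, pairs = [], []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append(i)
        elif ch in CLOSERS:
            # A closer with no matching opener means no pairs are reported.
            if not stack or PAIRS[text[stack[-1]]] != ch:
                return []
            pairs.append((stack.pop(), i))
    # An opener still waiting for its closer also means no pairs.
    return pairs if not stack else []
```

With the text typed so far being "x (a long aside", matched_pairs returns an empty list; after the next keystroke, "x (a long aside)", it returns [(2, 15)]. The pair only comes into existence at the closing keystroke, and its opening member can be arbitrarily far behind the typing location.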

8) Does the proposed solution meet expectations in terms of the naturalness of 
segmentation and directional flow of enclosed units?
The proposal assumes that opposite direction content within parentheses forms 
a unique directional run with opposite direction text on either side. When one 
of the sides has the embedding direction, I don't see that the context has opposite 
direction rather than embedded direction. When in doubt, the BPA should not assume 
opposite direction for the parentheses.
Here is an example. The text in logical order (with upper case representing RTL 
letters) is 
   "I LIVE IN paris (france)."
Assuming a RTL paragraph direction, the UBA will display 
   ".(paris (france NI EVIL I"
The BPA will display
   ".paris (france) NI EVIL I"
which is better. However, since the general direction of the text is RTL, I prefer 
to have it displayed as 
   ".(france) paris NI EVIL I"
To get  this result, rule N0 can be reformulated as follows:
N0. Paired punctuation marks take the opposite direction if the enclosed text contains 
no strong type of the embedding direction and the external neighbors on both sides 
have the opposite direction. Else the paired punctuation marks take the embedding direction. 
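
The difference between the two formulations can be sketched in Python. This is a simplified reading of the two rule texts, not an implementation of the UBA: directions are the strings "L" and "R", enclosed is the set of strong types found between the brackets, and before and after are the strong directions of the external neighbors. All function and parameter names are assumptions made for the illustration.

```python
def opposite(direction):
    """Return the direction opposite to "L" or "R"."""
    return "L" if direction == "R" else "R"


def n0_proposed(embedding, enclosed, before, after):
    """N0 as quoted from PRI #231; None means the rule leaves the pair alone."""
    if embedding in enclosed:
        return embedding
    opp = opposite(embedding)
    if opp in enclosed and opp in (before, after):
        return opp
    return None


def n0_revised(embedding, enclosed, before, after):
    """N0 as reformulated in comment 8 above."""
    opp = opposite(embedding)
    if embedding not in enclosed and before == opp and after == opp:
        return opp
    return embedding
```

For "I LIVE IN paris (france)." in an RTL paragraph, the enclosed text is {"L"} and the neighbor before the open paren is "L". Assuming the neighbor after the close paren counts as "R" (only a period and the end of the paragraph follow), n0_proposed("R", {"L"}, "L", "R") gives "L" while n0_revised("R", {"L"}, "L", "R") gives "R", matching the two displays discussed above.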

9) Should the BPA be implemented as a new rule affecting the resolution of neutral 
types in the core UBA (proposed rule N0)? 
Should the BPA, rather, be a recommended implementation using higher level protocols?
As said above, I am not sure that the BPA has more benefits than problems, but if it 
is adopted, I think it should be in the core UBA. Leaving it for a higher level 
protocol introduces one more degree of uncertainty in the behavior of the presentation 
system, and this is not something we need.

10) Are stability concerns adequately addressed?
I think that in most cases, reasonable bidi text which looks good with the UBA will 
look the same with the BPA. However here is an exception. The logical text is (where ] 
represents RLE and ^ represents PDF):
   I LIVE IN ]paris^(france).
The UBA will display it as:
   .(france)paris^] NI EVIL I
The BPA, according to the proposed N0, will assign the opposite direction to the 
parentheses, thus displaying:
   .paris^(france)] NI EVIL I
You can see that the order of "paris" and "france" is reversed. That would not happen 
with the revised N0 that I suggested in comment 8 above.

11) Are interoperability concerns during the migration period adequately addressed?
The document states: "The main stability concern therefore is that text authored using 
the BPA may display differently when rendered on a system which has not implemented the 
BPA. In such a case, the reader of that text is no worse off than they would have been 
prior to the development of the BPA."
This is not quite correct. Without the development of the BPA, the document author 
would have taken measures (like adding control characters) to create a correct display 
under the UBA. Authors writing on BPA-supporting systems are likely to create text 
which will be rendered differently on BPA-ignorant systems.
However, this is a problem which will always surface when new features are introduced.