Public Review Issues

Accumulated Feedback on PRI #231

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Sun Jul 8 11:37:00 CDT 2012
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Public Review Issue
Opt Subject: PRI 231 Bidi Parentheses Algorithm


Sorry I've not had time to look at this issue in depth (sick pet
comes first; so these days does my online blog; sorry); my experience
at Facebook suggests it's a good idea to have parentheses paired;
however a note of caution: doing so could affect "doodles" that make
use of punctuation characters such as ): [unhappy face] or (: [happy
face]. I do plan to look at the proposal further; there might be a
"work around" for the doodling issue.  --C. E. Whitehead

Date/Time: Wed Jul 11 12:04:05 CDT 2012
Contact: cewcathar@hotmail.com
Name: C. E. Whitehead
Report Type: Public Review Issue
Opt Subject: PRI #231 (Bidi Parentheses Algorithm)

1st, my apologies for mentioning (: :) :( ): these are not paired parens normally
and would be ignored by the algorithm!!! That's great!
On the same note, I am not completely sure about { };
please see discussion of single curly braces in legal documents:
http://www.oooforum.org/forum/viewtopic.phtml?t=53089
(Obviously though in such cases this rule will just not be implemented so I guess
such uses are no problem again!).
Thus go ahead with all four kinds of braces:
(), [], <>, and {}.
(I am not sure about others).

2nd,
** IMO, the algorithm should be part of Unicode's core,
as Unicode core's previous way of handling braces should be improved/corrected in the core,
even though (IMO again) the current rules HL4 and HL5 are also fine,
and do enable applications to fix/tweak the basic bidi algorithm.
What I mean is that if an app ignores HL4 and HL5 and simply applies the Unicode
bidi algorithm, the result could be mismatched brackets (in some cases); further, the
rules in the core should be fixed in the core (IMO again) so long as they are only
fixed for marks that are universal, and not language specific.
** Also I agree it's best to locate the matching brackets before applying the rest
of the bidi algorithm.

Third, 3.1. Some comments on the algorithm itself:
"If an open parenthesis is found, push it onto a stack and continue the scan. If
a close parenthesis is found, check if the stack is not empty and the close
parenthesis is the other member of the mirrored pair for the character on the
top of the stack. If so, pop the stack and continue the scan; else return failure.
If the end of the paragraph is reached, return success if the stack is empty;
else return failure. Success implies that all open and close parentheses, if
any, in a paragraph are matched correctly. Failure implies that there are one
or more mismatched paired punctuation marks in a run and therefore the handling
under the parenthesis algorithm will not be attempted."
** The above is fine, IMO
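The scan quoted above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the proposal's reference code: the set of paired characters (PAIRS) and the function name parens_matched are hypothetical, chosen for the example.

```python
# Assumed set of mirrored pairs; the proposal's actual character list
# is not given in this feedback.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = set(PAIRS.values())


def parens_matched(paragraph: str) -> bool:
    """Return True (success) if every open/close character is correctly nested."""
    stack = []
    for ch in paragraph:
        if ch in PAIRS:
            # Open character: push it and continue the scan.
            stack.append(ch)
        elif ch in CLOSERS:
            # Close character: it must mirror the character on top of the stack.
            if not stack or PAIRS[stack[-1]] != ch:
                return False
            stack.pop()
    # End of paragraph: success only if nothing is left unmatched.
    return not stack


if __name__ == "__main__":
    print(parens_matched("a (b [c] d) e"))
    print(parens_matched("sad ): face"))
```

Under this sketch a stray ): or (: makes the whole paragraph return failure, so the parenthesis handling would simply not be attempted for it.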
"The rationale for following the embedding level in the normal case is that
the text segment enclosed by the paired punctuation marks will conform
to the progression of other text segments in the writing direction. In the
exception cases, the rationale to follow the opposite direction is based on
context being established between the enclosed and adjacent segments with
the same direction."
** Agreed, yes, embedding level should be followed in normal case, albeit
for both brackets.
"Other neutral types adjacent to paired punctuation marks are resolved subsequent
to resolving the paired punctuation marks themselves, and will therefore be
influenced by that resolution."
** Agreed again, yes, so far so good.
"The directionality of the enclosed content is opposite the embedding direction,
and at least one neighbor has a bidi level opposite to the embedding
direction O(O)E, E(O)O, or O(O)O."
"*N0. Paired punctuation marks take the embedding direction if the enclosed
text contains a strong type of the same direction. Else, if the enclosed text
contains a strong type of the opposite direction and at least one external
neighbor also has that direction the paired punctuation marks take the direction
opposite the embedding direction."
** I disagree with the above statements.

R(R)L -> R -- that is, with embedding ltr -- o.k.
L(R)R -> R -- that is, with embedding ltr -- No; these brackets should
take the ltr directionality, that is should take the directionality of
the embedding level if same directionality immediately precedes opening
paren (IMO again but see examples below).

** Same problem in an rtl embedding environment:
L(L)R -> L with embedding rtl -- O.k. the text that precedes the parens
is ltr, as is the text in parens, so fine, let the directionality be
different from that of embedding directionality.
R(L)L -> L with embedding rtl -- No; same problem as above in the ltr
embedding environment! The directionality of the embedding is the same as
the directionality of the text immediately preceding the parentheses. I
think this sets the reader's expectation for the display of the parens!

So for example (note: as is your convention, upper case letters designate
RTL characters/text and lower case designate ltr):
TEXT: AS-SAYYAD AL-ALIFBAYT (w3c lead, balad1), abc (w3c lead, country2).
O.k., if the embedding directionality of this text is ltr, these parentheses
can be displayed as ltr since there is enough ltr text both inside and outside of them.
However, if the embedding is rtl, the rtl text immediately preceding the
parentheses makes me expect to see the parentheses displayed as rtl;
thus (sorry can't find an rtl comma on my keyboard):
I think in this case embedding directionality should determine the directionality
of the whole maybe even of both parenthetical texts??
=> (country2 ,w3clead abc ,(balad1 ,w3clead) TYABFILA-LA DAYYAS-SA :TXET
(** Thus, I think that, together with the directionality of the embedding level,
the text that logically precedes the parens is critical to the determination of
the directionality of the parens.)
3.2. Also, sometimes there is no adjacent text on one side of a set of brackets,
or on the other: although fortunately parentheses rarely begin a text block
(except in programming, and in my writing), they often end a text block, followed
by a neutral punctuation mark:
L(R)N
{ ? Have you addressed such cases in your algorithm? I must have missed something. }
* In an ltr text/embedding the above L(R)N should clearly be ltr. {And R(L)N
should be rtl in an rtl text.}
R(R)N
* In an ltr text the above should run rtl nevertheless! {And L(L)N should run
ltr in an rtl embedding!}
These two cases above seem to me to be two obvious cases!!
However, for the next two cases the solution is not so obvious:
L(R)N in an rtl context/embedding
* (should it remain rtl? I am unsure. Probably.)
R(L)N in an ltr context/embedding
* (same question; should it remain ltr? Probably.)

The following I am more sure of:
L(RL)N in an rtl context/embedding
* I would make this ltr
R(RL)N in an ltr context/embedding
* I would make this rtl

4. Examples (just to give you all an idea of some uses of bidi with parens)
(I'm not 100% sure but believe that
Facebook uses any rtl??? and not first strong;
thus, all text at Facebook with any rtl character is processed then as rtl --
or not?
perhaps though Facebook's algorithm breaks text into parentheses but any rtl
in or before a parenthesis changes embedding direction??
Oh well.
But I don't need to discuss a particular application/site.)
In any case, below are my examples; I am assuming that the embedding is rtl
for all cases

Example A. Embedding rtl or ltr
Sound transcription in English (EXAMPLEORDEFINITIONINOTHERLANG) Definition
in English
* I want parens displayed ltr, with a separate embedding level for content
inside so that it can run rtl of course as it will
=> Sound transcription in English (GNALREHTONINOITINIFEDROELPMAXE) Definition
in English
* you resolve as ltr only if the embedding is ltr?

Example B. Embedding rtl but should be ltr
Text EXAMPLE1INOTHERLANGUAGE (Sound transcription in English, Definition in
English), EXAMPLE2INOTHERLANGUAGE
** What I want:
=> Text: EGAUGNALREHTONI1ELPMAXE (Sound transcription in English, Definition
in English), EGAUGNALREHTONI2ELPMAXE
** That is, I want parens displayed ltr again because of the opening ltr text.
* Adjacent text is all rtl
* Content of parens is strong ltr
* you resolve the parens as ultimately -- it seems, if embedding is rtl then
you resolve the parens as rtl? ;
rtl will not do for my content I guess but particularly not if the embedding
is ltr (which I think it should be here with the ltr text beginning this)
but in that case you do resolve the parens as ltr?? (I think I've read you
right)!!!
In any case, I believe I can resolve the above with a line break (if a neutral
character at the end such as a line break has the correct effect) or a declaration
of document directionality.

Example C. Embedding ltr or rtl
English sentence (EXAMPLEINOTHERLANGUAGE, Sound transcription in English)
More text
** Again I want parens displayed ltr in spite of the EXAMPLEINOTHERLANGUAGE
(however, in my case but not in general, if the parens run rtl the phonetic
transcription simply appears before the Arabic text and this won't really hurt me)
* Some text in rtl some ltr in parens
* If embedding is ltr, then your algorithm makes parens come out ltr; this
case is resolved correctly immediately.

Best,
--C.E. Whitehead
cewcathar@hotmail.com

Date/Time: Wed Jul 18 13:52:15 CDT 2012
Contact: aharon@google.com
Name: Aharon Lanin
Report Type: Public Review Issue
Opt Subject: PRI #231 issue: proposed N0 violates X10 by working outside of a level run


This is a comment on PRI #231 (the BPA). The proposal includes adding the
following rule to UAX #9:

"*N0. Paired punctuation marks take the embedding direction if the enclosed
text contains a strong type of the same direction. Else, if the enclosed text
contains a strong type of the opposite direction and at least one external
neighbor also has that direction the paired punctuation marks take the
direction opposite the embedding direction. [...] This rule is applied to
those paired punctuation marks that are correctly nested and occur at the same
level without an intervening drop below their level."

This rule does not fit into the Unicode Bidirectional Algorithm as currently
formulated for a technical reason: rule X10 (which PRI #231 does not propose
to modify) states that "The remaining rules are applied to each run of
characters at the same level." The proposed N0 is one of those "remaining
rules", but attempts to apply to pairs of characters that may be separated by
a higher embedding level, and thus may be in two different level runs. This
violates X10.
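The point about level runs can be illustrated with a small Python sketch. It is only an illustration under assumptions: the per-character embedding levels are supplied by hand rather than computed by rules X1-X9, and the helper names level_runs and run_index are hypothetical.

```python
from itertools import groupby


def level_runs(levels):
    """Split per-character embedding levels into maximal runs of equal level.

    Returns a list of (level, start, end) tuples, with end exclusive.
    """
    runs = []
    pos = 0
    for level, group in groupby(levels):
        length = len(list(group))
        runs.append((level, pos, pos + length))
        pos += length
    return runs


def run_index(runs, i):
    """Return the index of the level run that contains character position i."""
    for k, (_, start, end) in enumerate(runs):
        if start <= i < end:
            return k
    raise IndexError(i)


# Hypothetical text "a(bXc)d", where X sits inside an explicit embedding
# and therefore has a higher level than the parentheses around it.
text = "a(bXc)d"
levels = [0, 0, 0, 1, 0, 0, 0]  # assigned by hand for this example

runs = level_runs(levels)
open_run = run_index(runs, text.index("("))
close_run = run_index(runs, text.index(")"))
print(runs, open_run, close_run)
```

Here the two parentheses are at the same level but fall into different level runs, so a rule that is applied one level run at a time never sees both members of the pair together.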

Date/Time: Sun Jul 29 16:41:05 CDT 2012
Contact: matitiahu.allouche@gmail.com
Name: Matitiahu Allouche
Report Type: Public Review Issue
Opt Subject: PRI #231 (Bidi Parentheses Algorithm)

Ooops! I am past the closing date. I hope the comments
below will be considered nevertheless.

1) In the table of section 3.5, the last line of the second column R(LR)R
should be highlighted, since the UBA will resolve the open paren as LTR and
the close paren as RTL.

2) In the same table, the sixth line of the fourth column L(LR)L should be
highlighted, since the UBA will resolve the open paren as LTR and the close paren as RTL.

3) Same comment for the next line L(LR)R.

4) Line 115 mentions "the directionality of the enclosed content". It is not clear
what this directionality is when the content includes mixed LTR and RTL text.

5) For completeness, rule N0 should specify what happens when the enclosed text
is all N, even if to say that the BPA does not affect this case.

6) In section 5, example 6, I don't understand the result of the BPA. From the
UBA display, I understand that the LTR text (Microsoft Corp) is at the logical
end of the string. In the BPA display it appears on the starting end of the
string. I see nothing in the definition of the BPA which should give such a result.

7) Is a solution to the current problem of mismatched parentheses desirable?
I am not sure, because of the following reasons:
a. The UBA is already quite complex. The BPA would add still more complexity.
Proof is that, if my comments 1-3 and 6 above are founded, even the author of the
proposal has missed some fine points. And if my comments are not founded, I am
myself the one who got confused, despite the fact that I have more experience in
bidi matters than the average person.
b. Consider a text editor implementing the UBA and BPA by transforming the
logical text to visual display after each keystroke. When entering an open paren,
the BPA will not kick in, since it is not paired. When entering the closing paren,
the BPA will kick in, possibly modifying the display of text around the opening
paren, which may be a few lines away from the typing location.
The UBA also has effects of modifying the appearance of text already entered,
but it is always in the close neighborhood of the typing location.

8) Does the proposed solution meet expectations in terms of the naturalness of
segmentation and directional flow of enclosed units?
The proposal assumes that opposite direction content within parentheses forms
a unique directional run with opposite direction text on either side. When one
of the sides has the embedding direction, I don't see that the context has opposite
direction rather than embedded direction. When in doubt, the BPA should not assume
opposite direction for the parentheses.
Here is an example. The text in logical order (with upper case representing RTL
letters) is
"I LIVE IN paris (france)."
Assuming a RTL paragraph direction, the UBA will display
".(paris (france NI EVIL I"
The BPA will display
".paris (france) NI EVIL I"
which is better. However, since the general direction of the text is RTL, I prefer
to have it displayed as
".(france) paris NI EVIL I"
To get this result, rule N0 can be reformulated as follows:
N0. Paired punctuation marks take the opposite direction if the enclosed text contains
no strong type of the embedding direction and the external neighbors on both sides
have the opposite direction. Else the paired punctuation marks take the embedding direction.
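The difference between the proposed N0 and the reformulation above can be sketched in Python. This is a rough illustration, not an implementation of the UBA: the function names are hypothetical, the strong types inside the pair and of the external neighbors are supplied by hand, and treating the end of the paragraph as having the embedding direction is an assumption made for the example.

```python
def opposite(direction):
    """Return the direction opposite to "L" or "R"."""
    return "R" if direction == "L" else "L"


def n0_proposed(embedding, enclosed, before, after):
    """N0 as quoted from the proposal.

    embedding: "L" or "R"; enclosed: set of strong types inside the pair;
    before/after: strong direction of the external neighbors.
    Returns None where the quoted text states no outcome.
    """
    opp = opposite(embedding)
    if embedding in enclosed:
        return embedding
    if opp in enclosed and opp in (before, after):
        return opp
    return None


def n0_revised(embedding, enclosed, before, after):
    """N0 as reformulated in comment 8."""
    opp = opposite(embedding)
    if embedding not in enclosed and before == opp and after == opp:
        return opp
    return embedding


# "I LIVE IN paris (france)." in an RTL paragraph: the pair encloses only
# LTR text, "paris" precedes it, and the paragraph end follows it
# (assumed here to count as the embedding direction, R).
print(n0_proposed("R", {"L"}, before="L", after="R"))
print(n0_revised("R", {"L"}, before="L", after="R"))
```

With these inputs the proposed rule gives the parentheses the LTR direction and the reformulated rule gives them the RTL (embedding) direction, which corresponds to the two displays contrasted in the example.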

9) Should the BPA be implemented as a new rule affecting the resolution of neutral
types in the core UBA (proposed rule N0)?
Should the BPA, rather, be a recommended implementation using higher level protocols?
As said above, I am not sure that the BPA has more benefits than problems, but if it
is adopted, I think it should be in the core UBA. Leaving it for a higher level
protocol introduces one more degree of uncertainty in the behavior of the presentation
system, and this is not something we need.

10) Are stability concerns adequately addressed?
I think that in most cases, reasonable bidi text which looks good with the UBA will
look the same with the BPA. However here is an exception. The logical text is (where ]
represents RLE and ^ represents PDF):
I LIVE IN ]paris^(france).
The UBA will display it as:
.(france)paris^] NI EVIL I
The BPA, according to the proposed N0, will assign the opposite direction to the
parentheses, thus displaying:
.paris^(france)] NI EVIL I
You can see that the order of "paris" and "france" is reversed. That would not happen
with the revised N0 that I suggested in comment 8 above.

11) Are interoperability concerns during the migration period adequately addressed?
The document states: "The main stability concern therefore is that text authored using
the BPA may display differently when rendered on a system which has not implemented the
BPA. In such a case, the reader of that text is no worse off than they would have been
prior to the development of the BPA."
This is not quite correct. Without the development of the BPA, the document author
would have taken measures (like adding control characters) to create a correct display
under the UBA. Authors writing on BPA-supporting systems are likely to create text
which will be rendered differently on BPA-ignorant systems.
However, this is a problem which will always surface when new features are introduced.