From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Aug 02 2005 - 15:27:40 CDT
From: "John Hudson" <tiro@tiro.com>
> Now, when it comes to things like parentheses, the mirrored stuff does my
> head in and I really don't see the point of it. I'm guessing that it
> confuses application developers also, since it is implemented with so
> little consistency.
Just remember that what is really encoded for parentheses is their semantics
(open/start or close/end), not the look of their glyphs. The "mirrored"
property in fact only affects the glyph orientation, but parentheses remain
weak with regard to their directionality (i.e. the direction of the cursor
movement when the logically encoded string is completed with the character).
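As a rough illustration (not tied to any particular renderer), these properties
can be inspected with Python's standard unicodedata module. The helper name
describe() is made up for this sketch. Note that the bidirectional class
reported for parentheses is 'ON' (Other Neutral), which is what is loosely
called "weak" here.

```python
import unicodedata

def describe(ch):
    """Return (name, general category, bidi class, mirrored flag) for ch."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),       # 'Ps' = open punctuation, 'Pe' = close punctuation
        unicodedata.bidirectional(ch),  # 'ON' = Other Neutral (no strong direction)
        bool(unicodedata.mirrored(ch)), # True if the glyph is mirrored in RTL text
    )

for ch in "()":
    print(repr(ch), describe(ch))
```

The general category (Ps/Pe) carries the open/close semantics, while the
mirrored flag only says that the glyph may be flipped at display time.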
This can complicate matters. Suppose that the lowercase
characters below are strong LTR (Latin/Greek/Cyrillic...) letters, and the
uppercase characters are strong RTL (Arabic/Hebrew) letters. Then how will
you interpret the following rendered string, if you only look at the
rendered document without knowing its encoding:
Encoded as: "latinlatin (ARABICARABIC (latinlatin CIBARA) latin))".
Rendered as: "latinlatin () CIBARACIBARAlatinlatin (CIBARA latin))".
It will be difficult to say, when looking only at the rendered document, which
parenthesis is opening and which is closing. The mirrored property works
well only if the directional context is the same before the beginning and at
the end of the surrounded sequence. The tricky cases occur when parentheses are
nested and also surround sections of text with different directionality.
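By contrast, in the logical (encoded) order the roles are unambiguous, since
each character carries its open/close semantics. A minimal sketch, assuming a
hypothetical helper pair_brackets() that matches open and close punctuation
with a stack and, as a simplification, does not check that the two bracket
types of a pair agree:

```python
import unicodedata

def pair_brackets(logical):
    """Pair open/close punctuation by position in the logical string.

    Returns (pairs, unmatched): pairs is a list of (open_index, close_index)
    in the order the pairs are closed; unmatched is a sorted list of indices
    of brackets that have no partner.
    """
    stack, pairs, unmatched = [], [], []
    for i, ch in enumerate(logical):
        cat = unicodedata.category(ch)
        if cat == "Ps":            # opening punctuation
            stack.append(i)
        elif cat == "Pe":          # closing punctuation
            if stack:
                pairs.append((stack.pop(), i))
            else:
                unmatched.append(i)
    unmatched.extend(stack)        # opens that were never closed
    return pairs, sorted(unmatched)

print(pair_brackets("a(b(c)d)"))
```

Nothing comparable can be done reliably on the rendered glyph sequence alone,
because the reordering and mirroring have already been applied.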
Now suppose that the characters had not been mirrored. One would have needed to
encode the opening parenthesis as ')' in a RTL context, so the semantics of the
character would have been lost in favour of an invariable
glyph direction. Anyway, you would then have encoded this for the same
logical text:
Encoded as: "latinlatin (ARABICARABIC )latinlatin CIBARA( latin))".
Rendered as: "latinlatin () CIBARACIBARAlatinlatin (CIBARA latin))".
to finally get the same result... It would just have been more complicated to
enter regular Arabic-only text.
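The trade-off can be sketched for the simplest case, a run that is entirely
RTL. This is only a toy model under stated assumptions, not the full
bidirectional algorithm: uppercase letters stand for RTL letters as above, the
function render_rtl_run() is hypothetical, and the MIRROR table is a
hand-written subset of the mirror pairs.

```python
# Hand-written subset of mirror pairs (an assumption for this sketch;
# the complete list is published in the Unicode BidiMirroring data file).
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "["}

def render_rtl_run(logical, mirror=True):
    """Return the left-to-right glyph sequence for a purely RTL run."""
    visual = logical[::-1]  # the first logical character ends up rightmost
    if mirror:
        # Swap each mirrorable glyph for its counterpart after reordering.
        visual = "".join(MIRROR.get(ch, ch) for ch in visual)
    return visual

print(render_rtl_run("(ABC)"))                # (CBA)
print(render_rtl_run("(ABC)", mirror=False))  # )CBA(
print(render_rtl_run(")ABC(", mirror=False))  # (CBA)
```

With mirroring, the text is entered as open ... close and still displays with
the expected glyphs. Without mirroring, the same display is only obtained by
entering the closing character first, which is the loss of semantics described
above.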
Now suppose you wanted to use distinct characters with RTL directionality
(suppose below that the Arabic parentheses are written as ']' for
start/opening, and '[' for end/closing). You would have encoded this for the
same logical text. Note that because the new punctuation marks would be
explicitly RTL, they would not need to be mirrored:
Encoded as: "latinlatin (ARABICARABIC ]latinlatin CIBARA[ latin))".
Rendered as: "latinlatin (] CIBARACIBARAlatinlatin [CIBARA latin))".
Would that really be more satisfactory for the interpretation? No.
In conclusion, it's hard to determine the effective semantics of mirrorable
characters outside of the simplest cases where they are used: to surround
text with consistent directionality between its start and end. Having to
duplicate characters to avoid mirroring or a swap of directionality does not
help simplify the problem. It's therefore best not to needlessly duplicate the
mirrorable characters with weak directionality.
Users perceive these RTL parenthesis characters as identical, with the same
semantics, and there's no need to add to the confusion. Duplicating these
codes won't help, notably because they don't effectively have distinct
glyphs. Things are different when the representative glyphs are arguably
distinguishable, so that it makes no sense to treat them as mirrored; see for
example the encoded differences between the () and [] pairs. If a script has
its own distinguishable glyphs for parenthesis pairs, there's no need to mirror
them. Otherwise it's best to keep the mirrored property, and thus keep the
characters with weak directionality.
This archive was generated by hypermail 2.1.5 : Tue Aug 02 2005 - 15:29:37 CDT