From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 01 2004 - 15:16:31 CST
From: "kefas" <pmr@informatik.uni-frankfurt.de>
> Inserting unicode/basic-hebrew results in a convenient
> RtL, right-to-left, advance of the cursor, but the
> space-character jumps to the far right. Is there a
> RtL-space?
> In MS-Word and OpenOffice I can only change whole
> paragraphs to RtL-entry. But quoting just a few
> words in hebrew WITHIN a paragraph would be helpful to
> many.
And this is what the embedding controls are made for:
- surround an RTL subtext (Hebrew, Arabic...) within LTR paragraphs
(Latin...) with an RLE/PDF pair.
- surround an LTR subtext (Latin, ...) within RTL paragraphs (Hebrew, ...)
with an LRE/PDF pair.
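
As a minimal sketch of the two cases above (in Python, chosen only for
illustration; the helper names embed_rtl and embed_ltr are hypothetical),
the controls are ordinary code points placed around the quoted run:

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_rtl(text: str) -> str:
    """Wrap an RTL run (Hebrew, Arabic...) for use inside an LTR paragraph."""
    return f"{RLE}{text}{PDF}"


def embed_ltr(text: str) -> str:
    """Wrap an LTR run (Latin...) for use inside an RTL paragraph."""
    return f"{LRE}{text}{PDF}"


# Two Hebrew words separated by a regular U+0020 SPACE, in logical order.
hebrew_quote = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD"
sentence = "He said " + embed_rtl(hebrew_quote) + " and left."

# Print the official names of the three controls used above.
for control in (LRE, RLE, PDF):
    print(f"U+{ord(control):04X}", unicodedata.name(control))
```

The stored text stays in logical order; only a renderer that implements the
Bidi algorithm decides where each run is drawn.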
There's no need of a separate RTL space, given that the regular ASCII SPACE
(U+0020) character is used within all RTL texts as the standard default word
separator. It has only a weak directionality (in Bidi terms, it is a neutral
character): it does not force a direction break, and its direction is
inherited from the surrounding text.
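
This can be checked with Python's standard unicodedata module, which
reports the Bidi class of each character. The sketch below only reads the
character database; the sample characters are my own choice:

```python
import unicodedata

samples = {
    "SPACE": " ",
    "LATIN SMALL LETTER A": "a",
    "HEBREW LETTER ALEF": "\u05D0",
    "ARABIC LETTER ALEF": "\u0627",
}

# Map each sample to its Bidi class as recorded in the Unicode database.
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, cls in bidi_classes.items():
    print(f"{name}: {cls}")
# SPACE: WS
# LATIN SMALL LETTER A: L
# HEBREW LETTER ALEF: R
# ARABIC LETTER ALEF: AL
```

L, R and AL are the strong classes; WS (whitespace) is not one of them,
which is why the space takes its direction from its neighbours.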
A good question however is whether the space should inherit its direction
from the previous text or the next one.
- If the previous text has a strong directionality, then the space should
inherit its direction. This should be the case every time you are entering
text with a space at the end: it's very disturbing to see this new space
shift to the opposite side when entering some space-separated Hebrew words
within a Latin text, because the editor assumes that no more Hebrew will be
added on the same line (this causes surprising editing errors, for example
when creating a translation resource file where translated resources are
prefixed by an ASCII key, as when editing a .po file for GNU programs using
gettext()).
- If the previous text in the same paragraph has no directionality, then the
space inherits its direction from the text after it (if that text has a
strong directionality);
- if this does not work, then a global context for the whole text should be
used, or alternatively the directionality of the end of the previous
paragraph (this influences where the cursor would go to align such a
weakly directed paragraph with the previous paragraph, including the default
start margin position.)
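
The three rules above could be sketched as follows. This is only one
reading of them, not what any existing editor does: "previous text" is
taken to mean the nearest strong character before the space in the same
paragraph, and the function names and the fallback parameter are
hypothetical.

```python
import unicodedata


def strong_direction(ch):
    """Return 'ltr' or 'rtl' for a strong character, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "ltr"
    if bidi_class in ("R", "AL"):
        return "rtl"
    return None


def space_direction(paragraph, index, fallback="ltr"):
    """Direction for the space typed at paragraph[index].

    fallback stands for the global context (or the direction at the end
    of the previous paragraph) used by the third rule.
    """
    # Rule 1: inherit from the nearest strong character before the space.
    for ch in reversed(paragraph[:index]):
        direction = strong_direction(ch)
        if direction:
            return direction
    # Rule 2: otherwise inherit from the nearest strong character after it.
    for ch in paragraph[index + 1:]:
        direction = strong_direction(ch)
        if direction:
            return direction
    # Rule 3: otherwise use the global context.
    return fallback


# A .po-like line: ASCII key, then a Hebrew word, then a freshly typed space.
line = "key=\u05E9\u05DC\u05D5\u05DD "
print(space_direction(line, len(line) - 1))  # rtl
```

With rule 1 the trailing space stays next to the Hebrew word while typing,
instead of jumping to the other side of the line.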
The regular Bidi algorithm should be used to render a complete text, but
strict Bidi rules should not be obeyed at all times while composing a text:
there, the current cursor position should act as a sentence break with a
strong inherited directionality. The text can then be redirected at this
position when the cursor moves to other parts of the text.
I don't think this is an issue of renderers but of editors. Notepad is a
notable case: you can't tell exactly where a space will be entered while
editing, unless you use the contextual menu that allows switching the global
default directionality and swapping the alignment to the side margins. And
when you want to know where the LRE/RLE and PDF Bidi controls are, it's
nearly impossible to determine that visually in Notepad, unless you use an
external tool such as native2ascii, from the Java SDK, to change the encoding
so that they show up as clearly visible marks. That's unfortunate, given that
Notepad (since Windows XP) offers you a directly accessible contextual menu
to enter Bidi controls and to change the global direction and the alignment
to the side margins. (But Notepad has a "visible controls" editing mode, to
solve such ambiguities.)
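
When no such tool is at hand, a few lines of Python can make the controls
visible in a similar spirit. This is a rough sketch, not a reimplementation
of native2ascii; the function names and the bracketed tag format are my own
invention:

```python
# Bidi formatting characters mentioned in this thread, plus the two marks
# and the two overrides.
BIDI_CONTROLS = {
    "\u200E": "LRM",
    "\u200F": "RLM",
    "\u202A": "LRE",
    "\u202B": "RLE",
    "\u202C": "PDF",
    "\u202D": "LRO",
    "\u202E": "RLO",
}


def show_bidi_controls(text: str) -> str:
    """Replace each Bidi control with a visible tag such as [RLE]."""
    return "".join(
        f"[{BIDI_CONTROLS[ch]}]" if ch in BIDI_CONTROLS else ch for ch in text
    )


def ascii_escape(text: str) -> str:
    """Escape every non-ASCII character with a backslash escape."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


sample = "abc \u202B\u05D0\u05D1\u202C def"
print(show_bidi_controls(sample))
print(ascii_escape(sample))
```

The first function keeps the Hebrew letters readable and marks only the
controls; the second escapes everything outside ASCII, letters included.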
> Related: The other Hebrew characters in the alphabetic
> presentation forms insert themselves in LtR-fashion?
> Why this difference?
> I read about Logical and Visual entry, but don't see
> how that answers my 2 questions above.
Visual entry should never be used. It was used for some legacy encodings to
render text on devices that don't implement the Bidi algorithm and can only
render text as LTR. Nobody enters RTL text in "pseudo-visual" LTR order;
only the logical input order is needed.
But don't confuse the input order and the encoding order, as they can be
different (they should not be if the text is converted and stored in
Unicode, where only the logical order is legal for any mix of Latin, Greek,
Cyrillic, Hebrew, and Arabic).
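
To make the two storage orders concrete, here is a deliberately narrow
sketch. It assumes a legacy line that contains nothing but Hebrew letters
and spaces stored in left-to-right visual order; lines with digits, Latin
text or combining marks need real Bidi processing and are not handled. The
function name is hypothetical.

```python
def visual_to_logical_pure_rtl(line: str) -> str:
    """Reverse a visual-order line made only of RTL letters and spaces."""
    return line[::-1]


# The same word in both orders: shin, lamed, vav, final mem.
logical = "\u05E9\u05DC\u05D5\u05DD"  # order of typing and reading
visual = "\u05DD\u05D5\u05DC\u05E9"   # order of glyphs, left to right

print(visual_to_logical_pure_rtl(visual) == logical)  # True
```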
The case for Thai is different because its input order is (historically)
visual rather than logical, and the text is then encoded in the same
(visual) order. This was not changed for Thai in Unicode, to keep
compatibility with the national Thai standard TIS-620 (and its later
revisions). So even though Thai uses a non-logical order, its input order
and encoding order are the same.
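
A small illustration, using a syllable of my own choosing: the vowel sign
SARA E is written to the left of its consonant, and it is also typed and
stored before it, even though the consonant is pronounced first.

```python
import unicodedata

# The syllable "ke": vowel sign first in storage, consonant second.
syllable = "\u0E40\u0E01"

stored_names = [unicodedata.name(ch) for ch in syllable]
for ch, name in zip(syllable, stored_names):
    print(f"U+{ord(ch):04X}", name)
# U+0E40 THAI CHARACTER SARA E
# U+0E01 THAI CHARACTER KO KAI
```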
A difference between the two orders is known mainly from legacy texts
created for modern Hebrew, and more rarely Arabic, or from texts encoded in
a private pre-press encoding used to prepare the global layout of pages
(these texts are processed more easily and quickly in complex page layouts
if they are prepared in visual order before being flowed into the page
layout template; such applications use specific encodings in a richer
rendering context than just plain text, so this is out of scope of the
Unicode standard itself).