From: Douglas Davidson (ddavidso@apple.com)
Date: Wed May 28 2008 - 12:06:11 CDT
On May 28, 2008, at 4:48 AM, Behnam wrote:
> Right now, on my text editor, I right click and select the  
> directionality of the paragraph. That's what I did on that picture  
> at the end (which shouldn't be confused with right alignment). This  
> doesn't go to the higher level and higher level shouldn't change  
> that (which unfortunately is not always the case).
As it happens, the paragraph directionality in the case you mentioned  
is handled by a higher-level protocol.  Your picture shows a rich-text  
document, for which the paragraph directionality is a feature of the  
paragraph style; its embodiment in a document varies with the format,  
but in the case of RTF it would use the \rtlpar control word to  
indicate RTL paragraphs, while for HTML it would use a dir="rtl"  
attribute.
The alternative mechanism for representing this in plain text would be  
to insert a bidirectional control character, either RLM (U+200F) or  
LRM (U+200E), at the beginning of each directionally marked  
paragraph.  These characters are not specifically marks of paragraph  
base writing directionality, but because the bidirectional algorithm  
takes the default paragraph direction from the first strong  
directional character, their presence at the beginning of a paragraph  
would be sufficient to indicate it.  However, this is not the  
mechanism currently used in the case you mention.
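To make the plain-text mechanism concrete, here is a rough Python sketch of how a leading RLM or LRM ends up setting the paragraph direction.  It is only a simplified reading of the first-strong-character rule: the function name and the "ltr" default are my own choices for illustration, and the sketch does not skip text inside directional isolates as the full algorithm does.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L


def base_direction(paragraph, default="ltr"):
    """Guess the paragraph direction from its first strong character.

    Hypothetical helper for illustration only; a simplification of the
    first-strong rule, not a full bidirectional algorithm.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found: fall back to the caller's default.
    return default
```

With this, base_direction("Hello") gives "ltr", while base_direction(RLM + "Hello") gives "rtl", since the invisible mark is the first strong character the loop encounters.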
There are a number of reasons why the insertion of invisible control  
characters is an awkward solution for editing.  Great care would need  
to be taken, for example, to make sure that control characters would  
not be accidentally deleted, or copied and pasted to inappropriate  
places.  On the other hand, they would need to be carefully preserved  
in certain cases of copying, for example to make sure that copying an  
entire paragraph would preserve its directionality.  These  
considerations would be especially important for control characters  
that appear in beginning and ending pairs.  A "show invisibles" mode  
would probably be needed, just to assure sophisticated users that the  
control characters were properly positioned, but it would be likely to  
confuse the less sophisticated.
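As an illustration of what a "show invisibles" mode might amount to, here is a minimal Python sketch that replaces some bidirectional control characters with bracketed labels.  The function name, the bracket notation, and the choice of which characters to include are assumptions of mine, not a description of any existing editor.

```python
# Labels for a handful of bidirectional control characters.
BIDI_CONTROLS = {
    "\u200e": "LRM",
    "\u200f": "RLM",
    "\u202a": "LRE",
    "\u202b": "RLE",
    "\u202c": "PDF",
    "\u202d": "LRO",
    "\u202e": "RLO",
}


def show_invisibles(text):
    """Return text with each listed control character shown as [NAME]."""
    visible = []
    for ch in text:
        if ch in BIDI_CONTROLS:
            visible.append("[" + BIDI_CONTROLS[ch] + "]")
        else:
            visible.append(ch)
    return "".join(visible)
```

For example, show_invisibles("\u200fabc") returns "[RLM]abc", which lets a user confirm that the mark really is at the start of the paragraph.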
Higher-level protocols, by contrast, are well suited to the needs of  
editing.  They can naturally associate attributes with ranges of text,  
just as they do for style attributes such as fonts, underlines, and so  
forth.  The problems of insertion, deletion, copying and pasting, and  
so forth are much more tractable.  In general, higher-level protocols  
are more naturally expressive of the user's intent; in computer  
science terms, they separate controls from data, with the underlying  
Unicode character stream representing the data and the higher-level  
protocols representing the control information.
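A small Python sketch may help show the difference under editing.  The StyledParagraph class and delete_range helper below are hypothetical, made up purely for illustration: the same kind of deletion at the start of a paragraph silently removes an inline mark, but leaves a separately stored attribute untouched.

```python
from dataclasses import dataclass

RLM = "\u200f"  # RIGHT-TO-LEFT MARK


@dataclass
class StyledParagraph:
    """Hypothetical model: direction is stored apart from the characters."""
    text: str
    direction: str  # "ltr" or "rtl"


def delete_range(text, start, end):
    """Delete the characters in text[start:end]."""
    return text[:start] + text[end:]


# Plain text: the direction lives in the character stream, so deleting
# from the start of the paragraph takes the invisible mark with it.
plain = RLM + "abc"
edited_plain = delete_range(plain, 0, 2)

# Higher-level protocol: the same kind of edit touches only the data.
styled = StyledParagraph(text="abc", direction="rtl")
styled.text = delete_range(styled.text, 0, 1)
```

After these edits, edited_plain no longer begins with RLM, whereas styled.direction is still "rtl".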
If one has control of the import and export processes, then it would  
be possible to take text in which information is internally  
represented using higher-level protocols, and export it to plain text  
with appropriate control characters inserted, or to import from plain  
text and replace the control characters with the internal  
representation.  The use of control characters in plain text is a  
necessary fallback mechanism if plain text is all that is available,  
and if the text is not going to be edited or otherwise altered,  
provided that the processes receiving it are sufficiently  
Unicode-savvy to handle the control characters properly.  However, it  
is increasingly the case that at least some form of markup is  
available, and where it is, it is generally better to make use of it.
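The import and export conversion described above might be sketched in Python roughly as follows.  The Paragraph class and both function names are hypothetical, and the sketch assumes one paragraph per line, separated by "\n"; a paragraph whose own text happens to begin with RLM or LRM would be misread on import.

```python
from dataclasses import dataclass
from typing import List, Optional

RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


@dataclass
class Paragraph:
    """Hypothetical internal representation of one paragraph."""
    text: str
    direction: Optional[str] = None  # "rtl", "ltr", or None if unmarked


def export_plain(paragraphs: List[Paragraph]) -> str:
    """Write paragraphs as plain text, prefixing a mark where needed."""
    marks = {"rtl": RLM, "ltr": LRM}
    return "\n".join(marks.get(p.direction, "") + p.text for p in paragraphs)


def import_plain(plain: str) -> List[Paragraph]:
    """Read plain text, turning a leading mark back into an attribute."""
    result = []
    for line in plain.split("\n"):
        if line.startswith(RLM):
            result.append(Paragraph(line[1:], "rtl"))
        elif line.startswith(LRM):
            result.append(Paragraph(line[1:], "ltr"))
        else:
            result.append(Paragraph(line, None))
    return result
```

Under those assumptions, a non-empty list of paragraphs survives a round trip through export_plain and import_plain, with the control characters appearing only in the exported plain text.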
Douglas Davidson