Re: Markup for Language (was: Re: Exemplifying apostrophes)

From: Douglas Davidson (ddavidso@apple.com)
Date: Wed May 28 2008 - 12:06:11 CDT

Next message: Lorna_Priest@sil.org: "Re: Glottal stop languages"

Previous message: Marcin ‘Qrczak’ Kowalczyk: "Re: Stateful?"
In reply to: Behnam: "Re: Markup for Language (was: Re: Exemplifying apostrophes)"
Next in thread: Richard Wordingham: "Re: Markup for Language (was: Re: Exemplifying apostrophes)"
Reply: Richard Wordingham: "Re: Markup for Language (was: Re: Exemplifying apostrophes)"

On May 28, 2008, at 4:48 AM, Behnam wrote:

> Right now, on my text editor, I right click and select the
> directionality of the paragraph. That's what I did on that picture
> at the end (which shouldn't be confused with right alignment). This
> doesn't go to the higher level and higher level shouldn't change
> that (which unfortunately is not always the case).

As it happens, the paragraph directionality in the case you mentioned
is handled by a higher-level protocol. Your picture shows a rich-text
document, for which the paragraph directionality is a feature of the
paragraph style. How that style is encoded varies with the format: RTF
would use the \rtlpar control word to mark RTL paragraphs, while HTML
would use a dir="rtl" attribute.
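For illustration, here is a minimal Python sketch of how the same paragraph-level attribute might be serialized into each of those two formats. The (text, direction) paragraph model and the function names are hypothetical, paragraphs are assumed not to contain line breaks, and the RTF output is a bare-bones document (\rtf1\ansi header, \pard, \rtlpar or \ltrpar, \par, and \uN? escapes for non-ASCII text), not what any particular editor actually writes.

```python
import html


def to_html(paragraphs):
    """Render hypothetical (text, direction) pairs as <p> elements with a dir attribute."""
    return "\n".join(
        f'<p dir="{direction}">{html.escape(text)}</p>' for text, direction in paragraphs
    )


def rtf_escape(text):
    """Escape RTF specials; encode non-ASCII as \\uN? with N a signed 16-bit value."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch in "\\{}":
            parts.append("\\" + ch)
        elif cp < 0x80:
            parts.append(ch)
        else:
            # \u takes UTF-16 code units, so astral characters become a surrogate pair.
            if cp <= 0xFFFF:
                units = [cp]
            else:
                v = cp - 0x10000
                units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
            for u in units:
                signed = u - 0x10000 if u > 0x7FFF else u
                parts.append(f"\\u{signed}?")  # '?' is the fallback for non-Unicode readers
    return "".join(parts)


def to_rtf(paragraphs):
    """Emit one \\pard ... \\par group per paragraph, with \\rtlpar or \\ltrpar."""
    body = []
    for text, direction in paragraphs:
        ctrl = "\\rtlpar" if direction == "rtl" else "\\ltrpar"
        body.append(f"\\pard{ctrl} {rtf_escape(text)}\\par")
    return "{\\rtf1\\ansi\n" + "\n".join(body) + "\n}"
```

In both outputs the direction lives in the markup, next to the text rather than inside it, which is the point of treating it as a paragraph style.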

The alternative mechanism for representing this in plain text would be
to insert a bidirectional control character, either RLM (U+200F) or
LRM (U+200E), at the beginning of each directionally marked paragraph.
These characters are not specifically markers of a paragraph's base
direction, but their presence at the beginning of a paragraph would be
sufficient to establish it. However, this is not the mechanism currently
used in the case you mention.
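A leading RLM or LRM is sufficient because the Unicode Bidirectional Algorithm (rules P2 and P3) takes a paragraph's base direction from its first strong character, and RLM has bidi class R while LRM has class L. Below is a minimal Python sketch of just that rule, using the standard unicodedata module. The function name is hypothetical, and the isolate handling reflects the current algorithm (isolate controls were added in Unicode 6.3, after this message was written).

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L


def base_direction(paragraph, default="ltr"):
    """Return 'rtl' or 'ltr' from the first strong character (UBA rules P2-P3).

    Characters inside an isolate (LRI/RLI/FSI ... PDI) are skipped, as P2
    requires. If no strong character is found, `default` is returned, which
    stands in for a higher-level protocol's choice.
    """
    depth = 0
    for ch in paragraph:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("LRI", "RLI", "FSI"):
            depth += 1
        elif bidi == "PDI":
            if depth > 0:  # an unmatched PDI is ignored
                depth -= 1
        elif depth == 0:
            if bidi == "L":
                return "ltr"
            if bidi in ("R", "AL"):
                return "rtl"
    return default
```

For example, base_direction(RLM + "123 abc") gives "rtl": the digits are weak (EN), so the RLM is the first strong character, even though the Latin letters that follow are strongly left-to-right.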

There are a number of reasons why the insertion of invisible control
characters is an awkward solution for editing. Great care would need
to be taken, for example, to make sure that control characters would
not be accidentally deleted, or copied and pasted to inappropriate
places. On the other hand, they would need to be carefully preserved
in certain cases of copying, for example to make sure that copying an
entire paragraph would preserve its directionality. These
considerations would be especially important for control characters
that appear in beginning and ending pairs, such as RLE (U+202B) and
PDF (U+202C). A "show invisibles" mode would probably be needed, just
to assure sophisticated users that the control characters were properly
positioned, but it would likely confuse the less sophisticated.

Higher-level protocols, by contrast, are well suited to the needs of
editing. They can naturally associate attributes with ranges of text,
just as they do for style attributes such as fonts and underlines, and
the problems of insertion, deletion, copying, and pasting become much
more tractable. In general, higher-level protocols
are more naturally expressive of the user's intent; in computer
science terms, they separate controls from data, with the underlying
Unicode character stream representing the data and the higher-level
protocols representing the control information.
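As an illustration of attributes attached to ranges (not a description of how any particular text system stores them), the hypothetical AttributedText class below keeps the character data separate from a list of attribute runs. Edits adjust the run boundaries, so there are no invisible characters to delete or paste by accident. Inserted text inherits the attributes of the preceding character, a common but not universal editor convention assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    start: int  # inclusive
    end: int    # exclusive
    attrs: dict


@dataclass
class AttributedText:
    text: str = ""
    runs: list = field(default_factory=list)  # non-overlapping, sorted Runs

    def insert(self, pos, s):
        """Insert s at pos; the run ending at pos grows, later runs shift."""
        n = len(s)
        self.text = self.text[:pos] + s + self.text[pos:]
        for run in self.runs:
            # At pos == 0 there is no preceding character, so the first run grows.
            if run.start > pos or (run.start == pos and pos > 0):
                run.start += n
            if run.end >= pos:
                run.end += n

    def delete(self, start, end):
        """Delete text[start:end]; runs shrink, and empty runs are dropped."""
        n = end - start
        self.text = self.text[:start] + self.text[end:]

        def move(p):
            if p <= start:
                return p
            if p <= end:
                return start  # boundary fell inside the deleted span
            return p - n

        for run in self.runs:
            run.start, run.end = move(run.start), move(run.end)
        self.runs = [r for r in self.runs if r.start < r.end]

    def attrs_at(self, i):
        """Attributes of the character at index i (empty dict if uncovered)."""
        for run in self.runs:
            if run.start <= i < run.end:
                return run.attrs
        return {}
```

For example, starting from "abcdef" with an LTR run over "abc" and an RTL run over "def", inserting "XY" at index 3 yields runs (0, 5) and (5, 8); the data string itself never contains a directional character.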

If one has control of the import and export processes, then it would
be possible to take text in which directionality information is
represented internally using higher-level protocols and export it to
plain text with appropriate control characters inserted, or to import
from plain text and replace the control characters with the internal
representation. The use of control characters in plain text is a
necessary fallback mechanism if plain text is all that is available
and the text is not going to be edited or otherwise altered, provided
that the processes receiving it are sufficiently Unicode-savvy to
handle the control characters properly. Increasingly, however, at least
some form of markup is available, and where it is, it is generally
better to make use of it.
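A minimal sketch of such an import/export pair, assuming a hypothetical internal model in which each paragraph is a (text, direction) pair and plain-text paragraphs are separated by "\n" (Unicode also treats CR, CRLF, and U+2029 as paragraph separators; this sketch does not). Export prefixes every paragraph with RLM or LRM; import strips a leading mark and turns it back into the paragraph attribute.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def export_plain(paragraphs):
    """(text, direction) pairs -> plain text, each paragraph prefixed with RLM or LRM.

    Assumes paragraph texts contain no '\\n' of their own.
    """
    return "\n".join((RLM if d == "rtl" else LRM) + text for text, d in paragraphs)


def import_plain(plain, default="ltr"):
    """Inverse of export_plain: a leading mark becomes the paragraph's direction."""
    paragraphs = []
    for line in plain.split("\n"):
        if line.startswith(RLM):
            paragraphs.append((line[1:], "rtl"))
        elif line.startswith(LRM):
            paragraphs.append((line[1:], "ltr"))
        else:
            # No mark: fall back to a default rather than guessing.
            paragraphs.append((line, default))
    return paragraphs
```

Paragraphs that arrive without a mark simply get the default here; a fuller importer might infer their direction from the first strong character instead.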

Douglas Davidson
