L2/04-049
Re: Bidi Conformance
From: Mark Davis
Date: 2003-01-29
Following the last UTC meeting, we concluded that uniformity in the application of the Bidi algorithm is more important than allowing the overriding of specific characters. We had considered holding a letter ballot, but the delay of 4.0.1 rendered that unnecessary.
The following is a suggested revision of 4.3 Higher-Level Protocols that removes the general ability to override characters. It makes a few other changes as well:
An open issue is whether we want IRIs (URLs) to fall under the span of #4 or not.
Also, we have always had the little snippet "When text using a higher-level protocol is to be converted to Unicode plain text, formatting codes should be inserted to ensure that the order matches that of the higher-level protocol." There is one remaining circumstance where that cannot be done: if one wants to emulate #2 below, we do not have the formatting codes to do it.
The following are permissible ways for systems to apply higher-level protocols to the ordering of bidirectional text.
Interpret spans of text separately in the presence of markup
When displaying a document consisting of textual data and visible markup, a higher-level process can handle syntactic elements in the markup separately from the textual data. For example, suppose an XML document contains the following fragment:
ARABICenglishARABIC<e1 type='ab'>ARABIC2<e2 type='cd'>english
One can consider this as consisting of 5 separate pieces:
ARABICenglishARABIC
<e1 type='ab'>
ARABIC2
<e2 type='cd'>
english
To make the XML file readable, the display can order these pieces all in a uniform direction (e.g. all left-to-right), and apply the bidi algorithm to each piece separately. Alternatively, the process could treat each of the pieces of syntax as if it were a single inline object, e.g. as U+FFFC OBJECT REPLACEMENT CHARACTER, and apply the bidi algorithm to the whole.
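As a rough illustration of the two options, the following Python sketch splits the example fragment into the five pieces listed above and, alternatively, masks each piece of markup with U+FFFC. The names `split_pieces` and `mask_markup` are hypothetical, and the tag pattern is a simplification that assumes no '>' occurs inside attribute values. The sketch only prepares the input; running the bidi algorithm on each piece, or on the masked string, is left to the rendering system.

```python
import re

# Simplified tag pattern (an assumption): '<', any run of non-'>' characters, '>'.
TAG = re.compile(r"<[^>]*>")
OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def split_pieces(fragment):
    """Return (kind, text) pairs in logical order, kind being 'text' or 'markup'."""
    pieces = []
    pos = 0
    for match in TAG.finditer(fragment):
        if match.start() > pos:
            pieces.append(("text", fragment[pos:match.start()]))
        pieces.append(("markup", match.group()))
        pos = match.end()
    if pos < len(fragment):
        pieces.append(("text", fragment[pos:]))
    return pieces


def mask_markup(fragment):
    """Replace each markup piece with U+FFFC so it acts as one inline object."""
    return TAG.sub(OBJECT_REPLACEMENT, fragment)


fragment = "ARABICenglishARABIC<e1 type='ab'>ARABIC2<e2 type='cd'>english"
pieces = split_pieces(fragment)   # option 1: bidi applied to each piece separately
masked = mask_markup(fragment)    # option 2: bidi applied to the whole masked string
```

With the first option, each piece would be reordered on its own and the pieces laid out in one uniform direction. With the second, the display would substitute the original markup back at the positions where the U+FFFC characters end up after reordering.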
When text using a higher-level protocol is to be converted to Unicode plain text, formatting codes should be inserted to ensure that the order matches that of the higher-level protocol.
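One possible way to do such a conversion, sketched under assumptions: suppose the higher-level protocol has already assigned an embedding direction ("ltr" or "rtl") to each span of text. The hypothetical function `to_plain_text` below wraps each span in the corresponding explicit embedding codes (LRE or RLE, terminated by PDF). This covers only the simple case of directional embedding; it is not a general conversion and does not address circumstances for which no formatting codes exist.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def to_plain_text(spans):
    """Join (direction, text) spans, wrapping each in explicit embedding codes.

    The (direction, text) representation is an assumption made for this
    sketch; a real protocol would carry direction in its own markup.
    """
    out = []
    for direction, text in spans:
        opener = RLE if direction == "rtl" else LRE
        out.append(opener + text + PDF)
    return "".join(out)


plain = to_plain_text([("rtl", "ARABIC2"), ("ltr", "english")])
```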