L2/04-049R1
Re: Bidi Conformance
From: Mark Davis
Date: 2003-02-03
This is a draft revision that takes into account the discussion in the ad hoc bidi meeting during the UTC. It is being circulated among the bidi group for comments before presentation to the UTC.
In the wake of the last UTC meeting, we had come to the conclusion that it was more important to have uniformity in the application of the Bidi algorithm than to allow the overriding of specific characters. We had considered having a letter ballot, but the delay of 4.0.1 rendered that unnecessary.
The following is a suggested revision of 4.3 Higher-Level Protocols, that removes the general ability to override characters. It makes a few other changes as well:
The following clauses are the only permissible ways for systems to apply higher-level protocols to the ordering of bidirectional text. Some of the clauses apply to segments of structured text. Structured text here means text that is interpreted as having structure, whether through explicit markup such as XML or HTML, or through internal structure such as in a word processor or spreadsheet. In such a case, a segment is a span of text that is distinguished in some way by the structure.
Apply the bidi algorithm to segments
The bidi algorithm can be applied independently to one or more segments of structured text. For example, when displaying a document consisting of textual data and visible markup in an editor, a higher-level process can handle syntactic elements in the markup separately from the textual data.
Clauses #1 and #3 are not logically necessary; they are covered by applications of clauses #4 and #5. However, they are included for clarity because they are more common operations.
As an example of the application of #4, suppose an XML document contains the following fragment. (Note: this is a simplified example for illustration: element names, attribute names, and attribute values could all be involved.)
ARABICenglishARABIC<e1 type='ab'>ARABICenglish<e2 type='cd'>english
This can be analyzed as being 5 different segments:
a. ARABICenglishARABIC
b. <e1 type='ab'>
c. ARABICenglish
d. <e2 type='cd'>
e. english
To make the XML file readable as source text, the display in an editor could order these segments all in a uniform direction (e.g. all left-to-right), and apply the bidi algorithm to each segment separately. It could also choose to order the element names, attribute names and attribute values uniformly in the same direction (e.g. all left-to-right). For final display, the markup could be ignored, allowing all of the text (segments a, c, and e) to be reordered together.
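The five-way split of the example fragment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper names `split_segments` and `is_markup` are hypothetical, a tag is approximated as everything from "<" to the next ">", and the uppercase word ARABIC stands in for Arabic letters as in the example above. A real implementation would use an XML parser and would pass each segment to an actual bidi implementation.

```python
import re

# Hypothetical helper: split a fragment into text and markup segments.
# Assumption: a tag is "<" up to the next ">"; real XML needs a parser.
def split_segments(fragment):
    parts = re.split(r"(<[^>]*>)", fragment)
    return [p for p in parts if p]  # drop empty strings between adjacent tags

# Hypothetical helper: markup segments are the ones that start with "<".
def is_markup(segment):
    return segment.startswith("<")

fragment = "ARABICenglishARABIC<e1 type='ab'>ARABICenglish<e2 type='cd'>english"
segments = split_segments(fragment)

# Source view: each segment would be handed to the bidi algorithm on its own,
# and the segments themselves laid out in one uniform direction.
for label, segment in zip("abcde", segments):
    print(label, segment)

# Final display: ignore the markup so the remaining text is reordered together.
display_text = "".join(s for s in segments if not is_markup(s))
```

The split keeps the tags as segments in their own right, so the same list serves both the source view (all five segments) and the final display (text segments only).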
An IRI (international URI) can be analyzed as being structured text, and thus a higher-level protocol could apply clause #4 and order the segments in a uniform direction. However, the existence of this capability does not imply that in normal display this either should or should not be done.
When text using a higher-level protocol is to be converted to Unicode plain text, formatting codes should be inserted so that the resulting order matches that of the higher-level protocol and the appearance stays consistent.
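As a rough sketch of such a conversion, suppose the higher-level protocol laid out a list of segments left-to-right and applied the bidi algorithm to each one separately. One possible way to approximate that in plain text is to wrap each segment in a left-to-right embedding (LRE ... PDF) and separate the embeddings with a left-to-right mark (LRM). The choice of these particular formatting codes, and the function names `to_plain_text_ltr` and `strip_formatting`, are assumptions made for illustration; this text does not prescribe which codes to insert.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRM = "\u200E"  # LEFT-TO-RIGHT MARK

# Hypothetical helper: emit plain text in which each segment is embedded
# left-to-right, with an LRM between segments to keep them apart.
def to_plain_text_ltr(segments):
    return LRM.join(LRE + segment + PDF for segment in segments)

# Hypothetical helper: remove the three codes used above, recovering the
# underlying characters (assumes the segments contained none of them).
def strip_formatting(text):
    return "".join(ch for ch in text if ch not in (LRE, PDF, LRM))

segments = ["ARABICenglish", "<e2 type='cd'>", "english"]
plain = to_plain_text_ltr(segments)
```

Removing the inserted codes yields the original character sequence, so the conversion changes only the display order and leaves the text content unchanged.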