Authors: John I. McConnell (JohnMcCo@microsoft.com), F. Avery Bishop (AveryB@microsoft.com), David Brown (DBrown@microsoft.com), Majd Abbar (A-MajdA@microsoft.com), Ronen Yacobi (A-RonenY@microsoft.com)
18-Nov-1997
This memo describes a proposal to change the bidirectional category of the SOLIDUS character in the Unicode 2.0 character database. Specifically, it would change the category from European Separator to Common Separator. The effect of this change is to alter the visual order of text containing SOLIDUS and text from right-to-left writing systems such as Arabic and Hebrew. The overall intent of the proposal is to better match such behavior with user expectations and existing practice.
Accepting the proposal would also require the Consortium to change the entries in Table 3-5 on page 3-17 and Table 4-4 on page 4-11 of the Unicode Standard. No changes are required to the Unicode bidirectional algorithm itself.
With the introduction of the first Unicode-based software in the Middle East, users now have some experience with conversion of existing data to Unicode. Although the transition has been smooth, there have been some difficulties with fractions.
This section shows the effect of the proposed changes on two important cases: fractions and dates. Each test case follows the same conventions as the Unicode 2.0 book: uppercase letters correspond to strong right-to-left characters, whereas lowercase letters correspond to strong left-to-right characters. We have also included examples using Arabic and Hebrew text. In all the examples, except as noted, the embedding level is right-to-left. Results that differ from the current values in Unicode 2.0 are shaded.
The proposed change affects only the resolution of weak neutrals in steps P0 through P5 of the Unicode Bidirectional Algorithm. This limits the changes of behavior to cases where SOLIDUS is adjacent to numbers.
Table 2 Fractions
Logical Order | Current Visual Order | Proposed Visual Order
ADD 1/2 CUP (Arabic) | PUC 2/1 DDA | PUC 1/2 DDA
ADD 1/2 CUP (Hebrew) | PUC 1/2 DDA | PUC 1/2 DDA
There are many date formats but the proposed changes would affect one frequently used form.
Table 3 Dates
Logical Order | Current Visual Order | Proposed Visual Order
MEET ON 01/23/45 (Hebrew) | 01/23/45 NO TEEM | 01/23/45 NO TEEM
MEET ON 01/23/96 (Arabic) | 96/23/01 NO TEEM | 01/23/96 NO TEEM
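The rows in the fraction and date tables can be reproduced with a small simulation. The following is a minimal sketch, not the full bidirectional algorithm: it handles only uppercase (right-to-left) letters, digits, spaces, and SOLIDUS in a right-to-left paragraph, and it models the number and separator handling on rules W2 and W4 of the bidirectional algorithm as published in later versions of the standard rather than on the Unicode 2.0 step names. The function name `visual_order` and its parameters `rtl_letters` and `solidus` are assumptions introduced for illustration.

```python
def visual_order(text, rtl_letters="AL", solidus="ES"):
    """Return the visual order of `text` in a right-to-left paragraph.

    rtl_letters: "AL" treats uppercase letters as Arabic, "R" as Hebrew.
    solidus:     "ES" is the current category of SOLIDUS, "CS" the proposed one.
    Hypothetical helper; supports only uppercase letters, digits, space, "/".
    """
    # Classify each character (uppercase = strong right-to-left, as in the memo).
    types = []
    for ch in text:
        if ch.isupper():
            types.append(rtl_letters)
        elif ch.isdigit():
            types.append("EN")
        elif ch == "/":
            types.append(solidus)
        elif ch == " ":
            types.append("WS")
        else:
            raise ValueError(f"unsupported character: {ch!r}")

    # European digits whose preceding strong character is an Arabic letter
    # are treated as Arabic numbers (AN).
    last_strong = "R"
    for i, t in enumerate(types):
        if t in ("R", "AL"):
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            types[i] = "AN"

    # Separator rule: ES joins only two European numbers; CS joins two
    # numbers of the same type, whether European or Arabic.
    for i in range(1, len(types) - 1):
        before, after = types[i - 1], types[i + 1]
        if types[i] == "ES" and before == after == "EN":
            types[i] = "EN"
        elif types[i] == "CS" and before == after and before in ("EN", "AN"):
            types[i] = before

    # Numbers sit one level above the right-to-left text; everything else,
    # including unresolved separators and spaces, takes the paragraph level.
    levels = [2 if t in ("EN", "AN") else 1 for t in types]

    # Reorder: reverse runs at level 2 first, then the whole level-1 line.
    chars = list(text)
    for level in (2, 1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            levels[i:j] = reversed(levels[i:j])
            i = j
    return "".join(chars)


print(visual_order("ADD 1/2 CUP", rtl_letters="AL", solidus="ES"))  # PUC 2/1 DDA
print(visual_order("ADD 1/2 CUP", rtl_letters="AL", solidus="CS"))  # PUC 1/2 DDA
```

In this sketch the Hebrew rows are unaffected because the digits stay European numbers, which either separator category joins. Only the Arabic rows change, since a European separator does not join two Arabic numbers while a common separator does.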
Without explicit formatting, it is impossible for both dates and fractions to display properly. Although the date change is undesirable, our users would prefer to have fractions correct rather than dates. There seem to be several reasons for this preference:
Although there are some tradeoffs, the authors believe that this proposal would more closely match user expectations for visual order of right-to-left text and expedite the development of software for regions that use such text. This improvement would promote the acceptance of Unicode for an important emerging software market.
Proposed Correction to Mirroring List
Both Unicode 2.0 and ISO 10646 define a normative list of mirrored characters. We believe that four characters have been omitted from these lists. Specifically, the four characters in Table 1 should be added to the lists of characters with the mirroring property.
Code Point | Glyph | Unicode 2.0 Name
0x00AB | « | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00BB | » | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x2039 | ‹ | SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x203A | › | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
Although the use of these characters varies, the mirroring behavior is unambiguous. For example, those printing traditions that use the left-pointing quotation mark to begin a left-to-right quotation use the right-pointing quotation mark to begin a right-to-left quotation and vice versa.
This correction would also reconcile the mirroring behavior of these characters with their cross-referenced characters such as 0x226A MUCH LESS-THAN and 0x300A LEFT DOUBLE ANGLE BRACKET. All of these related characters are listed as mirroring.
The effect of the correction would be to add these four characters to Table 4-7 in the Unicode 2.0 book.
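To illustrate what the mirroring property would mean for a renderer, the sketch below swaps each of the four proposed characters for its counterpart when the character is displayed at a right-to-left (odd) resolved level. The table name `PROPOSED_MIRROR_PAIRS` and the function `glyph_for` are hypothetical names chosen for this example. A real implementation might instead select a mirrored glyph from the font rather than substitute a code point.

```python
# The four characters proposed for the mirroring list, paired with the
# character whose glyph serves as the mirror image (hypothetical table).
PROPOSED_MIRROR_PAIRS = {
    "\u00AB": "\u00BB",  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u00BB": "\u00AB",  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u2039": "\u203A",  # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    "\u203A": "\u2039",  # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
}


def glyph_for(ch, resolved_level):
    """Return the character whose glyph should be drawn for `ch`.

    Mirroring applies only at odd (right-to-left) resolved levels;
    characters without the mirroring property are returned unchanged.
    """
    if resolved_level % 2 == 1:
        return PROPOSED_MIRROR_PAIRS.get(ch, ch)
    return ch


print(glyph_for("\u00AB", 0))  # «  (left-to-right run: unchanged)
print(glyph_for("\u00AB", 1))  # »  (right-to-left run: mirrored)
```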