Asmus Freytag
October 28, 2011
The various recent proposals for amending the Bidi Algorithm strike me as a significant departure from the basic Bidi Algorithm, and essentially in contradiction to the spirit, if not the letter, of the stability guarantees. They therefore bring with them the risks of instability and incompatibility. The Bidi Algorithm is one of only a few algorithms required for Unicode conformance, and, at the same time, it has been held very stable. This has reduced the amount of divergence among implementations. Most recent "changes" to the algorithm have been in the nature of "clarifications" of edge cases, rather than modifications. Because the UBA is such a basic and strongly required algorithm, stability guarantees are especially important. They include the implicit guarantee that the Bidi classes are the complete description of a character under the UBA.
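That last guarantee is concrete: an implementation needs only the Bidi_Class property of each character, nothing else about it. As an illustration (not part of any proposal), Python's standard unicodedata module exposes exactly this property:

```python
# Illustrative only: the Bidi class (Unicode Bidi_Class property) is the
# sole per-character input the UBA consumes. Python's unicodedata module
# exposes it via bidirectional().
import unicodedata

for ch in ["a", "\u05D0", "\u0627", "1", "("]:
    # Latin letter, Hebrew Alef, Arabic Alef, digit, punctuation
    print(f"U+{ord(ch):04X} -> {unicodedata.bidirectional(ch)}")
```

This prints the classes L, R, AL, EN, and ON respectively; two characters with the same Bidi class are indistinguishable to the algorithm.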
The way I have always parsed the "spirit" of the stability guarantees for the Bidi Algorithm is that it was effectively stable, except as to the addition of new letters (and perhaps minor bug fixes to the rules). The default property assignments for unassigned characters were carefully chosen to minimize disruption when those characters were eventually assigned.
The written and unwritten policies for maintaining the Bidi Algorithm effectively provided several important guarantees.
As a result, one could expect any existing implementation to produce the same Bidi ordering for the vast majority of texts, even those containing characters it was never explicitly updated for. This maximizes interoperability.
After a long quiescent period, there are now many ideas and suggestions for fixing perceived or real shortcomings of the existing Bidi Algorithm. As Martin Duerst wrote recently: "it looks like these changes are being added piecemeal without yet seeing a new horizon of stability...but the Bidi Algorithm isn't an area where constant tinkering is advisable. It would therefore be very important that all these new initiatives are carefully checked against each other, and coordinated both in timing and in substance. It may be well advisable to wait with some of them so that many changes can be made 'in bulk' (the idea of an UBA 2.0), which will also help implementers."
I share this concern, and would support an effort towards a UBA 2.0 which addresses a comprehensive set of updates.
Because of the nature of the proposed changes, this new specification would be disruptive. I therefore see little benefit in subjecting it to the formal limitations on the number of Bidi classes, etc., that were in place for the existing Bidi Algorithm. However, like the existing algorithm, any updated one should take Bidi classes as input, cleanly separating the mapping of characters to Bidi classes from the leveling and reordering calculus.
The temptation to mix specification in terms of character codes with specification in terms of Bidi classes should be firmly resisted.
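The separation argued for here can be sketched in a few lines. The following is a deliberately simplified illustration with hypothetical function names, not a real UBA implementation: stage one is the only place character codes appear, and stage two operates on Bidi classes alone.

```python
# A minimal sketch of the clean two-stage separation: characters appear
# only in stage 1; stage 2 (a drastically simplified stand-in for the
# real leveling rules) consumes Bidi classes exclusively.
import unicodedata

def to_bidi_classes(text):
    # Stage 1: map characters to Bidi classes. The only stage that
    # ever sees character codes.
    return [unicodedata.bidirectional(ch) for ch in text]

def resolve_levels(classes, paragraph_level=0):
    # Stage 2: assign embedding levels. Grossly simplified (no runs,
    # no weak-type resolution, even paragraph level assumed); the point
    # is the signature: classes in, levels out, no characters anywhere.
    return [paragraph_level + 1 if cls in ("R", "AL") else paragraph_level
            for cls in classes]

classes = to_bidi_classes("ab\u05D0\u05D1")   # two Latin, two Hebrew letters
levels = resolve_levels(classes)
```

Keeping the rules in stage two expressed purely over classes is what lets implementations track repertoire additions by updating only the stage-one data tables.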
There should be clean versioning of this new algorithm, independent of the mere versioning of the Unicode Standard. Existing implementations of the "old" Bidi Algorithm are not going to go away overnight, and, because of the way the Bidi Algorithm is designed, they may well be updated to handle future repertoire additions.
The "new" Bidi Algorithm could formally be either an extension or a replacement. It is too early to tell which makes better sense, but it should have a designation that decouples it from questions of repertoire (and from versioning for "bug fixes" or other minor tweaks).
The proposed and contemplated changes to the Bidi Algorithm call for a separate development effort that is not a priori tied to the schedule of Unicode versions. As with all significant developments in specifying algorithms, an extended beta, or testing, period is desirable, one that should end only after significant stakeholders have been able to produce actual working testbeds.