From: Harald Alvestrand (harald@alvestrand.no)
Date: Thu Jun 07 2007 - 07:34:37 CDT
Having failed to find anything, I appeal to this list...
As part of the (slowly moving) investigation into the requirements for
using RTL scripts in domain names, I have been checking out the
properties of the Unicode BIDI algorithm.
One problem I have is that there seems to be a dearth of test datasets
to test an implementation against; my investigation of the Unicode
"reference" implementation has revealed that the C++ and C
implementations are basically toys, fit for verifying the algorithm but
totally useless for real data: they assign arbitrary directional properties
to the ASCII characters and use those for testing the algorithm.
(I have not looked at the Java one).
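For contrast with ASCII stand-ins, Python's standard `unicodedata.bidirectional` returns the real Bidi_Class of any character from the Unicode Character Database bundled with the interpreter. A minimal sketch of the classification step of a "real data" harness might start from that. It assumes the UCD version of the Python build is acceptable, and it covers only classification, not level resolution or reordering:

```python
import unicodedata
from typing import List

def bidi_classes(text: str) -> List[str]:
    """Return the Unicode Bidi_Class of each character (e.g. 'L', 'R',
    'AL', 'EN', 'WS') as recorded in the bundled Unicode Character Database."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Mixed sample: Latin letters, Hebrew alef and bet, spaces, European digits.
sample = "ab \u05d0\u05d1 12"
for ch, cls in zip(sample, bidi_classes(sample)):
    print(f"U+{ord(ch):04X} {cls}")
```

Here the Hebrew letters come out as `R` and the digits as `EN`, rather than whatever class a test mapping happened to give an ASCII letter.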
Can anyone point me at:
1) An implementation of the Unicode BIDI algorithm that can take real
Unicode data and return something that I can verify (either the list of
characters in display order or the list of indexes to which I should
remap the characters)?
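The two output forms asked for in 1) are related by UAX #9 rule L2. Once an implementation has resolved an embedding level for each character on a line, the display order follows by reversing, from the highest level down to the lowest odd level, every maximal run at that level or higher. A minimal sketch, assuming the levels are already resolved (it omits rule L1 whitespace handling and L4 mirroring, and the function names are hypothetical):

```python
from typing import List

def visual_order(levels: List[int]) -> List[int]:
    """Apply rule L2 to one line of resolved embedding levels.

    Returns the logical indices in left-to-right display order.
    """
    order = list(range(len(levels)))
    if not levels:
        return order
    highest = max(levels)
    lowest_odd = min(levels) | 1  # smallest odd level >= the line's minimum
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                # Find the maximal run at this level or higher, then reverse it.
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = order[i:j][::-1]
                i = j
            else:
                i += 1
    return order

def logical_to_visual(order: List[int]) -> List[int]:
    """Invert the permutation: entry k is the display slot of logical index k."""
    result = [0] * len(order)
    for slot, logical in enumerate(order):
        result[logical] = slot
    return result
```

For example, `"ab \u05d0\u05d1"` with levels `[0, 0, 0, 1, 1]` gives the order `[0, 1, 2, 4, 3]`, so `"".join(text[i] for i in order)` displays the bet before the alef.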
2) A test dataset of "real" text (linguistically sensible, not just random
characters) that has been verified by hand to display as expected after
being run through the Bidi algorithm? (Ideally, input/output pairs
for the implementation above, of course.)
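Hand-verified input/output pairs like those asked for in 2) could be consumed by a very small harness. A minimal sketch, where `to_visual` is a hypothetical stand-in for whichever implementation is under test and each pair is (logical-order input, expected display-order output):

```python
from typing import Callable, List, Tuple

# A test case: (logical-order input, hand-verified display-order output).
Pair = Tuple[str, str]

def check_pairs(to_visual: Callable[[str], str],
                pairs: List[Pair]) -> List[Tuple[str, str, str]]:
    """Run every input through `to_visual` and collect mismatches
    as (input, expected, actual) triples."""
    failures = []
    for logical, expected in pairs:
        actual = to_visual(logical)
        if actual != expected:
            failures.append((logical, expected, actual))
    return failures
```

A pure RTL run such as `"\u05d0\u05d1\u05d2"` displays reversed, so even naive string reversal passes that pair. Mixed text with digits or embedded Latin is where such a naive implementation would fail, which is why linguistically realistic data matters.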
Any hints are greatly appreciated!
Harald
This archive was generated by hypermail 2.1.5 : Thu Jun 07 2007 - 07:36:44 CDT