From: Mark Davis (mark@macchiato.com)
Date: Wed Apr 29 2009 - 19:30:57 CDT
I made some of those fixes, so let me know if there are further problems.
The ASCII mapping I am using has the normal BIDI class values, with the
following overrides.
asciiHackMap.put(']', LRE);
asciiHackMap.put('[', RLE);
asciiHackMap.put('}', LRO);
asciiHackMap.put('{', RLO);
asciiHackMap.put('|', PDF);
asciiHackMap.putAll(new UnicodeSet("[A-M]"), R);
asciiHackMap.putAll(new UnicodeSet("[N-Z]"), AL);
asciiHackMap.putAll(new UnicodeSet("[5-9]"), AN);
asciiHackMap.put('>', L);
asciiHackMap.put('<', R);
asciiHackMap.put('"', NSM);
asciiHackMap.put('_', BN);
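
For anyone who wants to try these overrides outside the demo, here is a minimal sketch in plain Java. It is not the demo's code: the class name AsciiHackSketch, the defaultClass helper and the use of a HashMap with string class names are all hypothetical, and the "normal" values are assumed to be those reported by the JDK's Character.getDirectionality.

```java
import java.util.HashMap;
import java.util.Map;

class AsciiHackSketch {
    /** Short bidi class name for the JDK's directionality of c (assumed source of defaults). */
    static String defaultClass(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT: return "L";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT: return "R";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC: return "AL";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER: return "EN";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR: return "ES";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR: return "ET";
            case Character.DIRECTIONALITY_ARABIC_NUMBER: return "AN";
            case Character.DIRECTIONALITY_COMMON_NUMBER_SEPARATOR: return "CS";
            case Character.DIRECTIONALITY_NONSPACING_MARK: return "NSM";
            case Character.DIRECTIONALITY_BOUNDARY_NEUTRAL: return "BN";
            case Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR: return "B";
            case Character.DIRECTIONALITY_SEGMENT_SEPARATOR: return "S";
            case Character.DIRECTIONALITY_WHITESPACE: return "WS";
            default: return "ON"; // other neutrals and anything not listed above
        }
    }

    /** ASCII defaults first, then the overrides listed in the message. */
    static Map<Character, String> buildAsciiHackMap() {
        Map<Character, String> map = new HashMap<>();
        for (char c = 0; c < 0x80; c++) {
            map.put(c, defaultClass(c));
        }
        map.put(']', "LRE");
        map.put('[', "RLE");
        map.put('}', "LRO");
        map.put('{', "RLO");
        map.put('|', "PDF");
        for (char c = 'A'; c <= 'M'; c++) map.put(c, "R");
        for (char c = 'N'; c <= 'Z'; c++) map.put(c, "AL");
        for (char c = '5'; c <= '9'; c++) map.put(c, "AN");
        map.put('>', "L");
        map.put('<', "R");
        map.put('"', "NSM");
        map.put('_', "BN");
        return map;
    }

    public static void main(String[] args) {
        Map<Character, String> map = buildAsciiHackMap();
        // Print the class assigned to each character of a small sample.
        for (char c : "abc DEF 123 789 [x|".toCharArray()) {
            System.out.println("'" + c + "' -> " + map.get(c));
        }
    }
}
```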
I have not tried reconciling those with Asmus's values, which appear to be:
int TypesFromChar[] =
{
//  0    1    2    3    4    5    6    7    8    9    a    b    c    d    e    f
    ON,  ON,  ON,  ON,  L,   R,   ON,  ON,  ON,  ON,  ON,  ON,  ON,  B,   RLO, RLE, /*00-0f*/
    LRO, LRE, PDF, WS,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  /*10-1f*/
    WS,  ON,  ON,  ON,  ET,  ON,  ON,  ON,  ON,  ON,  ON,  ET,  CS,  ON,  ES,  ES,  /*20-2f*/
    EN,  EN,  EN,  EN,  EN,  EN,  AN,  AN,  AN,  AN,  CS,  ON,  ON,  ON,  ON,  ON,  /*30-3f*/
    R,   AL,  AL,  AL,  AL,  AL,  AL,  R,   R,   R,   R,   R,   R,   R,   R,   R,   /*40-4f*/
    R,   R,   R,   R,   R,   R,   R,   R,   R,   R,   R,   ON,  B,   ON,  ON,  ON,  /*50-5f*/
    NSM, L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   /*60-6f*/
    L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   ON,  S,   ON,  ON,  ON,  /*70-7f*/
};
http://www.unicode.org/reports/tr9/BidiReferenceCpp/bidi.c.txt
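
As a first step toward reconciling the two, the digit and upper-case assignments can be compared mechanically. The sketch below is only an illustration: the class and method names are hypothetical, the rows 30-5f are transcribed by hand from the table above, and the demo side covers only the digits and A-Z (with 0-4 assumed to keep their normal EN class).

```java
import java.util.ArrayList;
import java.util.List;

class MappingDiffSketch {
    // Rows 0x30-0x5f, transcribed from the TypesFromChar table quoted above.
    static final String[] ASMUS_30_5F = {
        "EN", "EN", "EN", "EN", "EN", "EN", "AN", "AN", "AN", "AN", "CS", "ON", "ON", "ON", "ON", "ON",
        "R",  "AL", "AL", "AL", "AL", "AL", "AL", "R",  "R",  "R",  "R",  "R",  "R",  "R",  "R",  "R",
        "R",  "R",  "R",  "R",  "R",  "R",  "R",  "R",  "R",  "R",  "R",  "ON", "B",  "ON", "ON", "ON",
    };

    /** Class from the transcribed table; valid for 0x30..0x5f only. */
    static String asmusClass(char c) {
        return ASMUS_30_5F[c - 0x30];
    }

    /** Class under the demo's overrides; digits and A-Z only. */
    static String demoClass(char c) {
        if (c >= 'A' && c <= 'M') return "R";
        if (c >= 'N' && c <= 'Z') return "AL";
        if (c >= '5' && c <= '9') return "AN";
        if (c >= '0' && c <= '4') return "EN"; // not overridden, normal class assumed
        throw new IllegalArgumentException("not covered by this sketch: " + c);
    }

    /** One line per digit or upper-case letter whose class differs. */
    static List<String> differences() {
        List<String> out = new ArrayList<>();
        for (char c = '0'; c <= 'Z'; c++) {
            boolean covered = (c <= '9') || (c >= 'A');
            if (!covered) continue; // skip ':' through '@'
            String demo = demoClass(c);
            String asmus = asmusClass(c);
            if (!demo.equals(asmus)) {
                out.add(c + ": demo=" + demo + ", Asmus=" + asmus);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : differences()) {
            System.out.println(line);
        }
    }
}
```

With the rows transcribed as above, the mismatches fall on 5, A-F and N-Z; G-M are R in both mappings.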
Mark
On Tue, Apr 28, 2009 at 20:40, Mark Davis <mark@macchiato.com> wrote:
>
> On Tue, Apr 28, 2009 at 06:28, Matitiahu Allouche <matial@il.ibm.com> wrote:
>
>>
>> Hello, Mark!
>>
>> This demo is useful, and quite nicely done. A few remarks.
>
>
> Thanks, and thanks for the comments.
>
>
>>
>> 1) By default, base level 1 is assumed. A check box (LTR paragraph)
>> allows forcing the base level to 0.
>> The default behavior is not quite conformant to the UBA (rule P2). I
>> suggest replacing the check box with 3 radio buttons for UBA default,
>> forced LTR and forced RTL respectively.
>
>
> I agree. I did pretty much throw it together, so I didn't expose all three
> choices, but I can make it either a pull-down or radio buttons.
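
The three choices discussed here could be modeled roughly as follows. This is a sketch under stated assumptions, not the demo's or the reference code's implementation: the names BaseLevelSketch, BaseDirection and baseLevel are hypothetical, character classes are taken from the JDK's Character.getDirectionality, and the AUTO branch follows rules P2 and P3 in their simplest form (first character of type L, R or AL decides; level 0 if none is found).

```java
class BaseLevelSketch {
    /** The three options: UBA default, forced LTR, forced RTL. */
    enum BaseDirection { AUTO, LTR, RTL }

    /** Paragraph embedding level (0 or 1) for the given text and mode. */
    static int baseLevel(String paragraph, BaseDirection mode) {
        if (mode == BaseDirection.LTR) return 0;
        if (mode == BaseDirection.RTL) return 1;
        // AUTO: scan for the first strong character (P2), then apply P3.
        for (int i = 0; i < paragraph.length(); ) {
            int cp = paragraph.codePointAt(i);
            switch (Character.getDirectionality(cp)) {
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                    return 0;
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                    return 1;
                default:
                    break; // not a strong character, keep scanning
            }
            i += Character.charCount(cp);
        }
        return 0; // no strong character found
    }

    public static void main(String[] args) {
        System.out.println(baseLevel("abc", BaseDirection.AUTO));
        System.out.println(baseLevel("\u05D0\u05D1 abc", BaseDirection.AUTO));
        System.out.println(baseLevel("abc", BaseDirection.RTL));
    }
}
```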
>
>
>>
>> 2) The checkbox for "ASCII Hack" may not be understood by casual Bidi
>> overseekers. The section added at the end of the page when checking the box
>> can easily fall beyond the current screenful so that the user will not even
>> be aware that something has happened.
>> I suggest to add a short explanation close to the checkbox and a reference
>> to the added section.
>
>
> Agreed. What I really need to do is supply much more of a description.
>
>>
>>
>> 3) The characters in your ASCII hacking table are different from those
>> chosen by Asmus Freytag in his Bidi Tool (part of the Unibook application),
>> for no benefit that I can see. I suggest to align your table with Asmus's,
>> if for no other reason than that he was the first, so that we veteran Bidi
>> dabblers are used to it.
>
>
> I basically just went with the characters that are in
> http://unicode.org/reports/tr9/BidiReferenceJava/BidiReferenceTestCharmap.java.txt,
> plus adding others so as to cover all the classes. I can definitely change
> those, although if they differ across versions of reference code we'll want
> to fix it. (For others, this is not an intrinsic part of the algorithm, just
> for testing.) Where are the Unibook ones listed?
>
>
>>
>>
>> 4) The ASCII Hack characters used for ES, ET and CS should be chosen among
>> characters which really have this classification in the latest versions of
>> Unicode. Putting Plus and Hyphen-Minus signs in the ET class sets us back
>> to Unicode 3.x and might reopen an old quarrel with Microsoft (joking :-).
>> Also, Solidus is really CS and is a bad representative for ES.
>>
>> 5) The 001C-001E characters in the B class are rendered as square blocks
>> in my browser (and probably anybody else's). Since they are not easily
>> generated from a keyboard, I suggest to just remove them.
>>
>> 6) 000C is really WS and is not a good representative for the B class.
>> The other representatives of this class are not printable. I suggest to
>> add names and/or hex codes in a comment column.
>>
>> 7) None of the characters in the S class is a good choice, being either
>> not easily generated from the keyboard (000B, 001F) or intercepted by
>> the browser (0009). I suggest removing those and adding some printable
>> ASCII character.
>>
>> 8) Same thing for the WS class: I suggest to add name and/or hex code in
>> a comment column.
>>
>> 9) Your ASCII Hack table has no representatives for LRM and RLM. I
>> suggest to use @ for LRM and & for RLM.
>
>
> I used > and <.
>
>
>>
>>
>> 10) The string "abc\nde" (keying Enter between "abc" and "de") causes a
>> server internal error when pressing the "Show Bidi" button.
>
>
> Ah, yes, I didn't check for multiple lines; I'll fix that.
>
>
>>
>>
>>
>> Shalom (Regards), Mati
>> Bidi Architect
>> Globalization Center Of Competency - Bidirectional Scripts
>> IBM Israel
>> Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52
>> 2554160
>>
>>
>>
>> *Mark Davis <mark@macchiato.com>*
>> Sent by: bidi-bounce@unicode.org
>> 28/04/2009 03:29
>> To: "bidi@unicode.org" <bidi@unicode.org>
>> cc: Unicode <unicode@unicode.org>
>> Subject: [bidi] Bidi demo
>>
>> I posted a bidi demo at *http://unicode.org/cldr/utility/bidi.jsp*
>>
>> For a given sample string, it shows the results of applying the bidi
>> algorithm *and* the rules responsible for each character's resulting
>> level. (The UI isn't polished; I threw it together using off-the-shelf
>> components, and some small modifications to the UBA reference code to
>> capture the rules.) The default sample is chosen to invoke most of the
>> rules. Comments are welcome.
>>
>> Mark
>>
>
>