From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun May 22 2005 - 13:18:48 CDT
You have presumably used a legacy font that mapped Hebrew letter glyphs on
top of symbols or of ISO-8859-1 characters, none of which has right-to-left
directionality (they are strong LTR, weak or neutral). For this reason, the
BiDi algorithm never reordered the text of these old documents.
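One way to see this is to query each character's bidirectional class. The sketch below uses Python's standard `unicodedata` module; the mapping table `LEGACY_TO_HEBREW` and the helper names are hypothetical stand-ins for whatever ad hoc font was actually used.

```python
import unicodedata

# Hypothetical legacy font: Hebrew glyphs drawn on top of Latin-1 letters.
# A real conversion needs a table covering every codepoint the font reused.
LEGACY_TO_HEBREW = {
    "a": "\u05D0",  # HEBREW LETTER ALEF
    "b": "\u05D1",  # HEBREW LETTER BET
    "g": "\u05D2",  # HEBREW LETTER GIMEL
    "d": "\u05D3",  # HEBREW LETTER DALET
}

def remap(legacy_text: str) -> str:
    """Replace legacy codepoints by Unicode Hebrew ones; leave others untouched."""
    return "".join(LEGACY_TO_HEBREW.get(ch, ch) for ch in legacy_text)

def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

if __name__ == "__main__":
    legacy = "abg"
    print(bidi_classes(legacy))         # ['L', 'L', 'L']: never reordered
    print(bidi_classes(remap(legacy)))  # ['R', 'R', 'R']: now laid out RTL
```

The codepoint swap alone changes the class of every letter from `L` to `R`, which is what triggers the reordering described next.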
When you replace these codepoints with the standard Unicode Hebrew
codepoints, Word consistently applies the BiDi algorithm to render them, so
the visual order is now reversed. In other words, your old text was encoded
in "visual" order rather than in "logical" order.
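For a line made only of Hebrew letters, the relationship between the two orders is a plain reversal. This is a toy model: the function name `displayed_left_to_right` is made up for illustration, and spaces, digits and punctuation are deliberately ignored.

```python
# Toy model, valid only for a line made entirely of Hebrew letters: the BiDi
# algorithm lays such a run out right to left, so reading the screen from
# left to right gives the stored sequence reversed.
def displayed_left_to_right(stored: str) -> str:
    return stored[::-1]

ALEF, BET, GIMEL = "\u05D0", "\u05D1", "\u05D2"

logical = ALEF + BET + GIMEL   # reading order: alef first (rightmost on screen)
visual = GIMEL + BET + ALEF    # legacy storage: leftmost glyph first

# Stored in logical order, the word is displayed correctly (alef rightmost).
assert displayed_left_to_right(logical) == visual
# Stored in visual order, the remapped word comes out backwards (alef leftmost).
assert displayed_left_to_right(visual) == logical
# For such a line, converting visual order to logical order is a reversal.
assert visual[::-1] == logical
```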
So you will effectively have to undo what the BiDi algorithm now does:
- This means not only reversing the Hebrew letters,
- but also handling the characters with weak or neutral directionality (like
punctuation) that are now swapped as well,
- and possibly mirrored.
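The first two steps can be sketched as a common visual-to-logical heuristic for a single line: reverse the whole line, then restore the internal order of embedded left-to-right runs (Latin words, numbers). This is an illustration, not a complete converter: it assumes the line is meant for a right-to-left paragraph, treats only the classes `L` and `EN` as LTR run members (so numbers such as `12.5` with separators are not handled), and leaves mirroring out. The names `LTR_CLASSES` and `visual_to_logical_line` are hypothetical.

```python
import unicodedata

# Assumption: only Latin letters (L) and European digits (EN) form LTR runs.
LTR_CLASSES = {"L", "EN"}

def is_ltr(ch: str) -> bool:
    return unicodedata.bidirectional(ch) in LTR_CLASSES

def visual_to_logical_line(visual: str) -> str:
    """Heuristically convert one visually ordered line to logical order,
    assuming it will be displayed in a right-to-left paragraph."""
    reversed_line = visual[::-1]       # undo the right-to-left layout
    out: list[str] = []
    run: list[str] = []                # pending run of LTR characters
    for ch in reversed_line:
        if is_ltr(ch):
            run.append(ch)
        else:
            out.extend(reversed(run))  # LTR runs keep their own order
            run.clear()
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)
```

For example, a line seen on screen as `2005` followed by a space, bet and alef (stored left to right) becomes alef, bet, space, `2005` in logical order.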
To know exactly what to do, you have to study what the BiDi algorithm does,
and then adapt the encoding so that the standard BiDi reordering (and
mirroring) produces the correct visual order and character orientation. The
conversion will sometimes require inserting BiDi control characters to
prevent characters with weak directionality from being reordered or
mirrored.
(Be careful about the effect of mirroring: the BiDi algorithm changes the
orientation of some characters, such as parentheses, so if you just reverse
the order of the characters, the parentheses may look wrong. You will have to
fix their orientation by replacing each such codepoint with its mirrored
counterpart.)
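The substitution can be sketched as follows. It is meant to run on the Hebrew part of a line after its order has been reversed, not on embedded LTR runs. `unicodedata.mirrored` only reports whether a character has the Bidi_Mirrored property; Python does not expose the mirror partner, so `MIRROR_PAIRS` is a small hand-written excerpt of the pairs listed in Unicode's BidiMirroring.txt, and `swap_mirrored` is a hypothetical name.

```python
import unicodedata

# Small excerpt of Bidi_Mirroring_Glyph pairs; a real converter would load
# the full BidiMirroring.txt data file.
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}

def swap_mirrored(text: str) -> str:
    """Replace each mirrored character by its counterpart, so that the
    BiDi algorithm's own mirroring restores the intended shape."""
    out = []
    for ch in text:
        # Only characters with the Bidi_Mirrored property are candidates.
        if unicodedata.mirrored(ch) and ch in MIRROR_PAIRS:
            out.append(MIRROR_PAIRS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

For instance, a visually stored `(` + bet + alef + `)` reverses to `)` + alef + bet + `(`; swapping gives `(` + alef + bet + `)`, which Word then displays with correctly oriented parentheses.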
There are tools that do this, notably for Hebrew and Arabic, i.e. that
convert texts from visual to logical encoding order. But I don't know of one
that works with Word documents, so you may need to write a conversion
macro...
----- Original Message -----
From: Raymond Mercier
To: unicode@unicode.org
Sent: Sunday, May 22, 2005 6:45 PM
Subject: hebrew font conversion
[This is really a question for the Hebrew Computing Forum, but I have tried
there and drew a blank.]
The problem is that I composed many documents in Word using an ad hoc Hebrew
font, and wish to convert to Unicode.
When I run a macro that exchanges the old codepoints for the Unicode Hebrew
codepoints, the characters in each word are reversed. I have tried to cure
this by writing another macro using StrReverse(). Sometimes this works, but
there are problems, especially with tables.
Does anyone have experience of this, and/or a solution?
I will have the same problem with Arabic Word docs.
This archive was generated by hypermail 2.1.5 : Sun May 22 2005 - 13:19:48 CDT