From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 23 2005 - 14:17:56 CDT
From: "Peter Constable" <petercon@microsoft.com>
> In a table, I presume you'd need to reverse the string in each cell one
> at a time. Actually, I'd expect your biggest problem is that you need to
> reverse formatted *lines* of text one at a time.
Isn't that too simplistic? Text also contains characters with weak
directionality, like punctuation, or with a directionality that does not
depend on the script used for the text, like digits and symbols.
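To make the point about weak and script-independent directionality concrete, here is a minimal sketch that lists the Bidi class of each character in a short mixed string. It only relies on Python's standard unicodedata module; the helper name bidi_classes and the sample string are mine, chosen for illustration.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, Bidi class) pairs for each code point in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hypothetical sample: a Latin letter, a Hebrew letter, a digit, punctuation.
sample = "a\u05d01!"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Only the letters carry a strong class (L or R) here. The digit and the punctuation mark get their final direction from context, which is why a plain reversal of the line cannot be right in general.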
The only accurate thing to do is to parse the line to see where the Bidi
algorithm reverses the visual order, and then reorder those grapheme
clusters (it's a bit more complicated because the ordering of characters
that make up the same combining sequence must NOT be changed: each combining
sequence must be kept as a unit).
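The constraint on combining sequences can be sketched as follows. This is not a Bidi implementation: it only reverses a single run that is assumed to be entirely right-to-left, and it approximates a combining sequence as one base character followed by any characters whose general category is a mark (Mn, Mc, Me). Full grapheme cluster segmentation has more rules than that. The function names are hypothetical.

```python
import unicodedata

def combining_sequences(text):
    """Split text into units: a base character plus its following marks."""
    units = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch   # attach the mark to the preceding base
        else:
            units.append(ch)  # start a new unit
    return units

def reverse_run(text):
    """Reverse the order of units, leaving each unit's internal order intact."""
    return "".join(reversed(combining_sequences(text)))

# SHIN + QAMATS, then LAMED: the point must stay after its base letter.
visual = reverse_run("\u05e9\u05b8\u05dc")
print([f"U+{ord(c):04X}" for c in visual])
```

A naive code-point reversal would put the QAMATS before the SHIN, attaching it to the wrong letter. Finding which runs need reversing at all is still the job of the Bidi algorithm and is not shown here.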
I think that this conversion process should be described in the UAX that
covers the BiDi algorithm, because it is such a common operation to
convert between "visual" order (i.e. the default order for ISO-8859-8 for
Hebrew, including on the web unless a page explicitly specifies, with a
different charset label, that its text uses the "implicit" order; but not
for the ISO-8859 Arabic) and the logical order.
Subsidiary question: what is the default ordering of the Latin-Hebrew ANSI
codepage in Windows? Is it like ISO-8859-8 (i.e. always LTR, including for
Hebrew letters) or like Unicode (using the BiDi algorithm with RTL Hebrew
letters)?
It's interesting to note that this affects the encoding mappings, such as
those in the UNIDATA directory. It's not clear from that repository whether
the charsets that include Semitic letters map them correctly.
Shouldn't there be mappings to PUAs instead (because these mappings use
LTR Hebrew letters), or a documented marker in those mapping files that
indicates that the charset does not support the BiDi algorithm, and so uses
LTR throughout the mapped charset (meaning that the conversion to Unicode
requires either inserting a leading LRO control and a trailing PDF control,
or reordering the characters)?
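As an illustration of the first option, here is a minimal sketch that decodes bytes assumed to be visual-order ISO-8859-8 and wraps each line in LRO (U+202D) and PDF (U+202C), so that a Unicode renderer displays the characters in stored order. The function name is hypothetical, and wrapping line by line is my assumption, made because an override does not extend past a paragraph boundary.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def decode_visual_hebrew(raw):
    """Decode visual-order ISO-8859-8 bytes, forcing LTR display per line."""
    text = raw.decode("iso8859_8")
    # Wrap every line separately so each one carries its own override.
    return "\n".join(LRO + line + PDF for line in text.split("\n"))

# Bytes 0xE0 and 0xE1 are ALEF and BET in ISO-8859-8.
wrapped = decode_visual_hebrew(b"\xe0\xe1")
print([f"U+{ord(c):04X}" for c in wrapped])
```

This keeps the stored order untouched, so searching, sorting and line breaking still see visual-order text. Reordering to logical order avoids that, at the cost of running the reverse of the Bidi algorithm.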