Re: hebrew font conversion

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 23 2005 - 14:17:56 CDT

    From: "Peter Constable" <petercon@microsoft.com>
    > In a table, I presume you'd need to reverse the string in each cell one
    > at a time. Actually, I'd expect your biggest problem is that you need to
    > reverse formatted *lines* of text one at a time.

    Isn't that too simplistic? Text also contains characters with weak
    directionality, like punctuation, and characters whose directionality does
    not depend on the script of the surrounding text, like digits and symbols.
    The only accurate thing to do is to parse the line to see where the Bidi
    algorithm reverses the visual order, and then reorder those grapheme
    clusters. (It's a bit more complicated, because the ordering of the
    characters within a combining sequence must NOT be changed: each combining
    sequence must be kept as a unit.)
    I think that this conversion process should be described in the UAX that
    covers the BiDi algorithm, because converting between "visual" order and
    logical order is such a common operation. Visual order is the default for
    Hebrew in ISO-8859-8, including on the web unless a page explicitly
    declares, with a different charset label, that its text uses the
    "implicit" order; it is not the default for the ISO-8859 Arabic charset.
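
As a rough illustration of the reordering described earlier in this message,
here is a minimal Python sketch. It is not UAX #9 or its inverse. It assumes
the stored visual line has an LTR base direction, it treats any run that
starts and ends with a strong RTL character and contains no strong LTR
character as one unit to reverse, and it keeps each combining sequence and
each plain digit string intact. Mirrored punctuation (parentheses, brackets)
and numbers with separators such as "1.5" are not handled. The function names
are hypothetical and exist only for this example.

```python
import unicodedata

RTL = {"R", "AL"}     # strong right-to-left bidi classes
DIGIT = {"EN", "AN"}  # digits keep their left-to-right order


def combining_sequences(text):
    """Split text into units: one base character plus its combining marks."""
    units = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch  # attach the mark to the preceding base
        else:
            units.append(ch)
    return units


def group_numbers(units):
    """Merge adjacent digit units so a number is never reversed internally."""
    grouped = []
    prev_digit = False
    for unit in units:
        is_digit = unicodedata.bidirectional(unit[0]) in DIGIT
        if is_digit and prev_digit:
            grouped[-1] += unit
        else:
            grouped.append(unit)
        prev_digit = is_digit
    return grouped


def visual_to_logical(line):
    """Reverse each RTL run of a visual-order line, unit by unit."""
    units = group_numbers(combining_sequences(line))
    classes = [unicodedata.bidirectional(u[0]) for u in units]
    out = []
    i = 0
    while i < len(units):
        if classes[i] not in RTL:
            out.append(units[i])  # LTR, digit, or neutral outside a run
            i += 1
            continue
        # Extend the run up to the last RTL unit before a strong LTR unit.
        last_rtl = i
        k = i + 1
        while k < len(units) and classes[k] != "L":
            if classes[k] in RTL:
                last_rtl = k
            k += 1
        out.extend(reversed(units[i:last_rtl + 1]))
        i = last_rtl + 1
    return "".join(out)
```

For instance, `visual_to_logical("abc \u05d2\u05d1\u05d0 def")` reverses only
the three Hebrew letters and leaves the Latin text where it was. A real
converter would have to follow the resolved embedding levels of the Bidi
algorithm rather than this run heuristic.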

    Subsidiary question: what is the default ordering of the Latin-Hebrew ANSI
    codepage in Windows? Is it like ISO-8859-8 (i.e. always LTR, including for
    Hebrew letters) or like Unicode (using the BiDi algorithm with RTL Hebrew
    letters)?
    It's interesting to note that this affects the encoding mappings, such as
    those in the UNIDATA directory. It's not clear from that repository
    whether the charsets that include Semitic letters map them correctly.
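
To make the concern concrete, here is a small Python sketch. It assumes that
Python's built-in `iso8859_8` and `cp1255` codecs are an acceptable stand-in
for byte-to-code-point mapping tables; it says nothing about how Windows
itself orders text. The helper `code_points` is hypothetical. It shows that a
plain mapping carries bytes over in storage order and records no ordering
convention at all.

```python
# Three Hebrew letters stored as gimel, bet, alef (bytes 0xE2, 0xE1, 0xE0).
sample = bytes([0xE2, 0xE1, 0xE0])

# Decode the same bytes under both charset labels.
as_iso = sample.decode("iso8859_8")
as_win = sample.decode("cp1255")


def code_points(text):
    """Return the code points of text in storage order, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Both labels give the same code points in the same order, so the
# visual-versus-logical distinction is not visible in the mapping.
print(code_points(as_iso))
print(code_points(as_win))
```

Whether those bytes were meant as visual or logical order has to come from
somewhere else, such as the charset label or a documented convention.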

    Shouldn't there be mappings to PUAs instead (because these mappings use
    LTR Hebrew letters), or a documented marker in those mapping files
    indicating that the charset does not support the BiDi algorithm, and so
    uses LTR throughout the mapped charset (meaning that conversion to Unicode
    requires either inserting a leading LRO control and a trailing PDF
    control, or reordering the characters)?
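
The first of those two options, bracketing the text with LRO and PDF, could be
sketched as follows in Python. This assumes the input really is visual-order
ISO-8859-8 and that wrapping each line separately is acceptable; the function
names are hypothetical.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def wrap_visual_line(line):
    """Force left-to-right display of one visual-order line."""
    if not line:
        return line  # nothing to override on an empty line
    return f"{LRO}{line}{PDF}"


def decode_visual_iso_8859_8(data):
    """Decode visual-order ISO-8859-8 bytes, keeping the stored order.

    Each line gets its own LRO ... PDF pair, because the override does not
    carry across a paragraph separator.
    """
    text = data.decode("iso8859_8")
    return "\n".join(wrap_visual_line(line) for line in text.split("\n"))
```

This keeps the characters in their stored order, so the result displays as
the original did but is still not in logical order: searching, sorting and
line breaking will see the Hebrew words backwards. Reordering the characters
avoids that at the cost of a more involved conversion.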



    This archive was generated by hypermail 2.1.5 : Mon May 23 2005 - 15:51:28 CDT