RE: UAX#14-20: undesirable line breaking opportunities (parentheses and quotation marks)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jul 26 2007 - 03:34:12 CDT


    > Also, the class CM inherits from the *preceding* character. Your model
    > would result in inheritance in the other direction, which would
    > invalidate all existing implementations (not even those that import the
    > UCD tables could update to such a scheme w/o changes in architecture).

    I have NOT spoken of the CM class. I don't know why you are speaking about
    it.

    The only relevant rule is LB30. Even if it does solve the problem for
    cases like "word(s)" in the Latin script, its effect will be too broad
    in ideographic texts.

    Suppose that I1, I2, I3 are sequences of ideographs. If they occur in
    sequence in such a way that line breaking is allowed between them, then we
    have:

            I1 ÷ I2 ÷ I3

    This is the normal way to handle line breaks in ideographic texts. The
    same discussion applies to other scripts that typically don't use
    explicit whitespace.

    Now suppose that a punctuation pair (OP and CL) is used in the sequence:
            I1 OP I2 CL I3
    Line breaking should now be prohibited between OP and I2 and between I2
    and CL, but it should not be prohibited between I1 and OP or between CL
    and I3. In other words:
            I1 ÷ OP × I2 × CL ÷ I3

    The best way to formulate what I mean is:

    Independently of the nature of the <a>, <b>, <c> or <d> characters
    below, we are just considering the characters on each side of the
    opening and closing characters, such that:
            * a line break between <a> and <OP,b> should be allowed (resp.
    prohibited) if and only if a line break would be allowed (resp.
    prohibited) between <a> and <b>. The sequence <OP,b> is not breakable.
            * a line break between <c,CL> and <d> should be allowed (resp.
    prohibited) if and only if a line break would be allowed (resp.
    prohibited) between <c> and <d>. The sequence <c,CL> is not breakable.
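
    As a rough sketch of the two rules above (not taken from UAX#14 or from
    any implementation), the Python below computes break opportunities over
    a list of line-break class names. The function names are hypothetical,
    and base_break is a toy stand-in for the real pair table: it assumes a
    break is allowed on either side of an ideograph (ID) and nowhere else.
    Looking through a run of several OP or CL characters is also an
    assumption; the rules above only describe a single one.

```python
def base_break(left, right):
    """Toy pair table (assumption): break allowed next to an ideograph only."""
    return left == "ID" or right == "ID"


def proposed_breaks(classes):
    """Return one boolean per gap between adjacent classes.

    True means a break is allowed in that gap, False means prohibited.
    OP is glued to the character after it, CL to the character before it;
    the gap outside the punctuation is decided as if it were absent.
    """
    result = []
    for i in range(len(classes) - 1):
        left, right = classes[i], classes[i + 1]
        # <OP,b> and <c,CL> are never breakable.
        if left == "OP" or right == "CL":
            result.append(False)
            continue
        # Look back through closing punctuation to find <c>.
        j = i
        while j > 0 and classes[j] == "CL":
            j -= 1
        # Look ahead through opening punctuation to find <b>.
        k = i + 1
        while k < len(classes) - 1 and classes[k] == "OP":
            k += 1
        result.append(base_break(classes[j], classes[k]))
    return result


def render(classes):
    """Show the result with the ÷ (allowed) and × (prohibited) notation."""
    marks = ["÷" if b else "×" for b in proposed_breaks(classes)]
    out = [classes[0]]
    for mark, cls in zip(marks, classes[1:]):
        out += [mark, cls]
    return " ".join(out)


print(render(["ID", "OP", "ID", "CL", "ID"]))
print(render(["AL", "OP", "AL", "CL", "AL"]))
```

    Under these assumptions the ideographic sequence keeps its outer break
    opportunities, while the all-alphabetic case (as in "word(s)") gets no
    break at all, because no break would be allowed between the letters if
    the punctuation were absent.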

    And this is what I mean when I say that:
            * opening punctuation should be treated as if it were extending
    the grapheme cluster of the first character encoded after it, and
    inheriting its line-breaking properties (this has NOTHING to do with the CM
    class that I did not discuss).
            * closing punctuation should be treated as if it were extending
    the grapheme cluster of the last character encoded before it, and
    inheriting its line-breaking properties (this has NOTHING to do with the CM
    class that I did not discuss).

    These rules are a bit different from LB30 and, I think, more appropriate
    because they will work in the ideographic context. I suspect that if LB30
    is not implemented, it's because it did not work correctly with these
    texts.


