From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jul 26 2007 - 03:34:12 CDT
> Also, the class CM inherits from the *preceding* character. Your model
> would result in inheritance in the other direction, which would
> invalidate all existing implementations (not even those that import the
> UCD tables could update to such a scheme w/o changes in architecture).
I did NOT talk about the CM class; I don't know why you are bringing it
up.
The only relevant rule is LB30; but even if it effectively solves the
problem for the case of "word(s)" in the Latin script, the effect of LB30
would be too broad in ideographic texts.
Suppose that I1, I2, I3 are sequences of ideographs. If they occur in
sequence in such a way that line breaking is allowed between them, then we
have:
I1 ÷ I2 ÷ I3
This is the normal way to handle line breaks in ideographic texts (and in
other scripts that typically don't use explicit whitespace, so this
discussion extends to those scripts too).
Now suppose that punctuation pairs (OP and CL) are used in the sequence:
I1 OP I2 CL I3
Line breaking should now be prohibited between OP and I2 and between I2
and CL, but it should still be allowed between I1 and OP and between CL
and I3. In other words:
I1 ÷ OP × I2 × CL ÷ I3
The best way to formulate what I mean is this:
Independently of the nature of the <a>, <b>, <c> or <d> characters
below, we are only considering the characters on each side of the
opening and closing punctuation, such that:
* a line break between <a> and <OP,b> should be allowed (resp.
prohibited) if and only if a line break would be allowed (resp. prohibited)
between <a> and <b>. The sequence <OP,b> is not breakable.
* a line break between <c,CL> and <d> should be allowed (resp.
prohibited) if and only if a line break would be allowed (resp. prohibited)
between <c> and <d>. The sequence <c,CL> is not breakable.
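The two conditions above can be sketched in Python. This is a minimal illustration, not UAX #14 code: class names are plain strings, and `toy_base_break`, `effective_left`, `effective_right`, `break_opportunities` and `render` are hypothetical helpers. The pairwise table for the non-punctuation classes is an assumption (a break is allowed iff either side is an ideograph `ID`); the sketch only shows how OP and CL defer to their neighbors: a break before OP is decided by the character after it, a break after CL by the character before it, and <OP,b> and <c,CL> are never split.

```python
from typing import Callable, List, Sequence

OP, CL = "OP", "CL"

def toy_base_break(left: str, right: str) -> bool:
    """Hypothetical pairwise rule for non-punctuation classes, NOT UAX #14:
    a break is allowed iff either side is an ideograph (ID)."""
    return left == "ID" or right == "ID"

def effective_left(classes: Sequence[str], i: int) -> str:
    """Class deciding a break after position i: a CL defers to the
    nearest preceding non-CL character (the <c> of <c,CL>)."""
    while i > 0 and classes[i] == CL:
        i -= 1
    return classes[i]

def effective_right(classes: Sequence[str], i: int) -> str:
    """Class deciding a break before position i: an OP defers to the
    nearest following non-OP character (the <b> of <OP,b>)."""
    while i < len(classes) - 1 and classes[i] == OP:
        i += 1
    return classes[i]

def break_opportunities(
    classes: Sequence[str],
    base_break: Callable[[str, str], bool] = toy_base_break,
) -> List[bool]:
    """result[i] is True if a break is allowed between classes[i] and classes[i + 1]."""
    result: List[bool] = []
    for i in range(len(classes) - 1):
        if classes[i] == OP or classes[i + 1] == CL:
            result.append(False)  # never split <OP,b> or <c,CL>
        else:
            result.append(base_break(effective_left(classes, i),
                                     effective_right(classes, i + 1)))
    return result

def render(classes: Sequence[str], breaks: Sequence[bool]) -> str:
    """Show break opportunities with the ÷ (allowed) and × (prohibited) notation."""
    out = [classes[0]]
    for allowed, cls in zip(breaks, classes[1:]):
        out += ["÷" if allowed else "×", cls]
    return " ".join(out)

# I1 OP I2 CL I3, with each Ik reduced to a single ideograph (ID).
seq = ["ID", "OP", "ID", "CL", "ID"]
print(render(seq, break_opportunities(seq)))  # ID ÷ OP × ID × CL ÷ ID
```

With `["AL", "AL", "AL", "AL", "OP", "AL", "CL"]` standing for "word(s)", every entry comes out False under this toy table, so the Latin case also stays unbroken.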
And this is what I mean when I say that:
* opening punctuation should be treated as if it were extending
the grapheme cluster of the first character encoded after it, and
inheriting its line-breaking properties (this has NOTHING to do with the CM
class, which I did not discuss).
* closing punctuation should be treated as if it were extending
the grapheme cluster of the last character encoded before it, and
inheriting its line-breaking properties (this has NOTHING to do with the CM
class, which I did not discuss).
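The cluster view in the two bullets above can also be sketched in Python. This is a minimal sketch under stated assumptions: the "clusters" here are only a line-breaking grouping (not UAX #29 grapheme clusters), class names are plain strings, and `toy_base_break`, `cluster` and `break_opportunities` are hypothetical helpers using a toy pairwise table (a break is allowed iff either side is an ideograph `ID`). Each OP is folded into the cluster of the character after it, each CL into the cluster of the character before it (as in the <c,CL> formulation), and the pairwise rule is consulted only at cluster boundaries, using the class each cluster inherits from its base character.

```python
from typing import Callable, List, Sequence, Tuple

OP, CL = "OP", "CL"

def toy_base_break(left: str, right: str) -> bool:
    """Hypothetical pairwise rule, NOT UAX #14: a break is allowed iff
    either side is an ideograph (ID)."""
    return left == "ID" or right == "ID"

def cluster(classes: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Group positions into (base_class, start, end) clusters, end inclusive:
    OPs attach to the following character, CLs to the preceding one."""
    clusters: List[Tuple[str, int, int]] = []
    i, n = 0, len(classes)
    while i < n:
        start = i
        while i < n and classes[i] == OP:  # openers extend forward
            i += 1
        if i < n:
            base = classes[i]  # the cluster inherits this character's class
            i += 1
        else:
            base = OP  # trailing OP with nothing after it
        while i < n and classes[i] == CL:  # closers join the cluster before them
            i += 1
        clusters.append((base, start, i - 1))
    return clusters

def break_opportunities(
    classes: Sequence[str],
    base_break: Callable[[str, str], bool] = toy_base_break,
) -> List[bool]:
    """result[i] is True if a break is allowed between classes[i] and classes[i + 1]."""
    result = [False] * max(len(classes) - 1, 0)  # no break inside a cluster
    clusters = cluster(classes)
    for (left, _, end), (right, _, _) in zip(clusters, clusters[1:]):
        result[end] = base_break(left, right)  # boundary lies between end and end + 1
    return result

seq = ["ID", "OP", "ID", "CL", "ID"]  # I1 OP I2 CL I3
print(cluster(seq))  # [('ID', 0, 0), ('ID', 1, 3), ('ID', 4, 4)]
print(break_opportunities(seq))  # [True, False, False, True]
```

On the I1 OP I2 CL I3 example this gives the same pattern as the pairwise formulation stated earlier: breaks only before OP and after CL.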
These rules are a bit different from LB30, and I think they are more
appropriate because they also work in an ideographic context. I suspect that
if LB30 is not implemented, it is because it did not work correctly with
such texts.