From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sun Dec 02 2007 - 11:47:56 CST
Asmus Freytag wrote:
> You seem to want a number of contradictory things.
Don't we all? But I wouldn't refer to contradiction here.
> Rule LB15 got its origin from just such an attempt to be conservative
> when in doubt, realizing that allowing a bad break can be more
> damaging than missing a break opportunity.
Yes, but it forbids a break at a space. I don't think I'm
self-contradictory in saying "be conservative in allowing breaks, but
allow breaking at a space". A space is an exception rather than a
contradiction. It is based on a long writing tradition in some cultures.
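
To make the objection concrete, here is a minimal sketch of what rule LB15 does, assuming its UAX #14 formulation at the time of this thread (QU SP* × OP: no break between a quotation mark and opening punctuation, even with intervening spaces). The character sets below are small hand-picked samples of the QU and OP classes, and the function name is hypothetical; this is an illustration, not a conformant implementation.

```python
import re

# Small illustrative samples only; the real QU and OP classes are larger.
QUOTES = "\"'\u00AB\u00BB\u2018\u2019\u201C\u201D"  # sample of class QU
OPENERS = "([{"                                      # sample of class OP

# A quotation mark, any number of spaces, then (lookahead) an opener.
LB15 = re.compile(
    "[" + re.escape(QUOTES) + "] *(?=[" + re.escape(OPENERS) + "])"
)

def lb15_forbidden_breaks(text):
    """Return indices i where LB15 forbids a break before text[i]."""
    return [m.end() for m in LB15.finditer(text)]

# A closing quote followed by a parenthetical: the space before "("
# is not a break opportunity under this rule.
print(lb15_forbidden_breaks('said "abc" (x)'))
```

Note that the rule cannot tell an opening quotation mark from a closing one, so the quite ordinary sequence of a quoted word followed by a parenthetical remark is affected as well.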
> The algorithm is intended for multilingual text or for multilingual
> environments. It can therefore _not_ simply assume that spaces are
> what makes the break. Doing so, would cause very suboptimal
> typography for Asian contexts.
I'm not saying that "spaces are what makes the break". I suggested a
simple approach that combines script-specific rules, breaking at spaces,
and explicit line break controls. I don't think it's possible to get
much farther at the "multilingual" level, which really means the
language-ignorant level. (This is basically about texts in unknown
languages, "unknown" in the sense that the processing software does not
apply language-sensitive rules.) If you try harder, confusion and
problems arise.
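
A minimal sketch of such a language-ignorant approach, assuming only three ingredients: a break is allowed after a space, never at a no-break space, and explicit controls (zero width space, word joiner) override everything else. Script-specific rules are left out entirely, and the function name is hypothetical.

```python
NBSP = "\u00A0"  # no-break space: never a break opportunity
ZWSP = "\u200B"  # zero width space: explicit break opportunity
WJ = "\u2060"    # word joiner: explicitly forbids a break on either side

def break_opportunities(text):
    """Return indices i such that a line may be broken before text[i]."""
    result = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if WJ in (prev, cur):
            continue  # explicit control forbids the break
        if cur in (" ", NBSP, ZWSP):
            continue  # never break before a space-like character
        if prev in (" ", ZWSP):
            result.append(i)  # break after a space or an explicit ZWSP
    return result

# Every ordinary space yields an opportunity, including the one
# between a quotation mark and an opening parenthesis.
print(break_opportunities('" (abc) def'))
```

An author who needs to keep a quotation mark and a following parenthesis together would, under this scheme, use a no-break space or a word joiner instead of relying on a built-in rule.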
>
> The original algorithm, before rule 15, was tested in shipping
> implementations before offering it as a seed for the standardization
> effort. It was itself based on European de-facto practice and certain
> Asian standards in the area of linebreaking.
As far as I have understood, the Unicode line breaking rules have varied
a lot (and programs may still reflect older versions - I can almost
daily see typeset text that has incorrectly broken abc:abc after the
colon), and I have never seen any software that comes even close to
applying the Unicode rules. But I have seen software that applies _some_
of the rules. For obvious reasons, I mostly observe such things when
they produce outright wrong and mad results.
> Because a bad linebreak following an opening punctuation (or right
> before a closing punctuation) is a very serious issue in non-Western
> line layout, the UTC adopted the cautious formulation of Rule 15.
I still haven't seen a case where " (..." appears, for any quotation
marks. I don't deny the possibility of such expressions. I'm just saying
that they must be extremely rare, if not contrived, and that _they_
(rather than some much more common situations) should be handled by
language-specific exceptions to line breaking, or by a no-break space,
or by some other tools.
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/