RE: UAX #14: no line breaks between OP and QU, even if there are intervening spaces

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 30 2007 - 20:04:51 CST


> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of Jukka K. Korpela
> Sent: Friday, November 30, 2007 17:47
> To: unicode@unicode.org
> Subject: Re: UAX #14: no line breaks between OP and QU, even if there are
> intervening spaces
>
> Arnt Richard Johansen wrote:
>
> > In UAX #14, rule LB15 states "Do not break within '"[', even with
> > intervening spaces." This is formalised as
> >
> > QU SP* × OP
> >
> > What is the rationale behind this rule?
>
> Beats me. Whatever the rationale might be, the rule is harmful more
> often than useful. I'm afraid the line breaking rules as a whole just
> try too much: they define detailed rules for combinations, based on the
> consideration of some _possible_ scenarios where the combinations might
> appear.
>
> > As an example, given a sufficiently small text area width, the
> > algorithm will break text this way:
> >
> > "The
> > Wire" (2005)
> >
> > but never this way:
> >
> > "The Wire"
> > (2005)
> >
> > which is IMHO more logical.

This is correct in my opinion. "QU" is just the *default* line-breaking
class for quotation marks when it is not clear whether they are opening or
closing. Implementations are free to remap characters from "QU" to "OP" or
"CL" by using tailored algorithms (for example a simple pairing rule, the
most commonly used one).

But the Unicode standard cannot define two conditional properties for the QU
characters. In the absence of any other algorithm, "QU" should by default
behave as the rule indicates.

Maybe the Unicode annex should provide a standard (but still optional)
algorithm that allows remapping "QU" characters into "OP" and "CL" for the
simple pairing rule. This rule could include exceptions that preserve the
QU class in some cases, instead of mapping them into "OP" and "CL".
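
To make the idea concrete, here is a rough sketch in Python of what such a
pairing rule could look like. It is only an illustration, not text from
UAX #14: the function names remap_quotes and lb15_forbids_break are
hypothetical, the input is assumed to be a list of line-breaking classes
that has already been resolved, and each word is simplified to a single
"AL" token. The one exception it implements is that a quote with no
partner keeps the QU class.

```python
def remap_quotes(classes):
    """Naive pairing rule (hypothetical): the 1st, 3rd, ... QU becomes OP,
    the 2nd, 4th, ... QU becomes CL. A trailing unpaired quote stays QU."""
    result = list(classes)
    open_index = None  # index of a quote still waiting for its partner
    for i, cls in enumerate(classes):
        if cls != "QU":
            continue
        if open_index is None:
            open_index = i
        else:
            result[open_index] = "OP"
            result[i] = "CL"
            open_index = None
    return result


def lb15_forbids_break(classes, i):
    """True if the pattern QU SP* x OP forbids a break just before classes[i]."""
    if i <= 0 or i >= len(classes) or classes[i] != "OP":
        return False
    j = i - 1
    while j >= 0 and classes[j] == "SP":
        j -= 1
    return j >= 0 and classes[j] == "QU"


# "The Wire" (2005), one token per word for brevity:
#          "     The   _     Wire  "     _     (     2005  )
tokens = ["QU", "AL", "SP", "AL", "QU", "SP", "OP", "NU", "CL"]

print(lb15_forbids_break(tokens, 6))                # True: break before "(" is blocked
print(lb15_forbids_break(remap_quotes(tokens), 6))  # False: closing quote is now CL
```

With the closing quote remapped to CL, the QU SP* x OP pattern no longer
matches, so the break before "(2005)" from the example above is no longer
forbidden by that rule. A real tailoring would also need to handle nested
quotes and apostrophes, which this sketch ignores.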

And even if such standard remapping is not added to the annex, the annex
should document where, in the existing algorithm, such optional remapping
should occur, so that compliant tailorings will not break everything.


