From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jul 26 2007 - 04:28:46 CDT
> -----Original Message-----
> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
> Sent: Thursday, July 26, 2007 09:39
> To: 'Kenneth Whistler'
> Cc: 'unicode@unicode.org'
> Subject: RE: UAX#14-20: undesriable line breaking opportunities (parenthese
> and quotation marks)
>
> > And in particular, the relevant rules are:
> > (...)
> > LB30 Do not break between letters, numbers, or ordinary symbols and
> > opening or closing punctuation.
> >
> > (AL | NU) × OP
> > CL × (AL | NU)
> >
> > Those rules seem *already* to be doing exactly what you seem to
> > be asking for.
If you really think that these rules are sufficient, I still maintain that
LB30 is ambiguous: it in fact consists of TWO separate rules that are
incorrectly summarized by its description (the term "between" combined
with the "or" in "opening or closing" is the main source of confusion).
So I suggest rewriting it as:
LB30.1 Do not break after letters, numbers, or ordinary symbols
and before opening punctuation.
(AL | NU) × OP
LB30.2 Do not break after closing punctuation and
before letters, numbers, or ordinary symbols.
CL × (AL | NU)
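
To make the split concrete, here is a minimal sketch in Python that writes the
two halves as separate predicates. The class labels are plain strings and the
function names are hypothetical; only these two rules are modelled, not the
rest of UAX #14.

```python
# Line-breaking classes as plain strings (illustrative only; a real
# implementation would look them up in the Unicode line-break data).
AL, NU, OP, CL = "AL", "NU", "OP", "CL"


def lb30_1(before, after):
    """(AL | NU) x OP: no break after a letter or number
    and before opening punctuation."""
    return before in (AL, NU) and after == OP


def lb30_2(before, after):
    """CL x (AL | NU): no break after closing punctuation
    and before a letter or number."""
    return before == CL and after in (AL, NU)


def break_prohibited(before, after):
    """True when either half of LB30 forbids a break between the two classes.
    Pairs not covered by LB30 (for example OP followed by AL) return False
    here because this sketch ignores every other rule."""
    return lb30_1(before, after) or lb30_2(before, after)
```

Each predicate only ever looks at one side of the punctuation, which is the
point of stating them as two rules.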
And I would add a third item covering punctuation marks that may be used
as either opening or closing punctuation, either because this is
language/locale dependent (notably quotation marks), or because they are
intrinsically ambiguous (such as the ASCII vertical single or double quotes).
In such a case, if it can't be determined (from the character itself or from
the language actually in use) whether a punctuation mark is opening or
closing, then the two separate rules should BOTH apply, by making these
punctuation marks members of BOTH line-breaking classes, OP and CL.
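
A minimal sketch of this proposal, in Python: each character maps to a SET of
classes, so an ambiguous quote can carry both OP and CL. The toy table in
classes_of() is an assumption made for illustration and is not the UAX #14
data; the function names are hypothetical.

```python
def classes_of(ch):
    """Toy classifier (hypothetical): returns the set of line-breaking
    classes assumed for one character."""
    if ch in ('"', "'"):
        return {"OP", "CL"}      # ambiguous quotes belong to both classes
    if ch == "(":
        return {"OP"}
    if ch == ")":
        return {"CL"}
    if ch.isdigit():
        return {"NU"}
    if ch.isalpha():
        return {"AL"}
    return set()                 # everything else is ignored by this sketch


def break_prohibited_between(a, b):
    """Apply both halves of the split rule to the pair (a, b)."""
    ca, cb = classes_of(a), classes_of(b)
    rule_1 = bool(ca & {"AL", "NU"}) and "OP" in cb   # (AL | NU) x OP
    rule_2 = "CL" in ca and bool(cb & {"AL", "NU"})   # CL x (AL | NU)
    return rule_1 or rule_2
```

With this table a break is refused both in 'a"' (the quote counts as OP) and
in '"a' (the quote counts as CL), while '(a' is left to other rules.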
Now, about the implementation:
* for closing punctuation it is simple to handle this case by treating it
as if it were a combining character encoded after the combining sequence
that it extends, so that the whole is handled as one larger grapheme
cluster. This should apply in all cases except after whitespace and
explicit line-break controls (or explicit ends of verses if they are marked
as such in some scripts, such as double dandas).
* for opening punctuation, the case is a bit more difficult because it
requires an additional forward lookahead to see how to handle it.
* for ambiguously opening or closing punctuation (mostly the quotation
marks discussed above), the best way to handle it is to prohibit line
breaks BOTH before AND after it, unless the character before or after
it is whitespace, or a character that explicitly forces a line break or
explicitly indicates that a line break is allowed, such as a disjoiner
control.
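
The last case can be sketched as follows, in Python. This is only an
illustration under stated assumptions: the set of ambiguous quotes is limited
to the two ASCII quotes, and BREAK_ALLOWING is a hypothetical stand-in for
"whitespace, forced line breaks, or a control that allows a break" (U+200B
ZERO WIDTH SPACE is used here as the example of such a control).

```python
# Assumption: only the two ASCII quotes are treated as ambiguous.
AMBIGUOUS_QUOTES = {'"', "'"}

# Assumption: characters next to which a break stays possible.
BREAK_ALLOWING = {" ", "\t", "\n", "\r", "\u200b"}


def may_break_between(before, after):
    """Return False when a break between the two characters is forbidden
    by the ambiguous-quote rule; True means this rule has no objection
    (other rules could still forbid the break)."""
    if before in AMBIGUOUS_QUOTES and after not in BREAK_ALLOWING:
        return False             # no break right after the quote
    if after in AMBIGUOUS_QUOTES and before not in BREAK_ALLOWING:
        return False             # no break right before the quote
    return True


def blocked_positions(text):
    """Indices i such that a break between text[i-1] and text[i] is refused."""
    return [i for i in range(1, len(text))
            if not may_break_between(text[i - 1], text[i])]
```

For a string such as 'say "hi" now', the quotes stay attached to the word
they enclose, while the spaces outside the quotes remain usable as break
opportunities.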