From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jul 26 2007 - 02:25:05 CDT
Asmus Freytag wrote:
> With all changes to UAX#14, it is important to make sure that existing
> implementations can continue to be conformant as much as possible,
I have never wanted to propose a change that would make existing
implementations non-conforming.
I am just pointing out that in very common cases, such as optional
suffixes, prefixes or infixes placed between parentheses and attached to a
word, breaking by default around the parentheses is completely undesirable.
It is in fact not needed either, given that text containing parentheses only
needs to break at the nearest word-separating whitespace. The only exception
is for scripts like Han that allow line breaking between most ideographs,
and where no space will be present before the opening parenthesis or after
the closing one.
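To make the desired behaviour concrete, here is a toy sketch in Python. It
is not the UAX #14 pair table and not a proposed rule: the function name
break_opportunities and the flag keep_attached_parens are hypothetical, and
the model ignores scripts such as Han that break without spaces. It only
contrasts "break at whitespace only" with "also break between a word and an
attached parenthesis".

```python
def break_opportunities(text, keep_attached_parens=True):
    """Return the indices i where a line break is allowed before text[i].

    Toy model (an assumption for illustration, not UAX #14):
    - a break is always allowed after a space;
    - when keep_attached_parens is False, a break is also allowed between
      a letter/digit and an adjacent parenthesis, which is the behaviour
      complained about in this message.
    """
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == " " and cur != " ":
            breaks.append(i)
        elif not keep_attached_parens:
            opens_after_word = prev.isalnum() and cur == "("
            word_after_close = prev == ")" and cur.isalnum()
            if opens_after_word or word_after_close:
                breaks.append(i)
    return breaks


sample = "one or more word(s)"
print(break_opportunities(sample))                              # [4, 7, 12]
print(break_opportunities(sample, keep_attached_parens=False))  # [4, 7, 12, 16]
```

With keep_attached_parens=False the extra opportunity at index 16 falls
between "word" and "(s)", which is exactly the break that should not occur.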
> Your further
> statements show that you haven't fully understood some of the core
> concepts of the way the default line breaking algorithm is intended to
> work.
Please avoid such presumptuous statements about my understanding of the line
breaking algorithm. They are not necessary, and they read as an attempt to
insult me. I can read the specification, but if I had to specify every
detail of a desired change myself, I would not need to discuss it here.
In fact, the most important point is that I am reporting a real problem that
is currently not handled, and that suggestions must be made to solve it
before an actual technical specification can address it.
I have not suggested anything that would break the line breaking algorithm.
(Note, however, that if the existing algorithm already inserts line breaks
within a single word that merely happens to embed parentheses, then
something must be changed if such breaks are undesirable.)
I have simply not yet proposed the actual technical rule needed to handle
the case of undesirable line breaks around punctuation pairs like
parentheses and quotation marks.
Please reconsider this problem with the VERY COMMON example of optional
plural forms, as in:
"one or more word(s)"
And consider the equivalent cases that DO occur in almost all languages
written in the Latin script. I gave examples in French, and the example
above shows that the case exists in English too. It is very easy to find
many examples in German, Spanish or Italian, and in fact this is not
restricted to Latin-based European languages: you will find the same cases
in Greek, Russian, Hebrew...
Then consider the various discussions that have happened here about the
transcription of Hebrew into Latin: parentheses are commonly used in the
middle of a word to surround missing, implied, deleted or optional letters.
Look at the many discussions on this list where characters within the same
word need to be transcribed using some notation between parenthesis-like
pairs...
I am just demonstrating that this case is VERY FREQUENT but still NOT
HANDLED correctly in MOST cases.
For this reason, in many web pages if we want to avoid undesired line breaks
in narrow table columns, we currently have to surround these unbreakable
words with CSS style like:
<html>
...
One (or more) <span style="white-space:nowrap">word(s)</span>
...
</html>
The fact that we have to do this BREAKS the separation model between the
content and the style. It is also often impossible, namely when the text is
assembled from several sources that are NOT HTML-encoded. And it won't work
with texts in XML sources that don't have any style option in the schema of
their data model.
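For such plain-text sources, one possible workaround that does not depend on
markup is to insert U+2060 WORD JOINER, a character that prohibits a line
break on either side of it, wherever a parenthesis is attached directly to a
word. The Python sketch below is only an illustration under assumptions: the
function name protect_attached_parens is hypothetical, "word character" is
approximated by the regex class \w, and whether a given renderer honours
U+2060 is implementation-dependent. It still alters the content, so it is a
workaround and not a fix for the default behaviour.

```python
import re

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER

# Zero-width positions where a parenthesis touches a word character:
#   word|(s)    -> between a word character and "("
#   (re)|start  -> between ")" and a word character
_ATTACHED_PAREN = re.compile(r"(?<=\w)(?=\()|(?<=\))(?=\w)")


def protect_attached_parens(text):
    """Insert WORD JOINER where a parenthesis is glued to a word.

    Parentheses separated from the word by a space are left untouched,
    so ordinary parenthetical remarks can still break at the space.
    """
    return _ATTACHED_PAREN.sub(WORD_JOINER, text)


protected = protect_attached_parens("one or more word(s)")
print(protected == "one or more word\u2060(s)")  # True
```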
So please don't rant against me; I have been polite so far. What I am
describing is not a special case, and the line breaking algorithm should be
able to manage the most frequent cases. This case with parentheses is VERY
FREQUENT.