From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Jan 25 2007 - 12:28:47 CST
Hmm. This looks like we need to address the sequence <SHY, HYPHEN> in
UAX#14, as well as perhaps rules on the language-dependent interpretation
of HYPHEN with or without SHY.
Given the way the default algorithm works, preventing a break after a
hyphen that is preceded by a SHY adds a certain amount of complexity
(perhaps not if your engine is based on full regex support, but for
other types of implementations).
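For an engine with full regex support, the SHY exception fits in a single lookbehind. A minimal Python sketch, for illustration only: it models just SHY (U+00AD), HYPHEN-MINUS and U+2010 HYPHEN, ignores all other UAX#14 rules, and `break_opportunities` is a hypothetical helper name.

```python
import re

# Break after a SHY, or after a hyphen that is NOT directly preceded by a SHY.
# The lookbehind looks one character further left than the (hyphen, next char)
# pair itself, which is the kind of extra context a pure pair table lacks.
BREAK_AFTER = re.compile("\u00AD|(?<!\u00AD)[-\u2010]")

def break_opportunities(text: str) -> list[int]:
    """Return the indices just after each character where a line may break."""
    return [m.end() for m in BREAK_AFTER.finditer(text)]

print(break_opportunities("czerwono\u00AD-niebieska"))  # [9]: after the SHY only
print(break_opportunities("czerwono-niebieska"))        # [9]: after the hyphen
```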
The best approach, in the context of UAX#14, might be to simply defer
handling this to a post-processing stage, right *after* the decision has
been made to actually break a line at a line break opportunity given by
a hyphen.
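A minimal sketch of such a post-processing step, assuming the line breaker has already chosen a break position `brk` just after a hyphen. `postprocess_break` is a hypothetical helper, not anything defined by UAX#14; it applies the <SHY, HYPHEN> behaviour described in the quoted messages below (break between the two, show the SHY as a hyphen at the line end, and let the hard hyphen start the next line).

```python
SHY = "\u00AD"
HYPHENS = {"-", "\u2010"}  # HYPHEN-MINUS and U+2010 HYPHEN

def postprocess_break(text: str, brk: int) -> tuple[str, str]:
    """Adjust a break chosen after a hyphen; return (display_line, rest).

    brk is the index just after the character at which the line breaks.
    """
    # A hyphen preceded by a SHY must not break after itself: move the break
    # back between the SHY and the hyphen, so the hyphen starts the next line.
    if brk >= 2 and text[brk - 1] in HYPHENS and text[brk - 2] == SHY:
        brk -= 1
    line, rest = text[:brk], text[brk:]
    # A SHY at the end of the line is displayed as a hyphen; other SHYs vanish.
    if line.endswith(SHY):
        line = line[:-1] + "-"
    return line.replace(SHY, ""), rest  # rest keeps its SHYs for later breaks

t = "Tam wisi czerwono\u00AD-niebieska flaga"
print(postprocess_break(t, t.index("-") + 1))  # ('Tam wisi czerwono-', '-niebieska flaga')
```

Moving the break only after it has been chosen leaves the core break-opportunity rules untouched, which is the point of deferring the work.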
Hyphen-based line break opportunities should be post-processed anyway,
since a layout that avoids them when possible may be considered
preferable. Not to mention suffixes like -a or -b, where you don't want
to break before the single letter, even though it's legal at the
character level.
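The -a/-b case can be handled by the same kind of post-processing, as a filter over hyphen break opportunities. A rough Python sketch; the helper names are hypothetical, and treating any single-letter fragment after a hyphen as such a suffix is an assumption.

```python
import re

def hyphen_breaks(text: str) -> list[int]:
    """Indices just after each HYPHEN-MINUS or U+2010 HYPHEN."""
    return [m.end() for m in re.finditer("[-\u2010]", text)]

def avoids_lone_letter(text: str, pos: int) -> bool:
    """False if breaking at pos would start the next line with a lone letter."""
    # Word fragment that would begin the next line (may be empty).
    frag = re.match(r"\w*", text[pos:]).group()
    return not (len(frag) == 1 and frag.isalpha())

def filtered_breaks(text: str) -> list[int]:
    """Hyphen break opportunities that survive the lone-letter filter."""
    return [p for p in hyphen_breaks(text) if avoids_lone_letter(text, p)]

print(filtered_breaks("see items 12-a and 12-b, not czerwono-niebieska"))  # [38]
```

This sketch discards such opportunities outright; a layout engine could instead just penalize them, in line with preferring layouts that avoid hyphen breaks when possible.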
I think I'm going to add a section on Hyphen to the UAX#14 draft. This
would be a good opportunity to flush out any other unusual behavior of
that character.
A./
On 1/25/2007 6:26 AM, António Martins-Tuválkin wrote:
> On 2007/1/23, Adam Twardoch <list.adam@twardoch.com> wrote:
>
>> On a related note: in Polish typesetting practice, hard
>> hyphens are always promoted to the next line if soft
>> hyphens occur in the text. So if I have a sentence "Tam
>> wisi czerwono-niebieska flaga" and the optimal line
>> break occurs where the hard hyphen already exists,
>> the text will be hyphenated like this:
>>
>> Tam wisi czerwono-
>> -niebieska flaga.
>
> That's exactly what we do in Portuguese; and we do use a lot of
> hyphens, which are mandatory for half the verb forms including a
> pronoun.
>
> Skilled typesetters and word processor users routinely type *each and
> every hyphen* as a sequence of <soft hyphen> <hard hyphen>, which
> behaves as expected in MS Word, InDesign, PageMaker and QuarkXPress (at
> least). The golden rule is «Never type a regular hyphen in Portuguese».
> Bolder types (pun intended) apply this practice when typing other
> languages, too.
>
> Of course, unskilled typesetters and word processor users (who account
> for 99.9% of everybody sitting in front of a keyboard) use regular
> hyphens and even resort to <hyphen> <space> <hyphen> to force the
> intended behaviour, which comes out very lame should the paragraph
> reflow: a common sight even in newspapers and books.
>
> This is especially unfortunate since homography, and hence ambiguity,
> may arise. E.g., "_disparate_" means "folly" while "_dispara-te_" means
> "fire yourself" (or "fires onto you"). The correct way to break these
> across lines is:
>
> «Lorem ipsum dolor sit amet, disparate consectetur adipisicing»
>
> identical to
>
> «Lorem ipsum dolor sit amet, dispara-
> te consectetur adipisicing.»
>
> And
>
> «Lorem ipsum dolor sit amet, dispara-te consectetur adipisicing»
>
> identical to
>
> «Lorem ipsum dolor sit amet, dispara-
> -te consectetur adipisicing.»
>
> Using a regular hyphen yields the same result for both originals,
> leaving the reader to wonder whether "_disparate_" or "_dispara-te_"
> is intended. If <hyphen> <space> <hyphen> is inserted in order to
> force the expected behaviour, a later paragraph reflow will result
> in:
>
> «id est laborum. Lorem ipsum dolor sit amet,
> dispara- -te consectetur adipisicing.»
>
> P.S.: This is no longer about «The "double hyphen" I discuss here
> consists of two stacked dashes».
>
> --
> António Martins-Tuválkin
> <antonio(a)tuvalkin.web.pt>
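The Portuguese examples quoted above come down to which character carries the break. The following Python sketch (hypothetical helpers, assuming a SHY is displayed as a hyphen only when the line breaks at it) contrasts the encodings: <SHY> alone and <SHY> <hyphen> give different results at a line end, while a regular hyphen makes "dispara-te" indistinguishable from "disparate".

```python
SHY = "\u00AD"

def break_at_shy(word: str) -> tuple[str, str]:
    """Break at the first SHY, which is displayed as a hyphen at the line end."""
    i = word.index(SHY)
    return word[:i] + "-", word[i + 1:]

def break_after_hyphen(word: str) -> tuple[str, str]:
    """Break after the first regular hyphen, without repeating it."""
    i = word.index("-")
    return word[: i + 1], word[i + 1:]

def render_unbroken(text: str) -> str:
    """Display text on a single line: every SHY is invisible."""
    return text.replace(SHY, "")

folly = "dispara\u00ADte"   # "disparate", with an optional break
fire = "dispara\u00AD-te"   # "dispara-te", typed as <SHY> <hyphen>

print(break_at_shy(folly))               # ('dispara-', 'te')
print(break_at_shy(fire))                # ('dispara-', '-te')
print(break_after_hyphen("dispara-te"))  # ('dispara-', 'te'), same as "disparate"
print(render_unbroken(fire))             # dispara-te
print(render_unbroken("dispara- -te"))   # dispara- -te: the manual workaround after reflow
```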
This archive was generated by hypermail 2.1.5 : Thu Jan 25 2007 - 12:31:25 CST