Re: SOFT HYPHEN

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Tue Nov 16 1999 - 08:42:40 EST


Klaus Weide wrote on 1999-11-10 12:27 UTC:
> I assume you are familiar with the dissenting treatise at
>
> http://www.hut.fi/~jkorpela/shy.html

I wasn't familiar with it, but a quick look at it tells me that I wish I
had written it myself. I like it very much and fully agree with Jukka
Korpela's advice and conclusion.

I really do believe that

  - HTML documents should never contain soft hyphens. HTML documents
    are unformatted text and therefore should not contain characters
    such as SHY, which can only have been inserted as the result of a
    paragraph formatting process.

  - If people feel a real need for control characters inside the text
    that control hyphenation, then a new ZERO WIDTH HYPHENATION POINT
    could be introduced, with semantics similar to those of \- in TeX
    (marking an explicit hyphenation opportunity in this word, and
    preferably also suppressing any implicit hyphenation points that
    the hyphenation algorithm would otherwise provide).

    Maybe there could even be both a ZERO WIDTH HYPHENATION POINT and a
    ZERO WIDTH EXCLUSIVE HYPHENATION POINT, depending on whether its
    presence disables the normal hyphenation algorithm for the rest of
    the word or not. (Compare \- in TeX with "- in the German.TeX
    macro package, the latter of which is non-exclusive.)

  - Inserting hyphenation points directly into the running text of a
    document is usually a very bad idea: it does not help when the text
    is reformatted later, it leads to inconsistent hyphenation across
    the document, and it complicates search/replace algorithms.
    The right solution is to let the user attach to the document an
    extension or exception list for the hyphenation dictionary, covering
    all those words for which the default hyphenation algorithm gives
    unsatisfactory results. This is similar to TeX's
    \hyphenation{Do-nau-dampf-schiff-fahrt} command, which, placed in
    the header, makes sure that this remarkably long word will be
    hyphenated correctly everywhere (!) in the document, no matter how
    often it appears.
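
To make the distinctions above concrete, here is a rough sketch in Python
of how a formatter might combine the three sources of break points. It is
only an illustration under stated assumptions: the "|" marker stands in
for the hypothetical ZERO WIDTH HYPHENATION POINT (no such character
exists), and the names parse_pattern, build_exceptions, break_points and
the fallback callable are made up for this example.

```python
MARK = "|"  # stand-in for the hypothetical ZERO WIDTH HYPHENATION POINT


def parse_pattern(pattern, sep="-"):
    """Split 'Do-nau' into ('Donau', [2]): the plain word and its break offsets."""
    parts = pattern.split(sep)
    offsets, pos = [], 0
    for part in parts[:-1]:
        pos += len(part)
        offsets.append(pos)
    return "".join(parts), offsets


def build_exceptions(patterns):
    """Build a case-insensitive exception table from \\hyphenation-style patterns."""
    table = {}
    for pattern in patterns:
        word, offsets = parse_pattern(pattern)
        table[word.lower()] = offsets
    return table


def break_points(token, exceptions, fallback, exclusive=True):
    """Return (plain_word, sorted break offsets) for one token of running text.

    Explicit marks in the token come first. If `exclusive` is true they
    suppress all other break points; otherwise they are merged with the
    implicit ones from the exception table or, failing that, from
    `fallback`, a placeholder for the default hyphenation algorithm.
    """
    word, explicit = parse_pattern(token, sep=MARK)
    if explicit and exclusive:
        return word, explicit
    implicit = exceptions.get(word.lower())
    if implicit is None:
        implicit = fallback(word)
    return word, sorted(set(explicit) | set(implicit))
```

With build_exceptions(["Do-nau-dampf-schiff-fahrt"]) prepared once per
document, every unmarked occurrence of the word gets the same break
offsets, and the running text stays free of markers, so plain
search/replace keeps working. The exclusive flag mirrors the \- versus "-
distinction mentioned above.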

So I don't much like the idea of adding a ZERO WIDTH {EXCLUSIVE}
HYPHENATION POINT to Unicode, because implementing it would probably be
used as an excuse for not adding the only proper solution (hyphenation
exception lists). But I dislike even more the idea of simply abusing SHY
as an ill-defined ZERO WIDTH (EXCLUSIVE?) HYPHENATION POINT. See HTML. Yuck.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


