From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Dec 18 2009 - 18:14:12 CST
On 12/18/2009 2:59 PM, Eric Muller wrote:
> On 12/17/2009 4:37 AM, Karl Pentzlin wrote:
>> As the mentioned Wikipedia entries explain, there is no ligature
>> allowed across constituents of composite words, which are common in
>> German.
>> Thus, in "Affe" (monkey) a ff ligature is to be applied (although
>> the word division Af-
>> fe is correct), while in "Schaffell" (fleece of sheep, composed of
>> "Schaf" sheep + "Fell" fleece) no ff ligature is allowed.
>>
>
> Is there easily accessible code+data that determines, given a word,
> the points where ligatures are allowed/not allowed?
No.
>
> Is the problem amenable to a pattern-based implementation, similar to
> hyphenation patterns?
No. It's strictly speaking impossible.
To give just one example:
Wachs + tube (wax tube)
Wach + stube (guard room)
both have the same letters. One may have the st ligature, the other not.
(Both would have a mandatory ch ligature, if typeset in Fraktur).
The meanings of the two compounds are utterly unrelated, and they are
pronounced differently. If you ligate incorrectly, your use of ligation
would clash with the meaning of the word as predicted by context.
The effect is possibly a bit more subtle than an overt typo because most
readers don't make conscious note of ligatures. However, due to the fact
that component boundaries are not marked, experienced readers probably
respond to such cues subconsciously when faced with the need to "take
apart" (i.e. analyze) a compound. If a compound is unusual (or even
ambiguous, as in the above example) correct non-ligation is a helpful clue.
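As an illustration of why the letters alone cannot settle the question, here is a minimal Python sketch. The four-word lexicon and the function name `segmentations` are made up for this example; a real dictionary would be far larger and would still return both readings, with nothing but context to choose between them.

```python
# Hypothetical toy lexicon, for illustration only.
LEXICON = {"wach", "wachs", "stube", "tube"}

def segmentations(word, lexicon=LEXICON):
    """Return every way to split `word` into two lexicon entries."""
    w = word.lower()
    return [
        (w[:i], w[i:])
        for i in range(1, len(w))
        if w[:i] in lexicon and w[i:] in lexicon
    ]

# Both compound readings come back; the code cannot pick one.
print(segmentations("Wachstube"))
```

Only the second split ("wachs" + "tube") separates the s from the t, so only that reading forbids the st ligature.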
>
> Do the no-ligature points correspond to hyphenation points where a
> spelling change is required according to traditional orthography?
A spelling change is (was) required for "ck", for example, when at a
syllable boundary. Nothing prevents ligatures at an ordinary syllable
boundary, only at those points where you would (hypothetically) insert a
space if you were to write out the components of a compound word
separately (indicated by a "+" in the examples above).
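Once the component boundaries are known (from the author or from a dictionary), suppressing the ligature is mechanical. A rough Python sketch, assuming the boundaries are supplied with "+" as in the examples above, and assuming the font and rendering system treat U+200C ZERO WIDTH NON-JOINER as a ligature break. The function name `mark_boundaries` is hypothetical.

```python
import re

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER

def mark_boundaries(marked_word):
    """Replace each '+' component marker (with optional spaces) by ZWNJ."""
    return re.sub(r"\s*\+\s*", ZWNJ, marked_word)

# "Schaf+fell" gets a non-joiner between the two f's; "Affe" is untouched,
# so its ff ligature may still form across the ordinary syllable boundary.
print(repr(mark_boundaries("Schaf+fell")))
print(repr(mark_boundaries("Affe")))
```

The hard part is upstream of this function: deciding where the "+" goes.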
Hyphenation is also strictly speaking impossible w/o reference to
meaning. The example above can be hyphenated at the "+" and also in
front of the "be". The same string of letters has two different
hyphenations depending on the intended meaning.
Hyphenation has other issues that very much complicate pattern analysis.
The standard example in German is
Urinstinkt
"Ur" is a common prefix, meaning ancestral or original, and, like so
many prefixes, can be separated by a hyphen.
Instinkt (instinct) can normally be hyphenated after the "In". However,
when you put everything together, you should disallow that hyphenation,
because otherwise you get
Urin-
stinkt
(which, except for the hyphen, reads very much like "urine stinks").
For hyphenation you can handle those issues by a list of exceptional
words. But nothing helps you with ambiguous compounds like "Wachstube".
(Quick: which sense of it did I intend in the previous sentence? Right -
there's no context, so only I, as the author, know. Given a font with st
ligation, I could have communicated my choice.)
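The exception-list approach for hyphenation can be sketched as follows. This is only an illustration under stated assumptions: the break positions passed in as `pattern_breaks` stand in for whatever a pattern-based hyphenator would produce, and the names `EXCEPTIONS`, `break_points` and `hyphenate` are invented here.

```python
# Hypothetical exception list: lowercase word -> permitted break offsets.
# For "Urinstinkt" only the break after "Ur" is kept.
EXCEPTIONS = {"urinstinkt": [2]}

def break_points(word, pattern_breaks, exceptions=EXCEPTIONS):
    """Prefer the exception entry; otherwise keep the pattern-derived breaks."""
    return exceptions.get(word.lower(), pattern_breaks)

def hyphenate(word, breaks):
    """Show the word with a hyphen at each permitted break offset."""
    parts, prev = [], 0
    for b in sorted(breaks):
        parts.append(word[prev:b])
        prev = b
    parts.append(word[prev:])
    return "-".join(parts)

# Assumed pattern output [2, 4] would allow "Urin-stinkt"; the exception removes it.
print(hyphenate("Urinstinkt", break_points("Urinstinkt", [2, 4])))
print(hyphenate("Instinkt", break_points("Instinkt", [2])))
```

A lookup like this works for "Urinstinkt" because the word has one intended reading. It offers nothing for "Wachstube", where both sets of break points are correct for some meaning.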
A./