RE: Latin ligatures and Unicode

From: Marco.Cimarosti@icl.com
Date: Mon Dec 27 1999 - 12:38:56 EST


Otto Stolz gave this example, which Eberhard Pehlemann repeated:
>The two German words
>- Wachstube (wax tube)
>- Wachstube (guard's room)
>look exactly the same when written in Latin type without ligatures. But they
>have different meanings.
>
>Written in Fraktur and using the (compulsory) Fraktur ligatures, we have
>- Wa{ch}s-tu-be (wax tube)
>- Wa{ch}-{<U+017F>t}u-be (guard's room) (<U+017F> is the long s)
>where curly braces denote ligatures and the hyphen denotes possible line
>breaks.

I too think that Michael Everson's ZWL is needed (although I would rather see
it unified with ZWJ) but, IMVHO, the German case above is not a good usage
example for the new control.

Eberhard correctly says that these ligatures are *compulsory* in Fraktur. If
they are compulsory, then they must be automatic: whenever a "c" is
*immediately* followed by an "h", a "ch" ligature should be used; whenever an
"s" is *immediately* followed by a "t", a "<long s>t" ligature should be
used.

No ZWL is *necessary* in these cases: the ligatures are the "default" for
that font, so users should get them automatically, unless they indicate
otherwise (using a ZWNL!).
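A minimal sketch of this "default" behaviour, covering only the two pairs discussed here. The {..} notation is borrowed from the quoted example; using U+200C ZERO WIDTH NON-JOINER as a stand-in for the opt-out control called "ZWNL" is an assumption of the sketch, and the names `COMPULSORY` and `ligate` are hypothetical.

```python
# Sketch of "compulsory" Fraktur ligatures applied automatically.
# Assumptions (not from the post): ligatures are written with the {..}
# notation of the quoted example, and U+200C ZERO WIDTH NON-JOINER stands
# in for the opt-out control called "ZWNL".

ZWNJ = "\u200c"     # assumed stand-in for "ZWNL"
LONG_S = "\u017f"   # LATIN SMALL LETTER LONG S

# Compulsory pairs and the ligature each one becomes.
COMPULSORY = {"ch": "{ch}", "st": "{" + LONG_S + "t}"}


def ligate(text: str) -> str:
    """Ligate every compulsory pair whose letters are immediately adjacent."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in COMPULSORY:
            out.append(COMPULSORY[pair])
            i += 2
        elif text[i] == ZWNJ:
            # The opt-out control is invisible; its only effect is that
            # the letters around it are no longer adjacent.
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(ligate("Wachstube"))               # Wa{ch}{ſt}ube
print(ligate("Wachs" + ZWNJ + "tube"))   # Wa{ch}stube
```

Note that, given the bare letters, the rule renders both words as Wa{ch}{ſt}ube: it works as designed, but nothing in the plain text tells the two words apart.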

The problem in "Wachs/tube" is a different one, and the incorrect rendering
in Fraktur is a symptom rather than the problem itself. The "s" and "t" are
*not* really adjacent, because they are separated by the invisible
*boundary* that I indicated with "/".

Call it what you like ("word boundary", "morpheme boundary", "lexeme
boundary"); in any case, it is the position of this boundary that makes
"Wachs/tube" different from "Wach/stube", not the presence or absence of a
ligature in Fraktur.

There is already a way to mark word boundaries where no visible separator
should be displayed or printed: an invisible word separator, U+200B ZERO
WIDTH SPACE.
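A sketch of how a display engine could use that boundary, assuming the text carries U+200B between the morphemes. The ligature rule is unchanged; it is simply applied within each boundary-delimited part, so no ligature can span the boundary. The function names are hypothetical.

```python
# Sketch: the same adjacency rule, but the text now carries U+200B ZERO
# WIDTH SPACE at the morpheme boundary. Ligatures are formed only inside
# a morpheme, so no ligature can span the boundary.

ZWSP = "\u200b"     # ZERO WIDTH SPACE, marking the boundary
LONG_S = "\u017f"   # LATIN SMALL LETTER LONG S

COMPULSORY = {"ch": "{ch}", "st": "{" + LONG_S + "t}"}


def ligate_morpheme(morpheme: str) -> str:
    """Ligate immediately adjacent compulsory pairs within one morpheme."""
    out, i = [], 0
    while i < len(morpheme):
        pair = morpheme[i:i + 2]
        if pair in COMPULSORY:
            out.append(COMPULSORY[pair])
            i += 2
        else:
            out.append(morpheme[i])
            i += 1
    return "".join(out)


def render(text: str) -> str:
    """Render each boundary-delimited part; the ZWSP itself gets no glyph."""
    return "".join(ligate_morpheme(m) for m in text.split(ZWSP))


print(render("Wachs" + ZWSP + "tube"))   # Wa{ch}stube    (wax tube)
print(render("Wach" + ZWSP + "stube"))   # Wa{ch}{ſt}ube  (guard's room)
```

The boundary does all the work: in "Wachs/tube" it keeps "s" and "t" from being adjacent, so the compulsory rule never fires across it.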

This information is, of course, redundant for human readers (who are
intelligent enough to understand that it is very unlikely that "the soldiers
were shooting from inside their wax tubes"), but stupid computers need it for
several kinds of automatic text processing, e.g.:
- Hyphenators, which must decide whether to insert a break before or after
the "s";
- Automatic translators, which must choose between the "wax tube" and
"guard's room" translations;
- Speech synthesizers, which must decide between pronouncing "vaks-too-beh"
and "vakh-shtoo-beh";
- And, last but not least, display engines, which must decide which glyphs
to use.
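The same encoded boundary can serve the other consumers in this list. A toy sketch (Python 3.9+), assuming boundaries are encoded with U+200B; the lexicon and function names are invented for illustration, and a real hyphenator would still need its own rules for the breaks inside each morpheme (e.g. "tu-be").

```python
ZWSP = "\u200b"

# Toy lexicon, invented for this illustration: morpheme sequence -> English.
LEXICON = {
    ("Wachs", "tube"): "wax tube",
    ("Wach", "stube"): "guard's room",
}


def morphemes(word: str) -> tuple[str, ...]:
    """Split a word at its encoded morpheme boundaries."""
    return tuple(word.split(ZWSP))


def boundary_break(word: str) -> str:
    """Show each boundary as a break opportunity, marked with '-'."""
    return "-".join(morphemes(word))


def translate(word: str) -> str:
    """Look up the morpheme sequence; unknown sequences return '?'."""
    return LEXICON.get(morphemes(word), "?")


for word in ("Wachs" + ZWSP + "tube", "Wach" + ZWSP + "stube"):
    print(boundary_break(word), "->", translate(word))
# Wachs-tube -> wax tube
# Wach-stube -> guard's room
```

Without the ZWSP both spellings collapse to the single tuple ("Wachstube",), and the lookup has nothing to decide with.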

Michael gave better evidence of the need for a ZWL. Although he had to
resort to ancient scripts such as Runic, he showed that in some cases
ligatures are not governed by any rule, but are simply a free decision of
the writer. In such cases there must be a way to encode these scribal
caprices, or the software will have no choice but to use a "default" or
"best-fit" rendering.
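For such free choices the request has to be explicit in the text. A sketch, assuming (in line with the suggestion above to unify ZWL with ZWJ) that U+200D ZERO WIDTH JOINER between two letters requests a ligature; the `OPTIONAL` table and its Latin "ct" entry are hypothetical placeholders, not Michael's Runic examples.

```python
ZWJ = "\u200d"   # assumed here to act as the ligature request ("ZWL")

# Hypothetical table of optional ligatures: used only when requested.
OPTIONAL = {"ct": "{ct}"}


def render_optional(text: str) -> str:
    """Ligate an optional pair only when a ZWJ sits between its letters."""
    out, i = [], 0
    while i < len(text):
        a = text[i]
        # Look for "<first> ZWJ <second>" forming a known optional pair.
        if i + 2 < len(text) and text[i + 1] == ZWJ and a + text[i + 2] in OPTIONAL:
            out.append(OPTIONAL[a + text[i + 2]])
            i += 3
        elif a == ZWJ:
            # A ZWJ that requests no known ligature is simply not displayed.
            i += 1
        else:
            out.append(a)
            i += 1
    return "".join(out)


print(render_optional("ac" + ZWJ + "tus"))   # a{ct}us  (writer asked for it)
print(render_optional("actus"))              # actus    (no request, no ligature)
```

Unlike the compulsory case, the default here is *no* ligature, and only the author's explicit mark changes the rendering.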

Once again, have a great 2000.
        Marco


