From: verdy_p (verdy_p@wanadoo.fr)
Date: Mon Dec 14 2009 - 20:17:52 CST
"John H. Jenkins"
> The Latin ligatures that are already there are for round-trip compatibility *only*.
NOT *only*. Some ligatures were encoded because they are considered unbreakable letters in some languages, or
unbreakable symbols; in those cases they are treated as distinct characters.
See æ (from an old ligature of "ae"), Æ (from an old ligature of "AE"), œ (from an old ligature of "oe"), Œ (from an
old ligature of "OE"), & (from an old ligature of "et"), ß (from an old ligature of "ſs" or "ſz").
These cannot be converted to letter pairs, not even by placing a ZWJ between the two letters.
On the other hand, I'm not sure that "ij" and "IJ" are completely unbreakable: even modern Dutch usage today considers
them breakable and representable as letter pairs (with ZWJ as a ligature hint), given how widespread it has become to
write Dutch words without the ligature characters.
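One way to see how Unicode currently treats the characters above is to check their compatibility decompositions. The
sketch below uses only Python's standard `unicodedata` module; the `SAMPLES` table and the `decomposes_to_pair` helper
are illustrative names I made up, and U+FB01 "ﬁ" is added purely as an example of a ligature encoded for compatibility
only. If the Unicode Character Database is as I understand it, only "ĳ"/"Ĳ" and "ﬁ" map to a letter pair under NFKD,
while æ, Æ, œ, Œ, & and ß have no decomposition at all, and a ZWJ between two letters leaves them two letters.

```python
import unicodedata

# Characters mentioned in the message, each paired with the letter sequence
# it historically came from. U+FB01 (fi) is an added example of a ligature
# encoded purely for round-trip compatibility.
SAMPLES = [
    ("\u00e6", "ae"),       # æ
    ("\u00c6", "AE"),       # Æ
    ("\u0153", "oe"),       # œ
    ("\u0152", "OE"),       # Œ
    ("&", "et"),
    ("\u00df", "\u017fs"),  # ß, historically from long s + s
    ("\u0133", "ij"),       # ĳ
    ("\u0132", "IJ"),       # Ĳ
    ("\ufb01", "fi"),       # ﬁ
]


def decomposes_to_pair(ch: str, pair: str) -> bool:
    """Hypothetical helper: True if NFKD maps ch and pair to the same string."""
    return unicodedata.normalize("NFKD", ch) == unicodedata.normalize("NFKD", pair)


for ch, pair in SAMPLES:
    print(f"U+{ord(ch):04X} {ch} ~ {pair}: {decomposes_to_pair(ch, pair)}")

# ZWJ (U+200D) is a format character (category Cf): "o" + ZWJ + "e" stays a
# sequence of two letters and is never normalized into the single letter "œ".
zwj_pair = "o\u200de"
print(unicodedata.category("\u200d"),
      unicodedata.normalize("NFKC", zwj_pair) == "\u0153")
```

In other words, the ZWJ only asks the renderer for a ligated appearance; it does not change the identity of the letters
for comparison or searching.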
In fact, you could also consider the German letters "ä", "ö", "ü" (with umlaut) to be modern ligatures (of "ae", "oe"
and "ue"): the German umlaut does not really share its identity with the dieresis, which has a very different origin
and meaning.
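For comparison, Unicode itself does not encode this distinction. A minimal check with Python's standard `unicodedata`
module shows that the umlauted vowels decompose canonically into a base letter plus U+0308 COMBINING DIAERESIS, the
same mark as the dieresis in French "ë", and that no normalization form turns "ä" into "ae".

```python
import unicodedata

# Canonical decomposition (NFD) of the German umlauted vowels, plus French e
# with dieresis for comparison.
for ch in "\u00e4\u00f6\u00fc\u00eb":  # ä ö ü ë
    base, mark = unicodedata.normalize("NFD", ch)
    print(f"{ch} -> {base} + U+{ord(mark):04X} {unicodedata.name(mark)}")

# Normalization never yields the historical "ae"/"oe"/"ue" spelling; that
# substitution is a language-specific convention, not a Unicode equivalence.
print(unicodedata.normalize("NFKD", "\u00e4") == "ae")
```

So the umlaut-versus-dieresis distinction argued here would have to be carried by language tagging or by separate
characters, not by the existing decompositions.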
But if you consider medieval texts, you also have to consider accents. In many cases, accented letters were ligated
forms originating from abbreviation conventions: the accents progressively evolved from abbreviation marks used where
some letters could easily be omitted (they were not essential for reading) because they had become almost mute, had
been reduced in length, or had merged phonologically with a preceding letter whose pronunciation had changed (in
length, value or stress).
Many of these accents were kept for etymological reasons until they evolved into distinct marks for the newer
phonology, and their link to etymology became less evident or simply wrong. At that point it was no longer possible to
decompose accented letters into letter pairs, so the accents became distinctive in the alphabets that now use them.
The same is true of almost all other Latin diacritics, including those attached below the letter (like the cedilla) or
overlaid on it (like the solidus). In medieval texts, many of them appear as abbreviation marks similar to the tilde,
in positions that vary by author or publisher without precise rules: above the letter, across it, attached to the
left. It is therefore difficult to decide whether they are true ligatures or denote a new letter. For this reason,
medieval abbreviation marks and ligatures should be encoded specifically; using ZWJ is not a solution, as it is too
weak an indicator, just a hint.