RE: German 0364 COMBINING LATIN SMALL LETTER E

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Mon Dec 29 2003 - 10:13:47 EST


    Philippe Verdy wrote:

    > I wonder, when looking at the Sütterlin font, if it is not a script variant
    > of its own, where in German the "umlaut" (diaeresis) and the "combining
    > Latin small letter e" would be in fact the same diacritic. What's your
    > opinion about this?

    The German and Nordic use of the diaeresis is in origin an overscript
    small e. (A font for Sütterlin, which is really a handwriting style,
    must cover combining small e above.)

    (Don't know about the origin of the French use of diaeresis.)

    > Is the canonical decomposition of a+umlaut, o+umlaut, u+umlaut
    > better represented in German as meaning really a+combiningSmallLetterE,
    > o+combiningSmallLetterE, u+combiningSmallLetterE,

    No. Don't confuse historic origin with current orthography.

    Texts may use <a, c. diaeresis> as well as <a, c. small e above>
    in the same text, even in the same font (and there are (old) documents
    that do so, even though they may use these characters interchangeably).
    It is up to the author to decide which to use, not the font designer.
    (A diaeresis should never look like a small e above. That includes
    fraktur fonts. Note that both diaeresis and small e above are used
    with fraktur, and it should be the author who decides which to
    use. And one cannot rely on fragile font selections.)
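
    To illustrate that the two sequences stay distinct in the encoding,
    here is a minimal sketch using Python's standard unicodedata module
    (the variable names are only illustrative). It assumes the Unicode
    data bundled with the interpreter: ä decomposes canonically to
    a + U+0308, while a + U+0364 is left alone by normalization.

```python
import unicodedata

a_umlaut = "\u00e4"       # ä, precomposed LATIN SMALL LETTER A WITH DIAERESIS
a_small_e = "a\u0364"     # a + COMBINING LATIN SMALL LETTER E

# Canonical decomposition of ä: base letter plus COMBINING DIAERESIS.
nfd = unicodedata.normalize("NFD", a_umlaut)
code_points = [f"U+{ord(ch):04X}" for ch in nfd]

# No precomposed form exists for a + small e above, so NFC leaves it as is.
e_above_stable = unicodedata.normalize("NFC", a_small_e) == a_small_e

# The two spellings are not canonically equivalent.
same_after_nfc = (
    unicodedata.normalize("NFC", a_small_e)
    == unicodedata.normalize("NFC", a_umlaut)
)

print(code_points, e_above_stable, same_after_nfc)
```

    Because normalization never merges the two, the author's choice of
    mark survives any conformant normalization step.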

    > and matching the German collation of
    > a-diaeresis, o-diaeresis and u-diaeresis with a+e, o+e, u+e?

    <a, c. diaeresis> and <a, c. e above> (and æ) should collate the same
    at level 1, at least for German and the Nordic languages. (a+e is something
    else though, and collates as ä *only* for *one* of the variants of German
    collation, where ä is collated as a+e.)

    > For the same reason, why is the German "ess-tsett" (sharp S) given a
    > compatibility decomposition as <s><s> instead of <long-s><s>?

    Don't know. But there are instances of sharp s (ß) that look like a ligated
    long-s (ſ) and ezh (ʒ). But (compatibility!!) decomposing it that way
    would be inappropriate. And this case is quite different from the
    cases above, since there is NO canonical decomposition. However,
    having the compatibility decomposition mapping of ß as <ſ, s>
    would not break any stability policy (at least); the normal forms
    would not change.
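
    For anyone wanting to check what the character database actually
    records, here is a small sketch with Python's unicodedata module
    (results depend on the Unicode version bundled with the interpreter).
    In the data I am aware of, ß carries no decomposition mapping at all,
    long s has a compatibility mapping to s, and the "ss" equivalence for
    ß shows up in case folding rather than in normalization.

```python
import unicodedata

sharp_s = "\u00df"   # ß, LATIN SMALL LETTER SHARP S
long_s = "\u017f"    # ſ, LATIN SMALL LETTER LONG S

# Raw decomposition fields from the character database.
sharp_s_mapping = unicodedata.decomposition(sharp_s)
long_s_mapping = unicodedata.decomposition(long_s)

# Compatibility normalization and full case folding of ß.
nfkd = unicodedata.normalize("NFKD", sharp_s)
folded = sharp_s.casefold()

print(repr(sharp_s_mapping), repr(long_s_mapping), nfkd, folded)
```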

                    /kent k


