Combining umlauts (e.g. ü over a base letter)

From: Karl Pentzlin (karl-pentzlin@acssoft.de)
Date: Sat Feb 23 2008 - 05:16:55 CST


    On Saturday, 23 February 2008 at 01:51, Asmus Freytag wrote
     (Re: i with macron over an e - Do U+0365 and U+2071 lose their dot
     when accented like U+0069?):

    AF> ... The reason for that
    AF> is that in Unicode, you can't apply a diacritic to a diacritic, you can
    AF> only apply a diacritic to a sequence.
    AF> ... A macron applied to a sequence of <e , combining dotless i> should be
    AF> rendered as if it applied to the whole.

    As far as I know, this seems sufficient for e + combining i +
    macron, where the macron denotes the length of the vowel written
    as e + combining i.

    But how should combining umlauts (e.g. ü over an o, like the entity
    marked in red in the attached scan) be handled?

    o + combining u + trema (U+006F U+0367 U+0308) thus does not yield
    an o + superscript ü, but an o + superscript u with a trema above
    the whole combination, which is clearly too wide to be recognized
    as an umlaut marker for the superscript ü.
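    To make the problem concrete, here is a small Python sketch using
    only the standard unicodedata module (the helper name describe is
    hypothetical, chosen for illustration). It lists the code points of
    the sequence with their canonical combining classes and checks what
    normalization does with it. Note that this says nothing about how a
    font renders the marks; it only shows that the character database
    offers no "u with diaeresis" unit inside this sequence.

```python
import unicodedata

# The sequence from the message:
# o + COMBINING LATIN SMALL LETTER U + COMBINING DIAERESIS
SEQ = "\u006F\u0367\u0308"

def describe(s):
    """Return (code point, character name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in s
    ]

for cp, name, ccc in describe(SEQ):
    print(cp, name, "ccc =", ccc)

# Both marks have ccc 230 (Above). NFC therefore does not reorder them,
# and the intervening U+0367 blocks o + U+0308 from composing to U+00F6.
print(unicodedata.normalize("NFC", SEQ) == SEQ)              # True

# Without the combining u in between, the diaeresis does compose with o.
print(unicodedata.normalize("NFC", "o\u0308") == "\u00f6")   # True
```

    In other words, the diaeresis is simply one more "above" mark in the
    stack on the base o; nothing in the data ties it to the combining u.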

    Which of the following solutions should be preferred (assuming that
    clear evidence for a superscript ü is presented)?

    1. Encode a COMBINING LATIN SMALL LETTER U UMLAUT
       (which implies that such a letter is not considered precomposed,
        since it has no obvious decomposition: U+0367 U+0308 does not
        apply)
    2. Encode a COMBINING SMALL DIAERESIS (or COMBINING SUPERSCRIPT
       DIAERESIS) with an informative note:
        · suited for combinations with combining letters, e.g. to mark
          them as umlauts
    3. Expand the semantics of ZWJ/ZWNJ so
       - that U+006F U+0367 ZWJ U+0308 yields the wanted entity,
       - that ZWNJ after such entities "switches back" to applying
         subsequent diacritics to the whole entity.
    4. Something completely different.
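    For option 3, a similar Python sketch (standard unicodedata module
    only; the name OPTION3 is hypothetical) lists the proposed sequence
    with ZWJ inserted. It does not implement the proposed ZWJ/ZWNJ
    semantics, which exist only as a suggestion here; it just shows how
    the character database classifies the joiner. ZWJ is a format
    character with combining class 0, so normalization leaves the
    sequence and the joiner's position untouched, and any "bind the
    diaeresis to the u" meaning would have to come from the rendering
    layer.

```python
import unicodedata

# Option 3's proposed sequence:
# o + combining u + ZERO WIDTH JOINER + combining diaeresis.
# The ZWJ behavior described in option 3 is a proposal; nothing here implements it.
OPTION3 = "\u006F\u0367\u200D\u0308"

for ch in OPTION3:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "category =", unicodedata.category(ch),
          "ccc =", unicodedata.combining(ch))

# ZWJ (U+200D) is category Cf with ccc 0, so neither NFC nor NFD
# moves or removes it; the sequence is already normalized.
print(unicodedata.normalize("NFC", OPTION3) == OPTION3)  # True
print(unicodedata.normalize("NFD", OPTION3) == OPTION3)  # True
```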

    I prefer option 2, as it handles this case without inventing any new
    mechanism, also enables superscript ö and ä with a single new
    character, and raises no questions about the precomposedness of
    combining letters.

    Any suggestions or opinions?
    - Karl Pentzlin



    [Attachment: modifier_letter_u-umlaut.png]
