Re: How to encode underlined characters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Sep 08 2005 - 11:01:02 CDT


    This is an interesting case. In my opinion, the simple combining macron
    below (U+0331) is only intended to apply to a single letter and does not
    extend the character cluster.
    So the combining double macron below (U+035F) should be used, as it not
    only underlines the previous letter but also attaches the next combining
    sequence to the same cluster.
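To see how these two encodings are actually stored, here is a minimal Python sketch using only the standard `unicodedata` module. It splits a string into combining character sequences and prints their code points. The splitting rule is a simplifying assumption (a new sequence starts at every character with canonical combining class 0), and `combining_sequences` and `code_points` are illustrative helper names.

```python
import unicodedata


def combining_sequences(text: str) -> list[str]:
    """Split text into combining character sequences.

    Simplifying assumption: a new sequence starts at every character whose
    canonical combining class is 0, so marks with class 0 are not handled.
    """
    sequences: list[str] = []
    for ch in text:
        if sequences and unicodedata.combining(ch) != 0:
            sequences[-1] += ch  # attach the mark to the preceding base
        else:
            sequences.append(ch)  # a base character starts a new sequence
    return sequences


def code_points(seq: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in seq)


single = "a\u0331"   # a + COMBINING MACRON BELOW
double = "a\u035Fi"  # a + COMBINING DOUBLE MACRON BELOW + i

for label, text in (("single", single), ("double", double)):
    print(label, [code_points(seq) for seq in combining_sequences(text)])

# Canonical combining classes: 220 (below) and 233 (double below)
print(unicodedata.combining("\u0331"), unicodedata.combining("\u035F"))
```

In the double case the macron is stored after 'a' and 'i' begins its own combining sequence, so any grouping of the two letters has to come from how the double diacritic is interpreted.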

    But using generic sequences like
        <LETTER, combining double-width diacritic, LETTER, combining double-width diacritic, LETTER>
    should not imply that the second letter will be underlined twice, for
    example when the double diacritic is a double-width macron below (if
    needed, this second letter can be underlined a second time by applying a
    simple combining macron below to it).
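The ambiguity can be illustrated with a short Python sketch. `naive_underline_counts` is a hypothetical helper, not part of any library: it assumes each double macron below draws its own line under its base and under the following base, and counts how many lines end up under each letter. For the chained encoding of 'aai', the middle letter gets two lines, which is the reading argued against above. The `combining_sequences` helper is repeated so the snippet runs on its own.

```python
import unicodedata

DOUBLE_MACRON_BELOW = "\u035F"


def combining_sequences(text: str) -> list[str]:
    # Simplified split: a new sequence starts at each character with ccc 0.
    sequences: list[str] = []
    for ch in text:
        if sequences and unicodedata.combining(ch) != 0:
            sequences[-1] += ch
        else:
            sequences.append(ch)
    return sequences


def naive_underline_counts(text: str) -> list[int]:
    """HYPOTHETICAL naive reading: every double macron below draws a separate
    line under its own base and the next base, so lines pile up."""
    seqs = combining_sequences(text)
    counts = [0] * len(seqs)
    for i, seq in enumerate(seqs):
        n = seq.count(DOUBLE_MACRON_BELOW)
        counts[i] += n
        if i + 1 < len(seqs):
            counts[i + 1] += n
    return counts


chained = "a\u035Fa\u035Fi"  # option a) for underlined 'aai'
print(naive_underline_counts(chained))  # [1, 2, 1]
```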

    It should instead be a general mechanism that extends a combining
    diacritic over longer sequences, although this can create complex cases
    when renderers compute the effective layout.
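One way a renderer might treat chained double diacritics as a general mechanism is to merge them into continuous runs before drawing. The Python sketch below does this for U+035F only; `underline_runs` is a hypothetical helper, and the linking rule (each double macron joins its base to the next base) is an assumption.

```python
import unicodedata

DOUBLE_MACRON_BELOW = "\u035F"


def combining_sequences(text: str) -> list[str]:
    # Simplified split: a new sequence starts at each character with ccc 0.
    sequences: list[str] = []
    for ch in text:
        if sequences and unicodedata.combining(ch) != 0:
            sequences[-1] += ch
        else:
            sequences.append(ch)
    return sequences


def underline_runs(text: str) -> list[tuple[int, int]]:
    """HYPOTHETICAL: merge chained double macrons into continuous runs.

    Returns inclusive (first, last) indices of the combining sequences
    covered by one continuous line.
    """
    seqs = combining_sequences(text)
    runs: list[tuple[int, int]] = []
    for i, seq in enumerate(seqs):
        if DOUBLE_MACRON_BELOW not in seq or i + 1 >= len(seqs):
            continue  # no link here (a trailing double mark has no partner)
        if runs and runs[-1][1] == i:
            runs[-1] = (runs[-1][0], i + 1)  # extend the current run
        else:
            runs.append((i, i + 1))  # start a new run
    return runs


print(underline_runs("a\u035Fa\u035Fi"))   # [(0, 2)]
print(underline_runs("a\u035Fi"))          # [(0, 1)]
print(underline_runs("a\u035Fib\u035Fc"))  # [(0, 1), (2, 3)]
```

Even this toy version must decide what to do with a trailing double macron that has no following base, which hints at the layout complexity mentioned above.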

    ---
    It would have been preferable for Unicode to have a mechanism for
    delimiting runs of combining sequences as an invisible cluster, to which a
    normal diacritic could then be applied. The (badly named) "double"
    diacritics in Unicode are a kludge.
    It would have been much cleaner to encode such sequences logically, like:
        <CLUSTER BEGIN CONTROL><COMBINING SEQUENCE>*<CLUSTER END CONTROL><COMBINING DIACRITIC>*
    where one or more combining sequences are surrounded by invisible controls
    to create a super-cluster, and further diacritics can be applied to the
    whole. For this mechanism to work correctly, the <CLUSTER END CONTROL>
    would then have to be a base character (with combining class 0).
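Since Unicode has no such controls, the following Python sketch uses two Private Use Area code points (U+E000, U+E001) as hypothetical stand-ins to show how the proposed pattern could be parsed. `parse_super_cluster` is an illustrative helper, not an existing API.

```python
import unicodedata

# HYPOTHETICAL stand-ins: Unicode has no cluster begin/end controls, so two
# Private Use Area code points play their role in this sketch.
CLUSTER_BEGIN = "\uE000"
CLUSTER_END = "\uE001"


def parse_super_cluster(text: str) -> tuple[str, str]:
    """Split BEGIN <combining sequences> END <diacritics> into its two parts.

    Returns (cluster contents, diacritics applied to the whole cluster).
    Raises ValueError if either control is missing.
    """
    if not text.startswith(CLUSTER_BEGIN):
        raise ValueError("missing CLUSTER_BEGIN")
    end = text.index(CLUSTER_END)  # str.index raises ValueError if absent
    contents = text[len(CLUSTER_BEGIN):end]
    marks = ""
    for ch in text[end + 1:]:
        if unicodedata.combining(ch) == 0:
            break  # a new base character ends the applied diacritics
        marks += ch
    return contents, marks


encoded = CLUSTER_BEGIN + "aai" + CLUSTER_END + "\u0331"
print(parse_super_cluster(encoded) == ("aai", "\u0331"))

# CLUSTER_END has combining class 0, as the proposal requires, so canonical
# reordering cannot move the trailing macron below inside the cluster.
print(unicodedata.combining(CLUSTER_END))
print(unicodedata.normalize("NFD", encoded) == encoded)
```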
    For example, in your case, one would have encoded:
        <CLUSTER BEGIN CONTROL>aai<CLUSTER END CONTROL><combining macron below>
    The same mechanism would also have been used instead of the existing
    combining double diacritics, so that:
        a<combining DOUBLE macron below>i
    could have been encoded as:
        <CLUSTER BEGIN CONTROL>ai<CLUSTER END CONTROL><combining macron below>
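The correspondence between the existing double-diacritic encoding and the proposed one can be sketched in Python, again with Private Use Area code points as hypothetical stand-ins for the controls. `to_super_cluster` is an illustrative helper and only handles the simple case of one underlined run.

```python
# HYPOTHETICAL stand-ins for the proposed controls (Private Use Area).
CLUSTER_BEGIN = "\uE000"
CLUSTER_END = "\uE001"

DOUBLE_MACRON_BELOW = "\u035F"  # existing double diacritic
MACRON_BELOW = "\u0331"         # ordinary diacritic applied to the whole


def to_super_cluster(chain: str) -> str:
    """Re-encode a<U+035F>...<U+035F>z as BEGIN a...z END <U+0331>.

    Assumes the input is one underlined run: base letters separated only by
    COMBINING DOUBLE MACRON BELOW, with no other combining marks.
    """
    letters = chain.replace(DOUBLE_MACRON_BELOW, "")
    return CLUSTER_BEGIN + letters + CLUSTER_END + MACRON_BELOW


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(code_points(to_super_cluster("a\u035Fi")))
print(code_points(to_super_cluster("a\u035Fa\u035Fi")))
```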
    The "bad" thing would have been the necessarily contextual rendering, but
    after all, there are lots of contextual rendering rules in Unicode. On the
    other hand, the interpretation of the sequences above is not contextual,
    because they are truly logically encoded, and such a construction is easier
    to handle for collation purposes...
    -- Philippe.
    ----- Original Message ----- 
    From: "Chris Harvey" <chris@languagegeek.com>
    To: <unicode@unicode.org>
    Sent: Thursday, September 08, 2005 4:33 PM
    Subject: How to encode underlined characters
    > Hello
    >
    > Many North American Native languages use underlined letters as part of 
    > their orthographies. This probably goes back to the use of typewriters, 
    > where a quick backspace+underscore would have been easy enough to type on 
    > the US keyboard.
    >
    > All examples below are from Shoshoni and Kwakwaka’wakw.
    >
    > Where it is only a single character which is underlined, the solution 
    > would be U+0331 COMBINING MACRON BELOW.
    > Thus ‘a̱’ (underlined ‘a’) would be U+0061 U+0331.
    >
    > In a situation where two characters make up one orthographic letter, which 
    > is underlined, one would use U+035F COMBINING DOUBLE MACRON BELOW.
    > Thus ‘a͟i’ (underlined ‘ai’) would be U+0061 U+035F U+0069.
    >
    > But what about situations where three or more characters make up one 
    > orthographic letter which is underlined, such as ‘aai’ or ‘aaii’? The 
    > underline should be one long line, not three or four individual MACRON 
    > BELOWs.
    >
    > I can think of a few options.
    > a) aai (all underlined) could have two COMBINING DOUBLE MACRON BELOWs: 
    > U+0061 U+035F U+0061 U+035F U+0069
    > b) aai (all underlined) could use three COMBINING LOW LINES (U+0332): 
    > U+0061 U+0332 U+0061 U+0332 U+0069 U+0332
    >
    > Option a) seems to be more consistent with underlined ‘ai’.
    > If b) is chosen, should underlined ‘ai’ then use two COMBINING LOW LINEs
    > as well?
    >
    > Thank you very much
    >
    > Chris Harvey
    > -- 
    > Gwlad heb iaith, gwlad heb galon
    > ᑭᑕᐢᑭᓇᐤ ᑳᓀᓱᐏᑌᐦᐃᓇᑿᐣ, ᑮᐢᐱᐣ ᐃᔨᐣᑐ ᐱᑭᐢᑵᐏᐣ ᐘᓂᑎᔭᐦᑭ
    > (A country without its language is a country without a heart)
    >
    > www.languagegeek.com
    > www.indigenous-language.org
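The two options in the quoted message can be generated mechanically for any run of letters. A minimal Python sketch using the standard `unicodedata` module follows; `option_a` and `option_b` are illustrative names. For 'ai', option a) reproduces U+0061 U+035F U+0069, and for 'aai' both functions reproduce the sequences listed in the message.

```python
import unicodedata

DOUBLE_MACRON_BELOW = "\u035F"  # option a)
LOW_LINE = "\u0332"             # option b)


def option_a(letters: str) -> str:
    """Link consecutive letters with COMBINING DOUBLE MACRON BELOW."""
    return DOUBLE_MACRON_BELOW.join(letters)


def option_b(letters: str) -> str:
    """Put a COMBINING LOW LINE after every letter."""
    return "".join(ch + LOW_LINE for ch in letters)


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


for word in ("ai", "aai", "aaii"):
    a, b = option_a(word), option_b(word)
    print(word, "| a)", code_points(a), "| b)", code_points(b))
    # Each letter carries at most one mark here and none has a precomposed
    # form, so NFC leaves both encodings unchanged.
    assert unicodedata.normalize("NFC", a) == a
    assert unicodedata.normalize("NFC", b) == b
```

The two encodings are not canonically equivalent, and normalization leaves both unchanged, so choosing between them is a matter of convention and rendering rather than something normalization can settle.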
    


    This archive was generated by hypermail 2.1.5 : Thu Sep 08 2005 - 11:03:02 CDT