Re: How to encode underlined characters

From: Chris Harvey (chris@languagegeek.com)
Date: Mon Sep 12 2005 - 00:54:04 CDT

    Doug Ewell <dewell@adelphia.net> wrote on 12-09-2005 at 01:26:
    > Can you scan actual examples of Carrier and Shoshon{i,e} printed text,
    > so we can see which is preferred for each language, before encoding them
    > differently?

    For Carrier, I’ve seen both the MACRON BELOW and the LOW LINE. I can
    pretty much guarantee that the instances of the LOW LINE are the result of
    underline formatting by the word processor. Shoshoni always shows up as
    the LOW LINE.

    Assuming that these orthographies were developed for use on a US
    typewriter, we could say that all of the Native North American
    orthographies which underline a letter should use a COMBINING LOW LINE.
    But consider the situation in Tsimshian:

    Underlined: a aa g k

    In this case, the COMBINING LOW LINE wouldn’t work because the Tsimshian
    underline goes under the ‘g’, not through the stem. Thus for this
    language, it would be best to use the COMBINING MACRON BELOW for a g k (ḵ
    also has a precomposed Unicode character), and the DOUBLE MACRON BELOW for
    aa.
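
    To make the Tsimshian case concrete, here is a rough Python sketch of how
    those sequences could be built. The helper names (underline_single,
    underline_pair) are made up for illustration; the code points are U+0331
    COMBINING MACRON BELOW and U+035F COMBINING DOUBLE MACRON BELOW. It also
    assumes the usual placement for double diacritics, that is, between the
    two base letters.

```python
import unicodedata

# Code points discussed in this thread.
COMBINING_MACRON_BELOW = "\u0331"
COMBINING_DOUBLE_MACRON_BELOW = "\u035F"

def underline_single(letter):
    """Hypothetical helper: one base letter followed by MACRON BELOW."""
    return letter + COMBINING_MACRON_BELOW

def underline_pair(first, second):
    """Hypothetical helper: the double diacritic sits between the two letters."""
    return first + COMBINING_DOUBLE_MACRON_BELOW + second

# The four Tsimshian items from the message: a, aa, g, k.
tsimshian = [
    underline_single("a"),
    underline_pair("a", "a"),
    underline_single("g"),
    underline_single("k"),
]

for item in tsimshian:
    composed = unicodedata.normalize("NFC", item)
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in composed)
    print(composed, code_points)
```

    Under NFC only the k sequence collapses to a single precomposed character
    (U+1E35); a and g stay as base letter plus combining mark.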

    Many languages underline the g (Tlingit, Gitsenimx̱, Haida, Kwakwala, to
    name a few). This would rule out the LOW LINE for these languages also. To
    avoid confusion, I would tentatively suggest something like only using the
    LOW LINE in cases where three or more characters in a row can receive the
    underlining, the MACRON BELOW for languages which underline only one
    character, and the MACRON BELOW + DOUBLE MACRON BELOW for languages which can
    underline up to two characters. Otherwise, I think we might be left with
    different languages being encoded differently.
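
    The tentative rule above could be written down roughly as follows. This is
    only a sketch of my suggestion, not an established convention: the
    function name suggest_marks and the idea of passing in the longest run of
    underlined characters are assumptions for the example.

```python
COMBINING_LOW_LINE = "\u0332"
COMBINING_MACRON_BELOW = "\u0331"
COMBINING_DOUBLE_MACRON_BELOW = "\u035F"

def suggest_marks(longest_run):
    """Return the combining marks suggested for an orthography.

    longest_run is the largest number of adjacent characters that the
    orthography ever underlines together (hypothetical input).
    """
    if longest_run < 1:
        raise ValueError("longest_run must be at least 1")
    if longest_run == 1:
        # Only single letters are underlined.
        return (COMBINING_MACRON_BELOW,)
    if longest_run == 2:
        # Single letters and pairs such as "aa".
        return (COMBINING_MACRON_BELOW, COMBINING_DOUBLE_MACRON_BELOW)
    # Three or more characters in a row can be underlined.
    return (COMBINING_LOW_LINE,)

# Tsimshian underlines a, aa, g, k, so its longest run is 2.
tsimshian_marks = suggest_marks(2)
print([f"U+{ord(mark):04X}" for mark in tsimshian_marks])
```

    For Tsimshian this picks U+0331 and U+035F, which matches the
    recommendation earlier in this message.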

    Chris Harvey

    -- 
    Gwlad heb iaith, gwlad heb galon
    ᑭᑕᐢᑭᓇᐤ ᑳᓀᓱᐏᑌᐦᐃᓇᑿᐣ, ᑮᐢᐱᐣ ᐃᔨᐣᑐ ᐱᑭᐢᑵᐏᐣ ᐘᓂᑎᔭᐦᑭ
    (A country without its language is a country without a heart)
    www.languagegeek.com
    www.indigenous-language.org
    

