Re: Pan-Turkic Alphabet of 1926, Latin letter like U+042C/U+044C or U+0184/U+0185

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Apr 23 2006 - 13:41:30 CST

    Karl Pentzlin wrote on Sunday, April 23, 2006 at 6:30 PM
    Subject: Pan-Turkic Alphabet of 1926, Latin letter like U+042C/U+044C or
    U+0184/U+0185

    > According to the sources:
    > http://en.wikipedia.org/wiki/Janalif
    > http://en.wikipedia.org/wiki/Uniform_Turkic_alphabet
    > http://www.omniglot.com/writing/azeri.htm
    > there is a Latin letter after the "i" in the Pan-Turkic alphabet looking
    > like Cyrillic U+042C/U+044C (soft sign) which has the function of the
    > dotless i in modern Turkish, Azeri and Tatar.
    >
    > This letter is not encoded in Unicode as such.
    >
    > Michael Everson states in "Some Türkmen alphabets"
    > http://www.evertype.com/standards/iso10646/pdf/turkmen.pdf :
    > Latin Ьь are not encoded in the UCS, complicating things like
    > monolingual multiscript ordering since the current UCS expects Cyrillic Ьь
    > to do double duty. There are lots of Asian and Caucasian languages using
    > this particular pair in multiple scripts.
    >
    > --
    > 1. Is there a specific reason not to encode that letter,

    It can be justified by the principle of separation of scripts, as
    exemplified by the distinction of Latin 'o' and its Greek and Cyrillic
    counterparts.
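
    As a small illustration of that separation (a sketch only, using Python's
    standard unicodedata module; the helper name script_of is made up here,
    and the first word of the character name is only a rough stand-in for the
    real Script property):

```python
import unicodedata

# Latin, Greek and Cyrillic 'o' look alike but are separate code points.
LOOKALIKES = ["\u006F", "\u03BF", "\u043E"]


def script_of(ch: str) -> str:
    """Return the first word of the character name as a rough script label.

    This is an approximation for illustration; it is not the Unicode
    Script property.
    """
    return unicodedata.name(ch).split()[0]


if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {script_of(ch)}")
```

    By the same rough test, U+044C is labelled CYRILLIC and U+0185 is
    labelled LATIN, which is the distinction at issue in this thread.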

    > especially
    > as its similarity to the Cyrillic soft sign is only superficial
    > (it is no soft sign - the "functionally next similar" Cyrillic
    > letter is U+042B Ы, not U+042C Ь)

    You clearly don't know Church Slavonic :) Seriously, though, the soft and
    hard signs originally functioned as short vowels /i/ and /u/, which were
    then mostly lost as the Slavonic languages developed. Thus to an English
    speaker, it actually seems extremely appropriate!

    > - as if not to encode Latin U+0058/U+0078 Xx as you can use
    > Cyrillic U+0425/U+0445 Хх instead.

    Do some research in the epichoric Greek alphabets, and you'll find your
    suggestion is not as daft as it sounds. Some cities used the letter for
    /kh/, some for /ks/. See
    http://luna.cas.usf.edu/~murray/classes/cg/alphabet.htm , for example.
    Again, though, the principle of script separation avoids confusion.

    > 2. The Latin letters U+0184/U+0185 LATIN capital/small LETTER TONE SIX
    > look very similar (except that the reference glyphs have a little
    > left-pointing triangle at their top instead of a serif).
    > Would it be a reasonable idea to unify the missing Latin letters Ьь
    > with these?

    At first sight that seems totally crazy, but I think it is actually
    reasonable. These letters in the obsolete Zhuang writing system are
    actually based on the digit '6'. However, it was considered wrong to use
    the digit '6', and therefore the similarly shaped but significantly distinct
    Cyrillic letter, soft yer, was used. Just look at the comments on it in
    TUS - http://www.unicode.org/charts/PDF/U0180.pdf ! Key questions for this
    unification would be:

    1) Are the glyphs too distinct? - 'Considerable variation is to be expected
    in actual fonts.'
    2) Are we to think of letter tone six as soft yer transferred to the Latin
    script?

    The other question is, just how much trouble is caused by using the Cyrillic
    soft yer as a Latin letter - there will be a disunification cost. The
    sorting issue can get one short, sharp reply - Tailor your collation! Does
    the Cyrillic soft yer occur in 1-letter words in both scripts? If not,
    tailoring can sense the script by contracting with an adjacent letter.
    (Straight Russian can probably reasonably take pot luck.)
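
    A toy sketch of the contraction idea, in Python. It is not a real UCA or
    ICU tailoring: the weights are just code point values, and the names
    SOFT_SIGNS, is_latin and sort_key are invented for this example. A soft
    sign with a Latin neighbour is weighted as a Latin letter sorting
    immediately after 'i'; otherwise it keeps its Cyrillic position.

```python
import unicodedata

# Cyrillic capital and small soft sign, doing double duty as a Latin letter.
SOFT_SIGNS = {"\u042C", "\u044C"}


def is_latin(ch: str) -> bool:
    """Rough script test based on the character name (an approximation)."""
    return unicodedata.name(ch, "").startswith("LATIN")


def sort_key(word: str) -> list:
    """Build a toy sort key that senses the script from adjacent letters."""
    key = []
    for i, ch in enumerate(word):
        # The letters immediately before and after this one, if any.
        neighbours = word[max(i - 1, 0):i] + word[i + 1:i + 2]
        if ch in SOFT_SIGNS and any(is_latin(n) for n in neighbours):
            # Latin context: weight it as 'i' plus a tie-breaker, so it
            # sorts after 'i' and before 'j'.
            key.append((ord("i"), 1))
        else:
            # Everything else, including a soft sign in Cyrillic context
            # or standing alone, sorts by its lowercased code point.
            key.append((ord(ch.lower()[0]), 0))
    return key


if __name__ == "__main__":
    print(sorted(["qjz", "qiz", "q\u044Cz"], key=sort_key))
```

    A 1-letter word consisting only of the soft sign has no neighbour to
    contract with, so this sketch falls back to the Cyrillic weight, which
    is exactly the case the question above is asking about.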

    Richard.


