Re: mixed-script writing systems

From: Keld Jørn Simonsen (
Date: Sat Nov 16 2002 - 14:03:32 EST


    Hmm, one way forward could be to add the 4 letters in question to the
    Latin script. There are analogous precedents for this, namely Latin
    letters added to the Cyrillic script.

    Best regards

    On Fri, Nov 15, 2002 at 11:17:57AM -0600, wrote:
    > One of the Unicode design principles is unification: "unify across
    > languages, but not across scripts". As a result, the "A" used in all
    > Latin-based writing systems is the same character, but that character is
    > different from the "A" used in Cyrillic- or Greek-based writing systems.
    > There is a very small number of cases of truly eclectic writing systems:
    > the IPA transcription system uses mostly Latin characters, but also uses a
    > few Greek characters, and Japanese writing mixes three scripts (complete
    > repertoires of two scripts, and a large portion of a third script). (One
    > might debate whether we should describe Japanese writing in terms of a
    > single writing system involving three scripts, or simultaneous use of three
    > writing systems. I have been inclined toward the former, but that's another
    > topic.) Of course, digits and punctuation get shared, but the norm is that
    > a writing system for a given language is based on a single script, and IPA
    > and Japanese are clearly exceptions.
    > That intro may well spawn a number of sub-threads, but I'm interested in
    > only one question. It has to do with an Asian language, Wakhi
    > ( This is spoken in
    > Afghanistan, China, Pakistan and Tajikistan (reportedly, similar
    > populations in each country). I don't know if the same writing system is
    > used in all countries, but there is at least one writing system, which is
    > Latin-based. (There appears also to be a distinct Cyrillic-based writing
    > system in use.)
    > What is unusual about this Latin-based writing system is that its creators
    > (I don't know who they were) were a little bit eclectic: whereas most of
    > the characters are from the Latin script, it also uses three Greek
    > characters and one Cyrillic character: gamma, delta, theta, and Cyrillic
    > yeru (U+042B, U+044B). I've attached a GIF of a sample page from a
    > publication that shows all four of these characters (though not both upper
    > and lower case; note that the gamma is also used with combining caron to
    > create another grapheme).
    > (The gamma is designed like the Greek gamma, U+0393 / U+03B3, and not the
    > Latin gamma, U+0194 / U+0263. Also, it uses an ezh, which could possibly be
    > represented as the Cyrillic characters "Abkhasian Dze / dze" U+04E0 /
    > U+04E1, but given that the vast majority of characters are Latin, it makes
    > more sense to consider these to be the Latin characters Ezh / ezh, U+01B7 /
    > U+0292.)
    > So, the question is this: Should we say that this writing system is
    > completely Latin (keeping the norm that orthographic writing systems use a
    > single script) and apply the principle of unification -- across languages
    > but not across scripts -- to imply that we need to encode new characters,
    > Latin delta, Latin theta and Latin yeru? Or, do we say that this writing
    > system is only *mostly* Latin-based, and that it mixes in a few characters
    > from other scripts?
    > I have an idea what I think is the better thing to do, but I'm curious to
    > see if it matches others' opinions.
    > - Peter
    > ---------------------------------------------------------------------------
    > Peter Constable
    > Non-Roman Script Initiative, SIL International
    > 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    > Tel: +1 972 708 7485
    > E-mail: <>
    > (See attached file: Luqo Injil_38.gif)
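For readers who want to check the code points discussed in this thread, here is a small Python sketch (the language and the helper names `rough_script` and `scripts_used` are my own additions, not part of the original messages). It prints the Unicode name of each character pair mentioned, confirms that gamma plus combining caron stays two code points under NFC, and reports which scripts a hypothetical sample mixes. One assumption to flag: Python's `unicodedata` module does not expose the formal Script property, so the script is guessed from the first word of the character name, which is only a rough heuristic. The sample string is illustrative only and is not real Wakhi text.

```python
import unicodedata

# Character pairs (capital, small) named in the thread.
CANDIDATES = {
    "Greek gamma": ("\u0393", "\u03B3"),
    "Latin gamma": ("\u0194", "\u0263"),
    "Greek delta": ("\u0394", "\u03B4"),
    "Greek theta": ("\u0398", "\u03B8"),
    "Cyrillic yeru": ("\u042B", "\u044B"),
    "Latin ezh": ("\u01B7", "\u0292"),
    "Cyrillic Abkhasian dze": ("\u04E0", "\u04E1"),
}


def rough_script(ch: str) -> str:
    """Hypothetical helper: guess a script from the first word of the
    character's Unicode name (e.g. 'GREEK', 'LATIN'). Not the Script property."""
    return unicodedata.name(ch).split()[0]


def scripts_used(text: str) -> set[str]:
    """Hypothetical helper: collect the guessed scripts of all letters
    (general category L*), ignoring combining marks, digits and punctuation."""
    return {rough_script(ch) for ch in text if unicodedata.category(ch).startswith("L")}


for label, (upper, lower) in CANDIDATES.items():
    print(f"{label:24} U+{ord(upper):04X} / U+{ord(lower):04X}  {unicodedata.name(upper)}")

# Gamma + combining caron: no precomposed form exists, so NFC keeps two code points.
gamma_caron = unicodedata.normalize("NFC", "\u03B3\u030C")
print(len(gamma_caron), [unicodedata.name(c) for c in gamma_caron])

# Illustrative sample only: Latin letters plus the borrowed Greek and
# Cyrillic characters, with ezh taken as the Latin code point.
sample = "ab" + "\u03B3\u03B4\u03B8" + "\u044B" + "\u0292"
print(sorted(scripts_used(sample)))
```

Under the "mostly Latin" reading, the last line reports three scripts for one orthography; if Latin delta, theta and yeru were encoded instead, the same check would report only LATIN.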

    This archive was generated by hypermail 2.1.5 : Sat Nov 16 2002 - 14:50:39 EST