Lunate, Terminal, and Medial Sigma

From: Jim Allan (
Date: Sat Nov 09 2002 - 17:49:00 EST

    Patrick Rourke posted:

    /So either there should only be one sigma, with the presentation being
    determined by position (unless the font defines both positions as
    lunate), or there should only be the medial and terminal and no lunate
    "symbol," with lunate being defined only by the font - but then most
    people entering Greek text would just use the medial form for all
    sigmas, regardless of the position. Maybe text entry could correct
    this . . ./

    This would only work in /most/ cases in modern Greek, and less often in
    historical documents.

    Yannis Haralambous, in “From Unicode to Typography, a Case Study: the
    Greek Script” at, writes:

        The letter sigma has a final form, written ς. Although this is a
        contextual property, there is a Unicode character for this letter:
        U+03C2; this is perfectly justified, because in some cases there is
        a semantical difference between the medial and final form of σ: for
        example, “φιλοσ.,” is necessarily the abbreviation of some word (like
        φιλοσοφία) while “φιλος.” is a single non-abbreviated word, followed
        by a sentence period. In cases like this the form of the σ cannot be
        determined by a simple algorithm.
            There is a typographical curiosum, involving the final sigma:
        the /Grammar of Pontiac Language/ by K. Topkharas ([Top], reprinted
        in [Top₂]), published in 1928, in the Soviet Union, for the
        (Pontiac) Greek speaking minorities. This grammar completely
        abolishes accents, breathings, diphthongs, and uses only part of the
        alphabet. The ς is used for the sound ‘s’, and a double ςς for the
        English ‘sh’. Here is an excerpt of this book [Top, p. 49]:

            Σιν γλοςανεμυν επεμνεν ας αρχεον τιν γλοςαν κε το ακλιτον το
            λεκςοπον α πυ μεταχιριςκυςανατο ι παλιεμυν, ονταν εθελναν να
            φανερονε πος καπιον ιδιοτιταν πυ εςς εναν προςοπον για πραμαν,
            λιφταςςκετιατο καπιον αλο λ.χ. δινατος κε αδινατος.

    From "SIGMA" by Katerina Sarri at

        By c.400 B.C.E. sigma took its final shape Σ at all greek
        city-states. The *final <ς>* was a later calligraphic version, when
        ending some words, and gradually, when ending all words. In old
        manuscripts it may be marked also within composed words (as the
        final letter of the first word) as in: ειςβάλλω = εισβάλλω <
        εις+βάλλω ( /I go in, attack/). Also, the *'lunate sigma'* (as looks
        the third letter of the latin alphabet) *C* was used instead of
        Σ,σ,ς (in the byzantine manuscripts, and today as a calligraphic
        variety, especially by the church).

    One might indeed work with a smart-sigma text entry routine, like the
    smart-quotes routines, but one would also want to be able to turn it off
    or override it when necessary, as one can with smart-quotes routines,
    without relying on proprietary switches in a particular font, which are
    not always accessible through every program and which may use different
    algorithms in different fonts.
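
    As a rough illustration of such a routine (the names smart_sigma and
    enabled are made up for this sketch and come from no existing
    product), here is a minimal Python version. It assumes that "final"
    simply means "not followed by a letter", which is exactly the rule
    that mishandles an abbreviation like “φιλοσ.”, hence the off switch.

```python
import re

MEDIAL = "\u03c3"  # σ GREEK SMALL LETTER SIGMA
FINAL = "\u03c2"   # ς GREEK SMALL LETTER FINAL SIGMA

# Assumption for this sketch: a sigma is word-final when the next
# character is not a letter, or when the text ends right after it.
_WORD_FINAL_SIGMA = re.compile(MEDIAL + r"(?![^\W\d_])")


def smart_sigma(text: str, enabled: bool = True) -> str:
    """Replace word-final σ with ς unless the writer switched it off."""
    if not enabled:
        # Override: emit exactly what the writer typed.
        return text
    return _WORD_FINAL_SIGMA.sub(FINAL, text)


if __name__ == "__main__":
    print(smart_sigma("φιλοσ"))                   # ordinary word
    print(smart_sigma("φιλοσ."))                  # abbreviation gets "corrected"
    print(smart_sigma("φιλοσ.", enabled=False))   # override keeps the medial form
```

    The second call shows the problem: the routine cannot know that the
    writer meant an abbreviation, so the override has to live in the text
    entry layer and not in the font.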

    An /intelligent/ font in which the above quotations could not be
    properly produced, because it has its own ideas about where variants
    ought to appear or lacks them altogether, is less useful than a /stupid/
    font which puts out what the writer produced. Unicode with three
    versions of lower-case sigma is more useful than Unicode with a single
    version. Encoding only one lower-case sigma would not reduce the
    complexity, only push it up to differing and incompatible higher
    protocols.

    When character variants have distinct semantics, or a distribution that
    cannot be predicted algorithmically and is not random, encoding these
    variants at the Unicode plain text level is simple and robust and does
    not prevent a higher protocol from identifying the characters for
    particular purposes.
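
    To make the plain-text point concrete, here is a small Python sketch
    using the standard unicodedata module. The SIGMAS table and the
    describe helper are names invented for this illustration. It lists the
    three lower-case sigma code points and shows that the two spellings
    from the Haralambous quotation stay distinct as plain text.

```python
import unicodedata

# The three lower-case sigma characters under discussion.
SIGMAS = {
    "medial": "\u03c3",  # σ
    "final": "\u03c2",   # ς
    "lunate": "\u03f2",  # ϲ
}


def describe(ch: str) -> str:
    """Return the code point and Unicode character name for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for label, ch in SIGMAS.items():
        print(label, describe(ch))
    # The abbreviation and the full word differ in one code point only.
    print("φιλοσ." == "φιλος.")
```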

    Patrick Rourke also posted:

    /I just can't wait for all the search failures resulting from searching
    for τις in a text with τιϲ./

    Given the increased number of characters and variants allowed by
    Unicode, the complexity of intelligent searching also increases.

    Search engines should allow variant insensitivity and diacritic
    insensitivity as they now usually allow case insensitivity. Case
    insensitivity is usually the default setting, and so should be variant
    insensitivity and perhaps diacritic insensitivity.

    This should be better supported than it is.

    Even Google distinguishes, I think foolishly, between /caesar/ and
    /cæsar/ and between /fluss/ and /fluß/, to give two examples.
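
    A sketch of what such folding could look like, in Python. The fold and
    matches functions and the _VARIANTS table are hypothetical names for
    this illustration, not the behaviour of Google or any other engine,
    and the variant table is deliberately tiny. Each kind of insensitivity
    is a separate switch, all on by default.

```python
import unicodedata

# Explicit variant table (an assumption of this sketch; a real engine
# would need a much larger, language-aware list).
_VARIANTS = str.maketrans({
    "\u03c2": "\u03c3",  # ς final sigma          -> σ
    "\u03f2": "\u03c3",  # ϲ lunate sigma         -> σ
    "\u03f9": "\u03a3",  # Ϲ capital lunate sigma -> Σ
    "æ": "ae",
    "Æ": "AE",
    "œ": "oe",
    "Œ": "OE",
})


def fold(text, variants=True, diacritics=True, case=True):
    """Reduce text to a search key; each folding step can be switched off."""
    if variants:
        text = text.translate(_VARIANTS)
    if diacritics:
        # Decompose, then drop combining marks (accents, breathings).
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if case:
        # casefold() also turns ß into ss, and ς into σ even when
        # variants=False.
        text = text.casefold()
    return text


def matches(query, document, **options):
    """Substring search on the folded forms of query and document."""
    return fold(query, **options) in fold(document, **options)


if __name__ == "__main__":
    print(matches("τις", "τιϲ"))
    print(matches("τις", "τιϲ", variants=False, case=False))
    print(fold("cæsar"), fold("fluß"))
```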

    But even given that a search engine recognizes such variants, one still
    has to deal with spelling differences, e.g. /ecumenical/,
    /oecumenical/, /œcumenical/, /eucumenical/.

    From the specifications for the Pandora search engine at

        Note that Pandora treats medial, final, and lunate sigma as the same

    As Unicode becomes more widely used, search engines will adapt.

    Jim Allan

    This archive was generated by hypermail 2.1.5 : Sat Nov 09 2002 - 18:38:05 EST