Lunate, Terminal, and Medial Sigma

From: Jim Allan (jallan@smrtytrek.com)
Date: Sat Nov 09 2002 - 17:49:00 EST

Next message: Nick Nicholas: "Re: Lunate, Terminal, and Medial Sigma"

Previous message: Anto'nio Martins-Tuva'lkin: "Re: ct, fj and blackletter ligatures"
Next in thread: Nick Nicholas: "Re: Lunate, Terminal, and Medial Sigma"
Maybe reply: Nick Nicholas: "Re: Lunate, Terminal, and Medial Sigma"
Reply: Doug Ewell: "Re: Lunate, Terminal, and Medial Sigma"
Reply: Carl W. Brown: "RE: Lunate, Terminal, and Medial Sigma"
Maybe reply: Jim Allan: "RE: Lunate, Terminal, and Medial Sigma"

Patrick Rourke posted:

/So either there should only be one sigma, with the presentation being
determined by position (unless the font defines both positions as
lunate), or there should only be the medial and terminal and no lunate
"symbol," with lunate being defined only by the font - but then most
people entering Greek text would just use the medial form for all
sigmas, regardless of the position. Maybe text entry could correct
this . . ./

This would only work in /most/ cases in modern Greek, and less often in
historical documents.

Yannis Haralambous, in “From Unicode to Typography, a Case Study: the
Greek Script” at http://omega.enstb.org/yannis/pdf/boston99.pdf, writes:

    The letter sigma has a final form, written ς. Although this is a
    contextual property, there is a Unicode character for this letter:
    U+03C2; this is perfectly justified, because in some cases there is
    a semantical difference between the medial and final form of σ: for
    example, “φιλοσ.,” is necessarily the abreviation of some word (like
    φιλοσοφία) while “φιλος.” is a single non-abbreviated word, followed
    by a sentence period. In cases like this the form of the σ cannot be
    determined by a simple algorithm.
        There is a typographical curiosum, involving the final sigma:
    the /Grammar of Pontiac Language/ by K. Topkharas ([Top], reprinted
    in [Top₂]), published in 1928, in the Soviet Union, for the
    (Pontiac) Greek speaking minorities. This grammar completely
    abolishes accents, breathings, diphthongs, and uses only part of the
    alphabet. The ς is used for the sound ‘s’, and a double ςς for the
    English ‘sh’. Here is an excerpt of this book [Top, p. 49]:

        Σιν γλοςανεμυν επεμνεν ας αρχεον τιν γλοςαν κε το ακλιτον το
        λεκςοπον α πυ μεταχιριςκυςανατο ι παλιεμυν, ονταν εθελναν να
        φανερονε πος καπιον ιδιοτιταν πυ εςς εναν προςοπον για πραμαν,
        λιφταςςκετιατο καπιον αλο λ.χ. δινατος κε αδινατος.

From "SIGMA" by Katerina Sarri at
http://users.otenet.gr/~bm-celusy/sigma.html:

    By c.400 B.C.E. sigma took its final shape Σ at all greek
    city-states. The *final <ς>* was a later calligraphic version, when
    ending some words, and gradually, when ending all words. In old
    manuscripts it may be marked also within composed words (as the
    final letter of the first word) as in: ειςβάλλω = εισβάλλω <
    εις+βάλλω ( /I go in, attack/). Also, the *'lunate sigma'* (as looks
    the third letter of the latin alphabet) *C* was used instead of
    Σ,σ,ς (in the byzantine manuscripts, and today as a calligraphic
    variety, especially by the church).

One might indeed work with a smart-sigma text entry routine, like the
smart-quotes routines, but one would also want to be able to turn it
off or override it when necessary, as one can with smart-quotes
routines. That should not depend on proprietary switches in a
particular font, which are not always accessible through every program
and which may use different algorithms in different fonts.
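
A minimal sketch of such a routine in Python, assuming precomposed (NFC)
Greek text. The name smart_sigma and its enabled switch are hypothetical,
not taken from any existing editor or font:

```python
import re

FINAL_SIGMA = "\u03c2"  # ς

# A medial sigma (σ) that follows a letter and is not followed by a letter.
# [^\W\d_] matches any Unicode letter in a Python 3 str pattern.
_WORD_FINAL_SIGMA = re.compile(r"(?<=[^\W\d_])\u03c3(?![^\W\d_])")


def smart_sigma(text, enabled=True):
    """Turn word-final σ into ς; return text untouched when disabled."""
    if not enabled:
        return text
    return _WORD_FINAL_SIGMA.sub(FINAL_SIGMA, text)
```

With the routine on, "φιλοσ." comes out as "φιλος.", which is wrong for
the abbreviation in the Haralambous example; with enabled=False the text
is left exactly as typed. That is the override one would need.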

An /intelligent/ font in which the above quotations cannot be properly
produced, because it has its own ideas about where variants ought to
appear or lacks them altogether, is less useful than a /stupid/ font
which puts out what the writer produced. Unicode with three versions of
lower-case sigma is more useful than Unicode with a single version.
Encoding only one lower-case sigma would not reduce the complexity; it
would only push it up to differing and incompatible higher protocols.

When character variants have distinct semantics or a distribution that
cannot be predicted algorithmically and is not random, encoding these
variants at the Unicode plain text level is simple and robust, and does
not prevent a higher protocol from identifying the characters with one
another for particular purposes.

Patrick Rourke also posted:

/I just can't wait for all the search failures resulting from searching
for τις in a text with τιϲ/

Given the increased number of characters and variants allowed by
Unicode, the complexity of intelligent searching also increases.

Search engines should allow variant insensitivity and diacritic
insensitivity, as they now usually allow case insensitivity. Case
insensitivity is usually the default setting, and so should variant
insensitivity and perhaps diacritic insensitivity be.

This should be better supported than it is.
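
A rough sketch of what such folding could look like in Python, using only
the standard unicodedata module. The names search_key and contains, and
the particular variant table, are assumptions for illustration and do not
describe any actual search engine:

```python
import unicodedata

# Variants folded onto one search form (an illustrative, incomplete table).
_VARIANTS = str.maketrans({
    "\u03c2": "\u03c3",  # final sigma ς -> σ
    "\u03f2": "\u03c3",  # lunate sigma ϲ -> σ
    "\u00e6": "ae",      # æ -> ae
    "\u0153": "oe",      # œ -> oe
})


def search_key(text):
    """Fold case, strip diacritics, and merge the variants listed above."""
    folded = text.casefold()  # case insensitivity; also turns ß into ss
    decomposed = unicodedata.normalize("NFD", folded)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.translate(_VARIANTS)


def contains(haystack, needle):
    """Variant-, diacritic- and case-insensitive substring test."""
    return search_key(needle) in search_key(haystack)
```

With this folding, contains("τιϲ", "τις") is true, and /cæsar/ and
/fluß/ match /caesar/ and /fluss/. Each kind of folding could be given
its own switch, so that a user can turn it off when the distinction
matters.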

Even Google distinguishes, I think foolishly, between /caesar/ and
/cæsar/ and between /fluss/ and /fluß/, to give two examples.

But even given that a search engine recognizes such variants, one still
has to deal with spelling differences, eg. /ecumenical/,
/oecumenical/,/œcumenical, eucumenical/.

From the specifications for the Pandora search engine at
http://etext.lib.virginia.edu/helpsheets/pandora.html:

    Note that Pandora treats medial, final, and lunate sigma as the same
    letter.

As Unicode becomes more widely used, search engines will adapt.

Jim Allan

This archive was generated by hypermail 2.1.5 : Sat Nov 09 2002 - 18:38:05 EST