From: Kent Karlsson (kentk@md.chalmers.se)
Date: Mon Jun 02 2003 - 15:24:26 EDT
Ken,
Thanks for your thorough explanation! Finally something that
is at least partially convincing! (Namely, that it is *sometimes* a
borrowing of *symbolism* from set theory notation; that is, nothing more.)
> Gustav Leunbach (1973), Morphological Analysis as a Step in
> Automated Syntactic Analysis of a
> Text. http://acl.ldc.upenn.edu/C/C73/C73-2022.pdf
> uses an empty set symbol to denote a morphological zero.
> (see p. 272). [Typographically, this could arguably
> have been taken from a type tray for a Norwegian ø
> character, rather than from a mathematical symbol font,
> but this is *clearly* not a slashed zero.]
That text uses a lowercase upright o-with-stroke. Other
similar fragments are fragments of actual spelling (AFAICT),
and are set in italics. In this case it is the uprightness that
marks this letter as meta-notation rather than as an object-letter.
(The text also uses some uppercase (and upright) letters X and Z
for meta-notation, variables if you like, in a table where other
("normalised" to uppercase) letters stand for themselves. That, of
course, does not make those X and Z into non-letters from a
Unicode point of view.)
> A. S. Liberman (1973), Towards a Phonological Algorithm.
>
> http://acl.ldc.upenn.edu/C/C73/C73-1015.pdf
>
> uses an empty set symbol to denote a phonological zero.
> (See pp. 196-197 for numerous examples.) These are
> clear examples, and show that this is used symbolically,
> to indicate a "something which is not there". Look at
> the type style. These are included in *italic* word
> citations, but the null set symbol (used to denote the
> phonological zero), is *not* set in italic.
That paper uses an upright uppercase o-with-stroke.
Again it's the uprightness that signals that this is
meta-notation, while other parts of the example texts
are set in italics (to signal that they are literal text, or rather
object-text).
Note that the empty set can be, and has been, typeset as an
italic U+00D8...
> Harri Jäppinen and Matti Ylilammi (1986), Associative Model
> of Morphological Analysis: An Empirical Inquiry.
>
> http://acl.ldc.upenn.edu/J/J86/J86-4001.pdf
>
> Displays a distinctive usage, with an italic epsilon to
> denote a morphological zero. (Not the same as the set theory
> use of epsilon to denote set membership.)
This is closer to the word-theoretic convention of denoting the
empty string with an (italic) epsilon.
In the typeset portions of the text, all the Greek letters are
in italic. Whether this is a deliberate choice or just happenstance
is not clear. In the portions that appear to be facsimiles of a
manuscript, they are upright. What is important for our discussion
here is that the symbol used is a letter (L*), not a math symbol (Sm).
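For anyone who wants to check the general categories for themselves,
here is a small sketch using Python's standard unicodedata module (the
helper name `describe` and the list of candidate characters are just my
choices for illustration). It should report Lu for U+00D8, Ll for U+00F8
and for the epsilon U+03B5, and Sm only for U+2205 EMPTY SET.

```python
import unicodedata

# Characters discussed in this thread (illustrative selection):
# the two o-with-stroke letters, the math EMPTY SET symbol, and epsilon.
CANDIDATES = ["\u00D8", "\u00F8", "\u2205", "\u03B5"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    code, name, cat = describe(ch)
    # L* = letter, Sm = math symbol; anything else is lumped as "other".
    kind = "letter" if cat.startswith("L") else "math symbol" if cat == "Sm" else "other"
    print(f"{code}  {cat}  {kind:11}  {name}")
```

The categories come from the Unicode Character Database bundled with
the Python build, but these particular values have not changed.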
Just as an example of a linguistic paper that actually uses the
empty set, and the empty set symbol to denote it:
http://www.linguistics.ucla.edu/people/stabler/elkeps-paris.pdf
(page 9). They also use a hyphen followed by the empty set SYMBOL
(page 6; note the preceding hyphen), possibly meaning
"empty string" (there is no empty set there!). Indeed, the empty set
symbol here acts very much like a visible "filler" letter(!).
http://project.cgm.unive.it/events/papers/yablonski.pdf appears
to use ~ (tilde) to denote the empty string.
http://www.eki.ee/teemad/morfoloogia/kuusik2.html also uses
epsilon to denote the empty string. So does
http://assets.cambridge.org/0521631963/sample/0521631963WS.pdf
(which is also very much oriented towards formal languages, and uses
that tradition's notations; it also uses a symbol similar to the empty
set symbol as a regular expression atom; p. 11).
Different authors use different notations. That is no surprise; it is
common, especially before a notation has settled. What I think would be
a mistake, though, is to try, a posteriori, to normalise all of the
(slightly) different notations to a common one. Then we get into
transcription of meta-notations, and that should be done consciously,
not by character encoding and later font choice/replacement, which
might not be under the control of an author, or even of a (human) editor.
> > A slashed zero is completely
> > unrelated to the empty set symbol.
>
> This is nonsense.
Well, no... ;-)
> You have found the correct citations
> on the web regarding André Weil's claim to have introduced
> the empty set symbol, as part of the Bourbaki group. And
> for Weil, the source of the symbol may well be Norwegian ø.
> (What the Weil citation doesn't specify is why he chose
> a symbol vaguely reminiscent of a zero, while not actually
> being a zero, to represent the empty set.)
Well, if the motivation really had been "taking the glyph for zero
and putting a slash over it", that would have been a very easy
motivation that anyone could have come up with, as seen in the
home-cooked derivations from both Michael and various type designers.
But it is NOT the one Weil gives. If you want me to speculate, I can
do so too (but I do label it as speculation): the ring symbolises a set
(think of Venn diagrams) and the stroke symbolises its emptiness.
...
> > The empty set symbol and slashed zero remain unrelated.
>
> Another bald assertion contradicted by Pullum (1996), who
> *does* relate them, in linguistic usage. Nobody is claiming
> that in *mathematical* usage they are connected, or would
> be acceptable alternative glyphs in a treatise on set theory.
Well, one reason for not being quiet is that while Pullum sort of
(but not quite) semantically equates them, Pullum also says (p. 137):
"Dennisen [...] uses the slightly *different* null [sic] set
symbol[...]"
(my emphasis), thus drawing a definite distinction between the
"null sign" and the empty set symbol. Note also that this symbol
is listed among other letters in Pullum's list (though there is
no formal classification of the symbols).
> What you are missing here is that the use of the empty set
> symbol in linguistics is associated with structuralist
> linguistics, which was in intellectual development roughly
> contemporaneously with the Bourbaki group. And structuralist
> morphology, in particular, was influenced by formal set
> theory, and many morphologists borrowed the kind of formalisms
> used by logicians and set theoreticians.
Good. Thanks. This does provide a link between the two!
> A phonological zero or a morphological zero has nothing to
> do with numeric values, nor is it conceived of as part of
> a word, per se. It is a pattern gap, an absence, a set with
> no elements.
There is still no actual set here (but see PS below)... A "gap", ok.
But there are characters in Unicode, with general category Lo, to
denote (other) gaps; and some don't even have any glyph (think of
the Hangul fillers). And you mentioned yourself (above) the use of
the *letter* epsilon to denote a morphological "zero".
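The Hangul fillers are easy to inspect the same way. This sketch uses
Python's standard unicodedata module; the list `FILLERS` is my own
selection of the filler characters, not an exhaustive one. All of them
are general category Lo (other letter), even though their job is to
mark an empty slot and they are normally rendered with no visible glyph.

```python
import unicodedata

# Hangul filler characters (illustrative selection): encoded as letters
# (category Lo) even though they stand for an empty position.
FILLERS = ["\u115F", "\u1160", "\u3164", "\uFFA0"]

for ch in FILLERS:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")
```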
> Your mistake is to assume that this derives from some kind
> of transcriptional usage. It does not. It comes from
> pattern analysis of structural systems, by structuralist
> linguists influenced by mathematical formalism and set theory,
> among other things.
Ok. That does not, however, mean that an "empty" structure cannot
be denoted by a letter (like Ø or ε).
...
> > Then I promise to be very quiet (and nod ok)! ;-)
>
> Please read Liberman, and then be very quiet and nod ok.
Hmm. No claim of set-theoretic derivation there...
I did browse a few papers on linguistics that use sets
and set-theoretic notation (some referenced above).
None that I found claims that the "slashed ring-like" symbol
in linguistic patterns is an empty set symbol, or that it
derives from set theory. All of them appear to apply set theory
correctly, though (in contrast to Jarkko's jump to erroneous
conclusions; see also the PS below).
...
> But the phonological/morphological zero is *NOT* a letter
> of transcription. It is a symbol which appears in phonological
> and morphological analysis.
So would you then consider the use of ε (epsilon) erroneous
for this usage (see above)?
> Morphologists also embed other
> symbols in such analyses, including juncture symbols such
> as "-", "+", "#", "=", and so on. But such practice does
> not make those symbols letters, either.
But do they represent anything that *was* a sound/morpheme/word?
/kent k
PS (getting a bit off-topic)
If you really want to see expressions like -st, -ing, and -Ø (or -ε, -Ø,
or -~) as expressions where juxtaposition denotes concatenation,
where "ordinary" letters (not meta-letters) stand for the singleton set
of that letter, and where a hyphen indicates "fragment for concatenation",
then what is this "empty" thingie? Can it really be the empty set?
Well, no, because any concatenation with the empty set results
in the empty set, which is obviously not what the authors intended.
Note that concatenation of sets of strings (in formal language theory,
strings are called "words") is defined as the set of strings obtained
by concatenating each string in the first set with each string in the
second set. If either set is empty, the concatenation is thus empty!
So the "empty" here, if you want to see it this way, must be the
singleton set containing the empty string (to get the reasonably
intended result). The set containing only the empty string is an
identity element for concatenation of sets of strings. (So, if you like,
concatenation is like multiplication, the empty set is like 0, and
the "linguistic null" is like 1!)
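To make the 0/1 analogy concrete, here is a minimal sketch of language
concatenation in Python. The function name `concat` and the stem "walk"
are hypothetical, chosen only for illustration; the definition is the
standard one above (every string of the first set followed by every
string of the second).

```python
from itertools import product


def concat(a: set[str], b: set[str]) -> set[str]:
    """Concatenate two languages: each string of a followed by each string of b."""
    return {x + y for x, y in product(a, b)}


EMPTY_SET = set()    # the empty set: behaves like 0 (annihilator)
EPSILON_SET = {""}   # singleton set of the empty string: behaves like 1 (identity)

stem = {"walk"}      # hypothetical example stem

# Concatenating with the empty set wipes out everything.
assert concat(stem, EMPTY_SET) == set()
# Concatenating with {""} leaves the set unchanged.
assert concat(stem, EPSILON_SET) == stem

# A suffix slot that is either filled or left "empty" (the linguistic null):
print(sorted(concat(stem, {"s", "ing", ""})))  # ['walk', 'walking', 'walks']
```

The "null" suffix only works as intended when it is modelled as {""};
substituting the empty set would make the whole paradigm vanish.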
This archive was generated by hypermail 2.1.5 : Mon Jun 02 2003 - 16:17:13 EDT