Re: Greek characters in IPA usage

From: Julian Bradfield (jcb+unicode@inf.ed.ac.uk)
Date: Sat Aug 15 2009 - 10:24:50 CDT

    Let me play partly devil's advocate to both Michael and Asmus!

    On 2009-08-15, Michael Everson <everson@evertype.com> wrote:

    > well, modern. But even in the 1949 Principles of the International
    > Phonetic Association it was quite clear that the borrowings from other
    > alphabets into the IPA were intended to be *naturalizations*, not just
    > temporary visiting.

    Agreed. However, the IPA has been backsliding on this. The letter chi
    is, unsurprisingly, a problem. If you look at the current IPA chart,
    you will see chi printed with a plain (upright) Greek chi.

    > "The non-roman letters of the International Phonetic Alphabet have
    > been designed as far as possible to harmonise well with the roman
    > letters. The Association does not recognise makeshift letters; it
    > recognises only letters which have been carefully cut so as to be in
    > harmony with the other letters.[...]"

    Chi never did fit. In the "extended x" version used in previous IPA
    publications, it was romanized by giving it straight lines with serifs,
    but it retained the stroke weight of the Greek chi: light on the
    rightwards stroke, heavy on the leftwards stroke, which of course does
    not harmonize with roman letters derived from the broad-nib pen
    tradition. (Somewhere I still have the (actual) carbon of a letter I
    wrote to the IPA in around 1980 suggesting they reverse the stroke
    weights! I don't think they took any notice.)
    The same problem arises with all the "reversed" letters.

    > As of Unicode 5.1, we have Latin delta ẟ at 1E9F.

    I didn't know that! That adds another precedent.

    > We don't really have that luxury in the "real" world of text, where
    > plain language text (even in Greek!) and IPA transcriptions co-exist.
    > I'm typesetting a book now in Baskerville, and I'm using Baskerville
    > IPA and Baskerville Greek in it. I'm glad I don't have to use IPA beta
    > because the Baskerville Greek beta is not correct. It's vertical. But

    If you're typesetting, you're using fonts, not plain text.

    > it's been designed for the other Greek letters, not for Latin. The
    > Greek theta could pass for the Latin, but the weight of the Greek chi
    > is exactly the reverse of the expected weight for the Latin chi: in
    > Latin the thick leg should be the northeast-southwest leg, but it is
    > the reverse for the Greek.

    That's because Baskerville Greek has *really* been "cut" to harmonize
    with the roman, whereas the old IPA chi wasn't. Nobody could "expect"
    a Latin letter to have a heavy leftwards stroke unless they had been
    previously corrupted by the IPA.

    > Indeed, the IPA chi is different from the long x used in Germanicist
    > dialectology; I should not name the character in http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3555.pdf
    > chi -- it is a stretched x, because its northwest-southeast leg is
    > thick, the opposite of what the IPA Handbook and Abercrombie specify.

    Whereas I would say the old IPA chi is a typographic bastard which
    should be quietly drowned (as indeed the IPA has done)! It is,
    however, vanishingly unlikely that anybody would ever wish to use the
    two in contrast; I am utterly sure that the IPA would not sanction the
    introduction of "stretched x" and "reversed stretched x" (which is
    what the old IPA chi really is) as distinct symbols, nor would
    anybody else with any sense. The difference disappears in a
    uniform-stroke-width or horizontally stressed font.

    > And, of course, I can't sort IPA material containing beta, theta, or
    > chi correctly.

    Yes you can, just as well as you can now. You just need the software.
    There's no canonical sort order defined on the IPA, but one of the
    more common sort orders is IPA chart order or similar (have a look at
    that !Xoo dictionary sitting on your shelf). So "p" < "b". You have to
    cope with that. Alternatively, if you use a pseudo-alphabetical order,
    you have to intersperse the IPA block with Latin anyway, so it's no
    harder to intersperse Greek as well.
    You don't (I suppose) complain that you can't sort both Finnish and
    German because they sort ä and ö differently - the same applies to IPA
    versus English (or whatever).
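
    (To make "you just need the software" concrete: a toy sketch of a
    chart-order sort, in Python. The key table here is my own abridged
    invention, not the actual chart, and real software would also have to
    deal with diacritics; the point is only that the application supplies
    the order, so Greek-block letters can be ranked wherever phonetics
    puts them.)

    CHART_ORDER = "pbtdkgqɢʔmnɳɲŋrfvθðszʃʒxɣχʁhβɸ"  # illustrative subset only
    RANK = {ch: i for i, ch in enumerate(CHART_ORDER)}

    def ipa_key(s):
        # Characters not in the table (diacritics etc.) sort after the known ones.
        return [RANK.get(ch, len(CHART_ORDER)) for ch in s]

    words = ["ba", "pa", "ɣa", "xa"]
    print(sorted(words, key=ipa_key))  # ['pa', 'ba', 'xa', 'ɣa'] -- "p" < "b", as on the chart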

    > VS1? No. That's pseudo-encoding. It's not going to guarantee better
    > font support -- while character disunification surely will, in time.
    > The problem is the false unification. That has some impact on legacy
    > data, but there are still probably more non-UCS IPA fonts in use than
    > there are Unicode-based IPA fonts. In the long run we will be better
    > off with the disunification.

    Agreed, on reflection.

    > I want my Greek fonts to be Greek, not compromises with Latin. And I
    > want my IPA fonts to be Latin, not compromises with Greek. Not for the
    > sake of three letters. It makes no sense. Added to the fact that I may
    > need to sort multilingual multiscript data -- and we end up with
    > EXACTLY the same argument we had for Kurdish KU and WE.

    Don't confuse fonts with characters!
    It's like CJK: you can't typeset Chinese and Japanese well with one font,
    though you can (so it is said) represent the *plain text* readably.

    > Asmus said:
    >> It's not been a design point for the standard to support "single
    >> font" display of IPA intermixed with regular text. The idea was that
    >> the IPA would use a font suitable for IPA (i.e. using all the
    >> shapes as best for IPA) while all the text portions would use a font
    >> that had the other shapes (and therefore is unsuitable for IPA).

    Which, on the other hand, seems to me to be inconsistent with the
    plain text ideal: if you have to use fonts, it's not plain text.

       [Asmus again]
    >> Adding a new character code for the shape is a non-starter. It would
    >> make all existing IPA either invalid or ambiguous. Not something you
    >> can do 20 years after the first draft of the standard that contained
    >> both Greek and IPA.
    >
    >
    > Nonsense. You can certainly do so in a standard you expect to be used
    > for 80 or 100 years or more. This is a disunification that should have
    > already happened. LATIN SMALL LETTER DELTA got encoded (for phonetic
    > purposes, and it was used widely since Lepsius before ETH was used in
    > IPA) for Unicode 5.1. It wasn't too late for that. The Variation
    > Selector doesn't solve the problem of sorting, either. Certainly not
    > in a way that any ordinary user could avail of. Give us the three
    > characters, and we who make the fonts for the users won't have any
    > difficulties at all, and the UCA can sort them within Latin instead of
    > within Greek.

    I agree with Michael. There is not a huge amount of phonetic
    material encoded (correctly) in UCS rather than in legacy encodings, and the
    material that is encoded doesn't need to be searched (or rather it
    does, but it can't be, because it's actual phonetic transcription, and
    constructing a search string that works is harder than scanning the
    entire text by eye!). More on this further down.

    > Peter said:
    >
    >> I’d venture a guess that most linguists aren’t too concerned
    >> about the exact shape of the beta and theta for daily work, but are
    >> concerned only when publishing. (And, in many cases, it won’t be
    >> the linguist themself but rather the journal editor who cares.)
    >
    > You're pretty far removed from the game, I think. I'm involved in
    > grammar and dictionary production now, and good typography and
    > harmonized fonts is a concern.

    You're a typography freak and a book producer. I hang out with actual
    linguists, and read their manuscript drafts, and they don't give a
    toss, for the most part - they just do whatever's easiest to type on
    their ancient copy of Word. Those who do care are geeks like us, and
    they care qua geek, not qua linguist. (Lexicographers also care - but
    they're geeks anyway! It's part of the job.)
    Also, I'd be surprised if there are any journals left that do the level
    of copy-editing required to care about such things. Most journals now
    require the author's source files, and make only the minimal changes needed
    to match the house style at a gross level. (JIPA has wanted the author's source
    (in, ack splth, Word) for decades.)
    Even formerly reputable university presses such as (O/C)UP are giving up
    on typographic quality control: half the (O/C)UP books I buy give me cause
    for complaint. (Again, because for the small academic market they no
    longer copy-edit, but take the author's files.)

    > Julian said:
    >> However, I do have some qualms about this: why do I not also need a
    >> separate ipa "a" - I might be using a font in which the normal "a" is
    >> actually ɑ-shaped! Indeed, really I would like separate codepoints
    >> for
    >> all IPA letters - but we know that would fail dismally in practice,
    >> even had it been implemented from the start.
    >
    > In such a situation, you draw the a like ɑ (script a), and you draw
    > the ɑ (script a) like α (alpha).

    No, you're wrong. That's not what I do. This exact situation arises
    in hand-writing IPA. What one does in hand-writing IPA is to use the
    normal cursive letter "a" (which is the glyph "ɑ") for the IPA letter
    "ɑ", and use a cursive imitation of the glyph "a" for the letter
    "a". Nowadays, the recommendation (see your most recent IPA Handbook)
    is simply to imitate the printed form; in the old days, there was a
    cursive IPA with forms designed to join up. (It was a pain to learn (I
    tried), and most people now have never heard of it - far easier to
    hand-"print" the letters.)

    >> Given the situation as it is, I support the idea of variation
    >> selectors.
    >
    > I don't. I support disunification.

    Well, really so do I. I just had the impression it was going to be too
    hard to get through!

    Now for Asmus' post:

    On 2009-08-15, Asmus Freytag <asmusf@ix.netcom.com> wrote:

    > This situation is entirely parallel to the IPA use of the Latin letter
    > "a". The form with single bowl has been encoded as IPA specific, but the
    > form with handle has not. There's only ambiguous 0061. As a result, any
    > font that uses a single bowl a at location 0061 will be "unsuitable" for
    > IPA. The situation for the Greek letters and IPA is similar, but not
    > identical, because "Latinized" forms don't necessarily fall into the
    > natural range of glyph variations for Greek letters (or you can at least
    > argue that). But otherwise these cases are not so different.

    Indeed, but they are different.

    > Whenever you aspire to full plain text support for IPA (so that your
    > entire document can be in a single font), you will be limited by the
    > case of the 'a' as well as that of the Greek letters. Both will limit
    > the fonts that you can use for single-font mixed text/IPA documents.

    The difference between a/ɑ and the Greek letters is that the current
    situation with the Greek limits the set of fonts to the empty set -
    *if* you consider that "plain text support" requires the ability to
    produce typographically good renditions, rather than just readable
    renditions. But this is a font question rather than a plain text
    question. Greek beta and IPA beta can perfectly legitimately be
    glyphically identical if the document style is sans-serif; but a and ɑ
    must never be identical.

    In general, "must contrast" constraints can only arise within a single
    writing system, or via the round-trip requirement for legacy
    standards. (Though I have to wonder: even if the legacy standards
    didn't exist, would you have unified Latin, Greek and Cyrillic o?)
    "May contrast", on the other hand, almost always arises from the
    desire to encode the different language, writing system, or
    typographic tradition, which is arguably not part of the plain text.

    The IPA has a "must contrast" requirement on a/ɑ, but this could have
    been satisfied by mapping ɑ to Greek α, which already contrasts.
    Similarly for all the other Greek letters in the IPA.

    So I think it comes down to consistency. It is just inexplicable that
    ɑ, ɣ, ɛ, ɩ, ʊ etc. have been distinguished, but not β, χ, etc.

    > That's the problem statement. Next come the boundary conditions.
    >
    > If this discussion had taken place in 1988, or 1989, different boundary
    > conditions would have applied, because at that time, there were neither
    > existing data nor existing software using Unicode. Since then, this
    > situation has changed, and provides an important boundary condition on
    > the discussion.

    However, there are other boundary conditions, and this one is not all
    it seems.

    > An important fact to be considered is that all Unicode encoded text for
    > 'a' with a handle or IPA Greek (or math loopy phi) has had to be encoded
    > using the ordinary Latin resp. Greek characters. That has been going on
    > for nearly 20 years now.

    However, for that same twenty years, people have been randomly using
    the Greek letters when they mean the IPA letters that *are*
    separated. They still are, even in their modern Unicode-friendly
    Windows systems.
    Google "voiced velar fricative": once you get past the first couple of
    pages of results of articles *about* it, and into the articles that
    actually *use* it, you'll see plenty of Greek gammas.

    > If you suddenly switch to different
    > *characters* you will get massive trouble in searching and sorting IPA
    > text, because old and new text denoting the *same* pronunciation will
    > suddenly have differently encoded strings.

    Nothing new. You already have to search for [ɣγ] to look for ɣ, and
    so on.
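
    (In Python terms, assuming the text is already in UCS, the search the
    data curators have to do today looks something like this - the
    character class is exactly the [ɣγ] above:)

    import re

    # Match the voiced velar fricative whether it was typed as IPA U+0263 (ɣ)
    # or as Greek U+03B3 (γ).
    gamma = re.compile("[\u0263\u03B3]")

    for s in ["voiced velar fricative [ɣ]", "often written γ by mistake"]:
        if gamma.search(s):
            print("match:", s)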

    > Since they will look 100%
    > alike for some fonts (definitely true for the case of 'a' here), few
    > authors will even know which character they were using. Security minded

    That's the case now. That's a headache for the linguistics data
    curators, and it will never go away, unless you unify all characters
    that do not have a "must contrast" constraint.

    > folks will go nuts at having even more perfect or near-perfect clones of
    > ordinary letters added to the standard.

    Tough. You gave them Latin, Greek, and Cyrillic "o". What do a few
    more matter?

    > 1.) You can provide new character codes for all notational use of
    ...
    > before (and will continue to be used). Documents using the new
    > characters will depend on fonts supporting those characters. Until then,
    > they can only be exchanged in the context of font-embedding technologies
    > (e.g. PDF).

    Yes. However, if characters are encoded, they will be in the majority
    of commonly used wide-coverage fonts within a small number of years -
    probably less than a year for the open source fonts.

    > 2.) You can provide a variation selector approach, where pairing a given
    > variation selector with an *ambiguous* character will identify the
    > preferred glyph shape. Well-written existing software would ignore the
    > VS, and give you fallback behavior. All new documents would display at
    > least as well as before, even in the absence of new fonts. Sort and
    > search applications, if written to the existing specifications of
    > Unicode, which require that a VS be ignored, would sort and search new
    > and old IPA data alike. All you need to do to get the new glyphs is to
    > have fonts supporting the Variation Sequences with new glyphs. You may
    > need to work with display engine suppliers to enable such font features
    > (but since such features are used for other scripts/contexts, this may
    > not be as hard as it looks).

    This solution runs into the boundary condition that you didn't state,
    and that (IMHO) Unicode zealots (of whom I count myself one most of
    the time) are far too prone to ignore, because it's embarrassing.
    BACKWARDS COMPATIBILITY!
    Not of *data*, but of *systems*.

    Very few systems have full Unicode support. I know of nothing on any
    system I use that understands variation selectors even well enough to
    ignore them correctly. Sure, it will come, but it will take (at my
    guess) more like ten years than a year. Even then, there are (big)
    circumstances in which it will never come: plain text processing under
    older X Window systems. The core font system for the X Window System,
    which is now the GUI infrastructure for every Unix system on the
    planet, has no difficulty with basic Unicode support (i.e. ignoring
    rendering niceties). It didn't envisage the supplementary planes, but
    there is a very obvious and very trivial workaround which will surely
    be blessed by X.org if they ever come to their senses and abandon
    their willful neglect of the core font system. (Namely, use the
    "registry" iso10646-n for plane (n-1).)

    However, there is no way the core font system can ever handle
    variation selectors - it just doesn't have the mechanism. It would
    have to be hacked in at the application library level using PUA
    character positions in the font; which would have to be agreed outwith
    Unicode, implemented in libraries, etc. etc.
    For the non-VS solution, I can have basic *plain-text* support on ten-year
    old systems just by installing updated fonts. For the VS solution,
    I'll be lucky if I get support on systems several years in the future,
    never mind from ten years ago.

    I'm sure the same applies to Windows: what chance has someone running
    their nice (relatively) reliable Windows XP system of getting support for
    VS-encoded IPA Greek in a few years' time?

    (And as a personal issue, my editor of choice certainly doesn't know
    about VS (it doesn't even know about Unicode properly), and nobody
    other than me is going to make it do so;-)

    Almost finally, if you go the VS route, surely this form of "pseudo-encoding"
    (as Michael put it) needs to have variants for *both* the unambiguous
    forms, not just one of them, as seems to be the current
    practice. Otherwise you have to scan the whole document for an
    occurrence of <χ><VS1> in order to determine whether plain <χ> can be
    treated as ambiguous or not.
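
    (A sketch of that asymmetry, with VS1 standing in for whatever
    selector ends up registered - none of this is an actual registered
    variation sequence, it just shows the whole-document scan you would be
    forced into:)

    CHI = "\u03C7"   # GREEK SMALL LETTER CHI
    VS1 = "\uFE00"   # VARIATION SELECTOR-1

    def classify_chis(text):
        uses_vs = (CHI + VS1) in text   # scan the whole document just to read a bare chi
        for i, ch in enumerate(text):
            if ch == CHI:
                if text[i + 1 : i + 2] == VS1:
                    yield i, "Latinized (IPA) form, selected by VS1"
                elif uses_vs:
                    yield i, "plain Greek chi (this document does distinguish)"
                else:
                    yield i, "ambiguous: Greek or IPA, no way to tell"

    print(list(classify_chis("Greek text with χ only")))
    print(list(classify_chis("IPA χ\ufe00 alongside Greek χ")))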

    Really finally, shouldn't the *users* have some input? I see three
    classes of people who should be consulted:
    * the working linguist. I don't know how you do this (other than by
      posting to LINGUIST). I work with the UK's leading linguistics
      department, and I could hijack a meeting slot to find out what they
      think. That would be anecdotal, but a bit less self-selecting than a
      LINGUIST discussion. The full-time linguists on this list could do
      so also.
    * The IPA. (Of course, they can't be trusted to get things right,
      given their record on chi;-)
    * People who publish books and databases using the IPA - which
      probably means the major academic publishers. Is none of them in the
      Unicode Consortium?

    -- 
    The University of Edinburgh is a charitable body, registered in
    Scotland, with registration number SC005336.
    

