From: Julian Bradfield (jcb+unicode@inf.ed.ac.uk)
Date: Sat Aug 15 2009 - 10:24:50 CDT
Let me play partly devil's advocate to both Michael and Asmus!
On 2009-08-15, Michael Everson <everson@evertype.com> wrote:
> well, modern. But even in the 1949 Principles of the International
> Phonetic Association it was quite clear that the borrowings from other
> alphabets into the IPA were intended to be *naturalizations*, not just
> temporary visiting.
Agreed. However, the IPA has been backsliding on this. The letter chi
is, unsurprisingly, a problem. If you look at the current IPA chart,
you will see chi printed with a plain (upright) Greek chi.
> "The non-roman letters of the International Phonetic Alphabet have
> been designed as far as possible to harmonise well with the roman
> letters. The Association does not recognise makeshift letters; it
> recognises only letters which have been carefully cut so as to be in
> harmony with the other letters.[...]"
Chi never did fit. In the "extended x" version used in previous IPA
publications, it was romanized by having straight lines with serifs,
but they retained the stroke weight of the Greek chi: light on the
rightwards stroke, heavy on the leftwards stroke, which of course does
not harmonize with roman letters derived from the broad-nib pen
tradition. (Somewhere I still have the (actual) carbon of a letter I
wrote to the IPA in around 1980 suggesting they reverse the stroke
weights! I don't think they took any notice.)
The same problem arises with all the "reversed" letters.
> As of Unicode 5.1, we have Latin delta ẟ at 1E9F.
I didn't know that! That adds another precedent.
> We don't really have that luxury in the "real" world of text, where
> plain language text (even in Greek!) and IPA transcriptions co-exist.
> I'm typesetting a book now in Baskerville, and I'm using Baskerville
> IPA and Baskerville Greek in it. I'm glad I don't have to use IPA beta
> because the Baskerville Greek beta is not correct. It's vertical. But
If you're typesetting, you're using fonts, not plain text.
> it's been designed for the other Greek letters, not for Latin. The
> Greek theta could pass for the Latin, but the weight of the Greek chi
> is exactly the reverse of the expected weight for the Latin chi: in
> Latin the thick leg should be the northeast-southwest leg, but it is
> the reverse for the Greek.
That's because Baskerville Greek has *really* been "cut" to harmonize
with the roman, whereas the old IPA chi wasn't. Nobody could "expect"
a Latin letter to have a heavy leftwards stroke unless they had been
previously corrupted by the IPA.
> Indeed, the IPA chi is different from the long x used in Germanicist
> dialectology; I should not name the character in http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3555.pdf
> chi -- it is a stretched x, because its northwest-southeast leg is
> thick, the opposite of what the IPA Handbook and Abercrombie specify.
Whereas I would say the old IPA chi is a typographic bastard which
should be quietly drowned (as indeed the IPA has done)! It is,
however, vanishingly unlikely that anybody would ever wish to use the
two in contrast; I am utterly sure that the IPA would not sanction the
introduction of "stretched x" and "reversed stretched x" (which is
what the old IPA chi really is) as distinct symbols, and nor would
anybody else with any sense. The difference disappears in a
uniform stroke-width or a horizontally stressed font.
> And, of course, I can't sort IPA material containing beta, theta, or
> chi correctly.
Yes you can, just as well as you can now. You just need the software.
There's no canonical sort order defined on the IPA, but one of the
more common sort orders is IPA chart order or similar (have a look at
that !Xoo dictionary sitting on your shelf). So "p" < "b". You have to
cope with that. Alternatively, if you use a pseudo-alphabetical order,
you have to intersperse the IPA block with Latin anyway, so it's no
harder to intersperse Greek as well.
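To make that concrete: a chart-order sort is a few lines in any scripting
language. Here is a minimal sketch in Python, using an invented toy
ordering rather than the real chart:

    # Sort strings by an explicit symbol order instead of by code point.
    # The order below is a made-up fragment, not the actual IPA chart.
    CHART_ORDER = "pbtdkg"
    RANK = {ch: i for i, ch in enumerate(CHART_ORDER)}

    def chart_key(s):
        # Symbols not in the table sort after the known ones, by code point.
        return [(0, RANK[c]) if c in RANK else (1, ord(c)) for c in s]

    print(sorted(["ba", "pa", "da"], key=chart_key))   # ['pa', 'ba', 'da']

So "p" < "b" falls out of the table, not out of the encoding.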
You don't (I suppose) complain that you can't sort both Finnish and
German because they sort ä and ö differently - the same applies to IPA
versus English (or whatever).
> VS1? No. That's pseudo-encoding. It's not going to guarantee better
> font support -- while character disunification surely will, in time.
> The problem is the false unification. That has some impact on legacy
> data, but there are still probably more non-UCS IPA fonts in use than
> there are Unicode-based IPA fonts. In the long run we will be better
> off with the disunification.
Agreed, on reflection.
> I want my Greek fonts to be Greek, not compromises with Latin. And I
> want my IPA fonts to be Latin, not compromises with Greek. Not for the
> sake of three letters. It makes no sense. Added to the fact that I may
> need to sort multilingual multiscript data -- and we end up with
> EXACTLY the same argument we had for Kurdish KU and WE.
Don't confuse fonts with characters!
It's like CJK: you can't typeset Chinese and Japanese well with one font,
though you can (so it is said) represent the *plain text* readably.
> Asmus said:
>> It's not been a design point for the standard to support "single
>> font" display of IPA intermixed with regular text. The idea was that
>> the IPA would use a font suitable for IPA (i.e. using all the
>> shapes as best for IPA) while all the text portions would use a font
>> that had the other shapes (and therefore is unsuitable for IPA).
Which, on the other hand, seems to me to be inconsistent with the
plain text ideal: if you have to use fonts, it's not plain text.
[Asmus again]
>> Adding a new character code for the shape is a non-starter. It would
>> make all existing IPA either invalid or ambiguous. Not something you
>> can do 20 years after the first draft of the standard that contained
>> both Greek and IPA.
>
>
> Nonsense. You can certainly do so in a standard you expect to be used
> for 80 or 100 years or more. This is a disunification that should have
> already happened. LATIN SMALL LETTER DELTA got encoded (for phonetic
> purposes, and it was used widely since Lepsius before ETH was used in
> IPA) for Unicode 5.1. It wasn't too late for that. The Variation
> Selector doesn't solve the problem of sorting, either. Certainly not
> in a way that any ordinary user could avail of. Give us the three
> characters, and we who make the fonts for the users won't have any
> difficulties at all, and the UCA can sort them within Latin instead of
> within Greek.
I agree with Michael. There is not a huge amount of phonetic
material encoded (correctly) in UCS rather than legacy, and the
material that is encoded doesn't need to be searched (or rather it
does, but it can't be, because it's actual phonetic transcription, and
constructing a search string that works is harder than scanning the
entire text by eye!). More on this further down.
> Peter said:
>
>> I'd venture a guess that most linguists aren't too concerned
>> about the exact shape of the beta and theta for daily work, but are
>> concerned only when publishing. (And, in many cases, it won't be
>> the linguist themself but rather the journal editor who cares.)
>
> You're pretty far removed from the game, I think. I'm involved in
> grammar and dictionary production now, and good typography and
> harmonized fonts is a concern.
You're a typography freak and a book producer. I hang out with actual
linguists, and read their manuscript drafts, and they don't give a
toss, for the most part - they just do whatever's easiest to type on
their ancient copy of Word. Those who do care are geeks like us, and
they care qua geek, not qua linguist. (Lexicographers also care - but
they're geeks anyway! It's part of the job.)
Also, I'd be surprised if there are any journals left that do the level
of copy-editing required to care about such things. Most journals now
require the author's source files, and do the minimal changes to match
the house style at a gross level. (JIPA has wanted the author's source
(in, ack splth, Word) for decades.)
Even formerly reputable university presses such as (O/C)UP are giving up
on typographic quality control: half the (O/C)UP books I buy give me cause
for complaint. (Again, because for the small academic market they no
longer copy-edit, but take the author's files.)
> Julian said:
>> However, I do have some qualms about this: why do I not also need a
>> separate ipa "a" - I might be using a font in which the normal "a" is
>> actually ɑ-shaped! Indeed, really I would like separate codepoints for
>> all IPA letters - but we know that would fail dismally in practice,
>> even had it been implemented from the start.
>
> In such a situation, you draw the a like ɑ (script a), and you draw
> the ɑ (script a) like α (alpha).
No, you're wrong. That's not what I do. This exact situation arises
in hand-writing IPA. What one does in hand-writing IPA is to use the
normal cursive letter "a" (which is the glyph "É‘") for the IPA letter
"É‘", and use a cursive imitation of the glyph "a" for letter
"a". Nowadays, the recommendation (see your most recent IPA Handbook)
is simply to imitate the printed form; in the old days, there was a
cursive IPA with forms designed to join up. (It was a pain to learn (I
tried), and most people now have never heard of it - far easier to
hand-"print" the letters.)
>> Given the situation as it is, I support the idea of variation
>> selectors.
>
> I don't. I support disunification.
Well, really so do I. I just had the impression it was going to be too
hard to get through!
Now for Asmus' post:
On 2009-08-15, Asmus Freytag <asmusf@ix.netcom.com> wrote:
> This situation is entirely parallel to the IPA use of the Latin letter
> "a". The form with single bowl has been encoded as IPA specific, but the
> form with handle has not. There's only ambiguous 0061. As a result, any
> font that uses a single bowl a at location 0041 will be "unsuitable" for
> IPA. The situation for the Greek letters and IPA is similar, but not
> identical, because "Latinized" forms don't necessarily fall into the
> natural range of glyph variations for Greek letters (or you can at least
> argue that). But otherwise these cases are not so different.
Indeed, but they are different.
> Whenever you aspire to full plain text support for IPA (so that your
> entire document can be in a single font), you will be limited by the
> case of the 'a' as well as that of the Greek letters. Both will limit
> the fonts that you can use for single-font mixed text/IPA documents.
The difference between a/ɑ and the Greek letters is that the current
situation with the Greek limits the set of fonts to the empty set -
*if* you consider that "plain text support" requires the ability to
produce typographically good renditions, rather than just readable
renditions. But this is a font question rather than a plain text
question. Greek beta and IPA beta can perfectly legitimately be
glyphically identical if the document style is sans-serif; but a and ɑ
must never be identical.
In general, "must contrast" constraints can only arise within a single
writing system, or via the round-trip requirement for legacy
standards. (Though I have to wonder: even if the legacy standards
didn't exist, would you have unified Latin, Greek and Cyrillic o?)
"May contrast", on the other hand, almost always arises from the
desire to encode the different language, writing system, or
typographic tradition, which is arguably not part of the plain text.
The IPA has a "must contrast" requirement on a/É‘, but this could have
been satisfied by mapping ɑ to Greek α, which already contrasts.
Similarly for all the other Greek letters in the IPA.
So I think it comes down to consistency. It is just inexplicable that
ɑ, ɣ, ɛ, ɩ, ʊ etc. have been distinguished, but not β, χ, etc.
> That's the problem statement. Next come the boundary conditions.
>
> If this discussion had taken place in 1988, or 1989, different boundary
> conditions would have applied, because at that time, there were neither
> existing data nor existing software using Unicode. Since then, this
> situation has changed, and provides an important boundary condition on
> the discussion.
However, there are other boundary conditions, and this one is not all
it seems.
> An important fact to be considered is that all Unicode encoded text for
> 'a' with a handle or IPA Greek (or math loopy phi) has had to be encoded
> using the ordinary Latin resp. Greek characters. That has been going on
> for nearly 20 years now.
However, for that same twenty years, people have been randomly using
the Greek letters when they mean the IPA letters that *are*
separated. They still are, even in their modern Unicode-friendly
Windows systems.
Google "voiced velar fricative": once you get past the first couple of
pages of results of articles *about* it, and into the articles that
actually *use* it, you'll see plenty of Greek gammas.
> If you suddenly switch to different
> *characters* you will get massive trouble in searching and sorting IPA
> text, because old and new text denoting the *same* pronunciation will
> suddenly have differently encoded strings.
Nothing new. You already have to search for [ɣγ] to look for ɣ, and
so on.
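Concretely, this is the kind of search one already has to write (a minimal
Python sketch; the sample text is invented):

    import re
    # U+0263 LATIN SMALL LETTER GAMMA vs U+03B3 GREEK SMALL LETTER GAMMA:
    # existing data may use either, so the search has to accept both.
    both_gammas = re.compile("[\u0263\u03B3]")
    text = "voiced velar fricative [\u03B3], elsewhere written [\u0263]"
    print(len(both_gammas.findall(text)))   # 2 - both spellings are found

Disunifying beta, theta, and chi merely adds a few more character classes
of the same kind.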
> Since they will look 100%
> alike for some fonts (definitely true for the case of 'a' here), few
> authors will even know which character they were using. Security minded
That's the case now. That's a headache for the linguistics data
curators, and it will never go away, unless you unify all characters
that do not have a "must contrast" constraint.
> folks will go nuts at having even more perfect or near-perfect clones of
> ordinary letters added to the standard.
Tough. You gave them Latin, Greek, and Cyrillic "o". What do a few
more matter?
> 1.) You can provide new character codes for all notational use of
...
> before (and will continue to be used). Documents using the new
> characters will depend on fonts supporting those characters. Until then,
> they can only be exchanged in the context of font-embedding technologies
> (e.g. PDF).
Yes. However, if characters are encoded, they will be in the majority
of commonly used wide-coverage fonts within a small number of years -
probably less than a year for the open source fonts.
> 2.) You can provide a variation selector approach, where pairing a given
> variation selector with an *ambiguous* character will identify the
> preferred glyph shape. Well-written existing software would ignore the
> VS, and give you fallback behavior. All new documents would display at
> least as well as before, even in the absence of new fonts. Sort and
> search applications, if written to the existing specifications of
> Unicode, which require that a VS be ignored, would sort and search new
> and old IPA data alike. All you need to do to get the new glyphs is to
> have fonts supporting the Variation Sequences with new glyphs. You may
> need to work with display engine suppliers to enable such font features
> (but since such features are used for other scripts/contexts, this may
> not be as hard as it looks).
This solution runs into the boundary condition that you didn't state,
and that (IMHO) Unicode zealots (of whom I count myself one most of
the time) are far too prone to ignore, because it's embarrassing.
BACKWARDS COMPATIBILITY!
Not of *data*, but of *systems*.
Very few systems have full Unicode support. I know of nothing on any
system I use that understands variation selectors even well enough to
ignore them correctly. Sure it will come, but it will take (in my
guess) more like ten years than a year. Even then, there are (big)
circumstances in which it will never come: plain text processing under
older X Window systems. The core font system for the X Window System,
which is now the GUI infrastructure for every Unix system on the
planet, has no difficulty with basic Unicode support (i.e. ignoring
rendering niceties). It didn't envisage the supplementary planes, but
there is a very obvious and very trivial workaround which will surely
be blessed by X.org if they ever come to their senses and abandon
their willful neglect of the core font system. (Namely, use the
"registry" iso10646-n for plane (n-1).)
However, there is no way the core font system can ever handle
variation selectors - it just doesn't have the mechanism. It would
have to be hacked in at the application library level using PUA
character positions in the font; which would have to be agreed outwith
Unicode, implemented in libraries, etc. etc.
For the non-VS solution, I can have basic *plain-text* support on ten-year-old
systems just by installing updated fonts. For the VS solution,
I'll be lucky if I get support on systems several years in the future,
never mind from ten years ago.
I'm sure the same applies to Windows: what chance has someone running
their nice (relatively) reliable Windows XP system of getting support for
VS-encoded IPA Greek in a few years' time?
(And as a personal issue, my editor of choice certainly doesn't know
about VS (it doesn't even know about Unicode properly), and nobody
other than me is going to make it do so;-)
Almost finally, if you go the VS route, surely this form of "pseudo-encoding"
(as Michael put it) needs to have variants for *both* the unambiguous
forms, not just one of them, as seems to be the current
practice. Otherwise you have to scan the whole document for an
occurrence of <χ><VS1> in order to determine whether plain <χ> can be
treated as ambiguous or not.
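By way of illustration, a minimal Python sketch of both halves of this:
what a conforming search or sort is supposed to do with the selector, and
the whole-document scan I am objecting to. (The pairing of chi with VS1 is
assumed here purely for the sake of the example.)

    VS1 = "\uFE00"   # VARIATION SELECTOR-1
    CHI = "\u03C7"   # GREEK SMALL LETTER CHI

    def fold_vs(s):
        # Searching/sorting per the spec: ignore the variation selector.
        return s.replace(VS1, "")

    def chi_is_disambiguated(doc):
        # The scan in question: only if <chi><VS1> occurs somewhere can you
        # assume the author distinguished the two forms; a bare <chi> in a
        # document that never uses the sequence tells you nothing.
        return (CHI + VS1) in doc

With variants defined for *both* forms, the second function would be
unnecessary.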
Really finally, shouldn't the *users* have some input? I see three
classes of people who should be consulted:
* The working linguist. I don't know how you do this (other than by
posting to LINGUIST). I work with the UK's leading linguistics
department, and I could hijack a meeting slot to find out what they
think. That would be anecdotal, but a bit less self-selecting than a
LINGUIST discussion. The full-time linguists on this list could do
so also.
* The IPA. (Of course, they can't be trusted to get things right,
given their record on chi;-)
* People who publish books and databases using the IPA - which
probably means the major academic publishers. Is none of them in the
Unicode Consortium?