Re: IPA a vowels

From: Robert Brady (robert@ents.susu.soton.ac.uk)
Date: Fri Sep 10 1999 - 10:29:54 EDT


On Fri, 10 Sep 1999, Michael Everson wrote:

> >IPA has got a LATIN SMALL LETTER SCRIPT G as well. The corresponding
> >character, LATIN SMALL LETTER NONSCRIPT G (or similar), has been unified
> >with LATIN SMALL LETTER G, in a similar demonstration of brokenness:
> >where two glyphs that are used as variants of one grapheme, and used
> >contrastively in other situations, have been encoded as two code points,
> >rather than the needed three.
>
> By the same token we'd need three for I, DOTLESS I, and DOTTED I for
> Turkish. But no one wants to do this. (Probably because huge amounts of
> Turkish data do something else.)

Yes. That would be needed, but the case is not as strong, as <dotless-i>
is not a common glyph variant of <dotted-i>, and the other purpose, that
of case-shifting, has to be locale-dependent and nontrivial anyway...
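
To illustrate why case-shifting here has to be locale-dependent, the
following is a minimal sketch in Python 3. The function names
turkish_lower and turkish_upper are hypothetical helpers made up for this
illustration, not part of any library; the only library calls used are the
built-in str.translate, str.lower and str.upper.

```python
# Sketch only: Turkish-aware case mapping layered over the default,
# locale-independent mappings. Helper names are hypothetical.

DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE

def turkish_lower(s: str) -> str:
    # In Turkish, I lowercases to dotless i, and dotted capital I to i.
    table = {ord("I"): DOTLESS_I, ord(DOTTED_CAP_I): "i"}
    return s.translate(table).lower()

def turkish_upper(s: str) -> str:
    # In Turkish, i uppercases to dotted capital I, and dotless i to I.
    table = {ord("i"): DOTTED_CAP_I, ord(DOTLESS_I): "I"}
    return s.translate(table).upper()

if __name__ == "__main__":
    word = "ISI"
    print(word.lower())         # default mapping, as used for English
    print(turkish_lower(word))  # Turkish mapping gives a different result
```

The same two code points give different results depending on the language
of the text, which is the information plain text alone does not carry.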

> I support the last three because of the functionality issue of sorting
> Greek and IPA text (in Latin transcription Beta is not supposed to sort
> after z, which it does with the unification). I think the case for the
> first two is dodgy. What do you do about inputting?

Inputting for <hooked-a> would presumably be done in the same way that
inputting <round-a> is currently done. If people still insist on inputting
<a>, then there's nothing that can be done about that, but we should
preferably give people the ability to do things in a sensible way.

> Right now my IPA fonts have a on a and round-a on A.

Yes, this issue can be solved with tagging or markup, but it cannot be
solved that way in true plaintext. If I extend a font which uses the
<round-a> glyph to cover the IPA range, I have to either

  a) make the <a> identical to the <round-a>. This is unacceptable.

  b) distinguish between the <round-a> and <a> whilst leaving the <a>
     unchanged. This will be confusing to users, who currently expect to
     see a <hooked-a> glyph for <a> in IPA contexts. This is unacceptable.

  c) make the <a> into a <hooked-a>, and make <round-a> a <round-a>. This
     is not something I am happy about, especially if I am trying to keep
     the original style of the font intact. It alters the appearance of
     normal plaintext to fix IPA text. I am not comfortable with doing
     this.

I would like the following option to exist.

  d) leave <a> as it is, and give <round-a> and the new <hooked-a>
     character their appropriate glyphs. This has no drawbacks, apart
     from the conversion of legacy data.
     
     However, I think it is better that it be fixed now, whilst there
     is still not much legacy data. Waiting will only increase the
     impracticality of this change.

     A similar case can be made for Beta, Theta, and Chi. Another point,
     that you did not mention, is that of approximate conversions...

     If I were writing a filter to convert Unicode to <any-arbitrary
     codeset>, with a 'best possible' representation, I would want to
     convert Greek theta to "th". However, I would not want to convert IPA
     theta to "th": I would want to convert it to "T". This is because "T"
     is the ASCII-IPA sign for the voiceless interdental median fricative.
     Ditto with "beta", which would want to be converted to either "b/v"
     for Greek or "B" for ASCII-IPA, and "chi", which would want to be
     converted to "ch/kh" for Greek or "X" for ASCII-IPA. (This works, as
     IPA does not use full-size uppercase Latin letters.)
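
As a rough sketch of such a filter, assuming the IPA letters had their own
code points: the Python 3 code below uses Private Use Area code points
(U+E000 to U+E002) purely as hypothetical stand-ins for separately encoded
IPA theta, beta and chi. Those assignments, the function name to_ascii,
and the choice of "b" and "kh" from the "b/v" and "ch/kh" alternatives are
all assumptions made for illustration.

```python
# Sketch only: 'best possible' ASCII conversion that treats Greek letters
# and (hypothetically separate) IPA letters differently.

# Real Greek code points, with transliterations chosen for this example.
GREEK_TO_ASCII = {
    "\u03b8": "th",  # GREEK SMALL LETTER THETA
    "\u03b2": "b",   # GREEK SMALL LETTER BETA ("v" would also be defensible)
    "\u03c7": "kh",  # GREEK SMALL LETTER CHI ("ch" would also be defensible)
}

# HYPOTHETICAL: private-use stand-ins for distinct IPA characters.
IPA_THETA, IPA_BETA, IPA_CHI = "\ue000", "\ue001", "\ue002"
IPA_TO_ASCII = {IPA_THETA: "T", IPA_BETA: "B", IPA_CHI: "X"}  # ASCII-IPA

def to_ascii(text: str, fallback: str = "?") -> str:
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                  # already ASCII
        elif ch in GREEK_TO_ASCII:
            out.append(GREEK_TO_ASCII[ch])  # transliterate Greek
        elif ch in IPA_TO_ASCII:
            out.append(IPA_TO_ASCII[ch])    # map to ASCII-IPA sign
        else:
            out.append(fallback)            # no sensible approximation
    return "".join(out)

if __name__ == "__main__":
    print(to_ascii("\u03b8eos"))       # Greek theta in a Latin-mixed string
    print(to_ascii(IPA_THETA + "In"))  # hypothetical IPA theta
```

With the unified encoding, the two dictionaries would share the same keys,
and the filter would have no way to choose between "th" and "T" without
markup.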

-- 
Robert


