Re: IPA a vowels

From: peter_constable@sil.org
Date: Fri Sep 10 1999 - 11:03:21 EDT


>The other side of this issue is coding ambiguity. Say you have
       some African language which uses an IPA-influenced orthography,
       will you use LATIN SMALL LETTER A or your new homoglyph LATIN
       SMALL LETTER A WITH HOOK here?

>I believe, the conclusion is that we should not think in terms
       of being able to add IPA highly consistently to every font
       there is. Only a few font styles are really useful for being
       extended into good IPA fonts, so if you write dictionaries,
       linguistic textbooks, etc., you should make sure you use one of
       these font styles. Do not expect that every Unicode font will
       contain every Unicode character in high quality. Unicode should
       be seen more as a scheme to encode characters, not as a
       repertoire that from now on every font has to cover entirely.

       I agree that we probably don't want every font to be used for
       IPA. But there still is an issue of encoding ambiguity when
       dealing with plain text. Perhaps the answer, though, is that,
       strictly speaking, plain text is effectively meaningless.
       Knowing the encoding tells you how to get one level of
       semantics, i.e. how to translate the bytes into abstract
       characters, but you still don't know what the sequence of
       characters means in terms of any human language until the
       language is identified. If you get a plaintext file and it
       contains

       "See Dick run."

       Then you'll make an assumption about the intended language, and
       that assumption will probably be valid. But it's an assumption
       nonetheless. When there is real potential ambiguity, there is no
       recourse but to provide some markup:

       <blahurg>See Dick run.</blahurg>

       (undoubtedly means something derogatory about the listener's
       grandmother). If the plaintext happens to mix text in IPA and
       text in a language that uses U+0061, then if there is confusion it
       may be necessary to have markup along the lines of

       <eng>The Blahurg word for ... pronounced, "<ipa> ...a...
       </ipa>", and means ... upset.</eng>
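
       To make the markup idea concrete, here is a minimal sketch in
       Python of how a program might split such tagged text into
       per-language runs. The class LanguageRuns, the function
       language_runs and the sample sentence are hypothetical names
       and data invented for illustration; the <eng> and <ipa> tags
       are the ad hoc ones used above, not any standard.

```python
from html.parser import HTMLParser


class LanguageRuns(HTMLParser):
    """Collect (language_tag, text) runs from lightly tagged plain text.

    Hypothetical helper: it treats every tag name as a language label.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, innermost last
        self.runs = []   # list of (tag or None, text) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Text outside any tag has no declared language.
        language = self.stack[-1] if self.stack else None
        self.runs.append((language, data))


def language_runs(tagged_text):
    parser = LanguageRuns()
    parser.feed(tagged_text)
    parser.close()
    return parser.runs


# Invented sample; the "a" inside <ipa> is the ordinary U+0061.
sample = '<eng>The word is pronounced "<ipa>a</ipa>".</eng>'
runs = language_runs(sample)
for language, text in runs:
    print(language, repr(text))
```

       The point is only that the tag, not the code point, tells the
       program whether a given "a" is an English letter or an IPA
       symbol.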

       Of course, I probably wouldn't complain if there was a separate
       character LATIN IPA SMALL LETTER A that disambiguated this for
       plain text. (Nobody should be confused about the purpose of a
       character with such a name.) Ditto for other cases.
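
       As a small illustration of why plain text cannot tell the two
       uses apart, the following Python sketch checks that the "a" in
       an English word and the "a" typed in an IPA transcription are
       the same code point. The variable names are made up for the
       example; U+0251 is shown only as a contrast, since it is a
       distinct character that the standard does encode separately.

```python
import unicodedata

english_a = "cat"[1]  # the letter as used in an English word
ipa_a = "a"           # the same key typed inside an IPA transcription

# Both are U+0061, so nothing in the plain text distinguishes them.
print(english_a == ipa_a)
print(f"U+{ord(ipa_a):04X}", unicodedata.name(ipa_a))

# By contrast, the single-storey alpha has its own code point.
alpha = "\u0251"
print(f"U+{ord(alpha):04X}", unicodedata.name(alpha))
```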

>For every font style, there are Unicode characters that will
       not go well with it. High-quality fonts will therefore always
       be Unicode subsets only, and applications such as Web browsers
       who can prevent certain characters from being used in certain
       style contexts will brutally fall-back to other styles (e.g.,
       pick math operators from the upright font even inside italic
       text).

       So let it be written; so let it be done.

       Peter


