>The other side of this issue is coding ambiguity. Say you have
some African language which uses an IPA-influenced orthography,
will you use LATIN SMALL LETTER A or your new homoglyph LATIN
SMALL LETTER A WITH HOOK here?
>I believe the conclusion is that we should not think in terms
of being able to add IPA highly consistently to every font
there is. Only a few font styles are really useful for being
extended into good IPA fonts, so if you write dictionaries,
linguistic textbooks, etc., you should make sure you use one of
these font styles. Do not expect that every Unicode font will
contain every Unicode character in high quality. Unicode should
be seen more as a scheme to encode characters, not as a
repertoire that from now on every font has to cover entirely.
I agree that we probably don't want every font to be used for
IPA. But there still is an issue of encoding ambiguity when
dealing with plain text. Perhaps the answer, though, is that,
strictly speaking, plain text is effectively meaningless.
Knowing the encoding tells you how to get one level of
semantics, i.e. how to translate the bytes into abstract
characters, but you still don't know what the sequence of
characters means in terms of any human language until the
language is identified. If you get a plaintext file and it
contains
"See Dick run."
Then you'll make an assumption about the intended language, and
that assumption will probably be valid. But it's an assumption
nonetheless. When there is real potential ambiguity, there is no
recourse but to provide some markup:
<blahurg>See Dick run.</blahurg>
(undoubtedly means something derogatory about the listener's
grandmother). If the plaintext happens to mix text in IPA and
text in a language that uses U+0061, then if there is confusion it
may be necessary to have markup along the lines of
<eng>The Blahurg word for ... pronounced, "<ipa> ...a...
</ipa>", and means ... upset.</eng>
Of course, I probably wouldn't complain if there were a separate
character LATIN IPA SMALL LETTER A that disambiguated this for
plain text. (Nobody should be confused about the purpose of a
character with such a name.) Ditto for other cases.
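
To make the two levels concrete, here is a rough Python sketch, not a proposal for any real markup scheme. The first part shows that knowing the encoding only gets you from bytes to abstract characters. The second part walks a tagged string and reports which tag encloses each run of text, which is the only thing that distinguishes one U+0061 from another. The tag names eng and ipa follow the example above; the sample sentence, SAMPLE, and the helper tagged_runs are made up for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Level 1: the encoding turns bytes into abstract characters, nothing more.
raw = b"See Dick run."
text = raw.decode("ascii")
print(text)

# The character name identifies the code point, not the language or notation.
print(f"U+{ord('a'):04X} {unicodedata.name('a')}")

# Level 2: markup says which language or notation each run belongs to.
# SAMPLE is a hypothetical sentence, not taken from any real text.
SAMPLE = '<eng>The word is pronounced "<ipa>pa</ipa>" and means upset.</eng>'


def tagged_runs(elem):
    """Yield (tag, text) pairs in document order.

    The tag is that of the innermost element enclosing the run of text.
    """
    if elem.text:
        yield elem.tag, elem.text
    for child in elem:
        yield from tagged_runs(child)
        if child.tail:
            # Text after a child's closing tag belongs to the parent element.
            yield elem.tag, child.tail


runs = list(tagged_runs(ET.fromstring(SAMPLE)))
for tag, run in runs:
    print(tag, repr(run))

# The same code point U+0061 occurs under both tags; only the markup
# tells the English "a" apart from the IPA "a".
tags_with_a = {tag for tag, run in runs if "a" in run}
print(sorted(tags_with_a))
```

Without the tags, the two occurrences of U+0061 are indistinguishable in the character stream, which is the ambiguity under discussion.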
>For every font style, there are Unicode characters that will
not go well with it. High-quality fonts will therefore always
be Unicode subsets only, and applications such as Web browsers
that can prevent certain characters from being used in certain
style contexts will brutally fall back to other styles (e.g.,
pick math operators from the upright font even inside italic
text).
So let it be written; so let it be done.
Peter
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT