At 03:15 -0700 7/4/1999, Michael Everson wrote:
>At 01:32 -0700 1999-07-04, Edward Cherlin wrote:
>
>>My question can be put another way. Should IPA characters be mapped
>>one-to-one into Unicode characters, or should IPA be considered a different
>>data type?
>
><astonishment>What on earth?</astonishment>
>
>IPA is already a part of the UCS.

You mean we can't rethink the situation under changed circumstances? If IPA
can be extended to a closed set that will never ever need further
extensions, then my *opinion* is that it should all be in Unicode. If it
requires the possibility of arbitrarily large future extensions, then it is
equally my *opinion* that Unicode should have enough of its base
characters, and Unicode fonts should have enough glyphs, to render them
all, but that the entities should not in that case be made into Unicode
characters, and that an IPA text datatype should be defined by the
linguistics community as a separate standard.
(And I hold exactly the same opinions about math extensions. Asmus assures
us that the plan is to add about a thousand characters and then stop. I
have doubts about that, which we could discuss.)
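
To make the separate-datatype alternative concrete, here is a minimal sketch in Python. It assumes a hypothetical table of IPA entity names (the names and the table are invented for illustration, not taken from any standard) and translates a sequence of entities into Unicode characters only at rendering time. The three code points used are ones IPA already borrows or shares: Greek beta and theta, and the combining tilde for nasalization.

```python
# Hypothetical IPA entity table: entity name -> Unicode character sequence.
# The entity names are illustrative assumptions, not part of any standard.
IPA_TO_UNICODE = {
    "voiced_bilabial_fricative": "\u03b2",   # GREEK SMALL LETTER BETA
    "voiceless_dental_fricative": "\u03b8",  # GREEK SMALL LETTER THETA
    "open_front_unrounded_vowel": "a",       # LATIN SMALL LETTER A
    "nasalized": "\u0303",                   # COMBINING TILDE, follows its base
}


def render(entities):
    """Translate a sequence of IPA entity names into a Unicode string."""
    return "".join(IPA_TO_UNICODE[name] for name in entities)


if __name__ == "__main__":
    # A nasalized "a" preceded by a bilabial fricative.
    print(render(["voiced_bilabial_fricative",
                  "open_front_unrounded_vowel",
                  "nasalized"]))
```

Under this arrangement, extending the entity set means adding table rows, not encoding new Unicode characters, provided the needed base characters and glyphs already exist.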
>>Is it just a writing system like all other writing systems or
>>should the process which goes from a sequence of IPA entities to a graphic
>>image (on screen, paper, or whatever) include a translation from IPA to
>>formatted Unicode characters, thus decoupling the problem of defining IPA
>>entities from the problem of defining the mapping from formatted character
>>strings to lists of glyph/position pairs for rendering them.
>
>IPA uses the Latin alphabet and various diacritics. Unfortunately in my
>view there are some Greek letters which haven't been "cloned" into Latin
>for IPA support, which I guess will make sorting of Greek & IPA data
>impossible. That's the only problem with IPA in the UCS.
 ^^^^^^^^^^
Nonsense.

If IPA and Greek are to be mixed, but remain distinguishable, you will have
to use markup, just as if you had mixed Greek and Coptic. Then you can sort
them any way you like. If you want uniform, portable sorting methods for
text with markup, you have to consider whether XML can do what you want.
Unicode cannot carry the burden of all possible semantics for a particular
character. We cannot do a correct linguistic sort on Unicode plain text
with no language markers, and the proposed set of language marker
characters cannot cover the requirements for 6,700 languages (current
Ethnologue count) in more than 200 writing systems plus IPA.

The most fun cases are languages whose users, in different cultures or at
different times, have written them in two or more scripts, including, but
not limited to:
Michael's example of Kurdish (Arabic, Cyrillic, Latin)
Serbo-Croatian (Christians and Muslims, Latin alphabet; Serbs, Cyrillic)
Hindi/Urdu (Hindus, Devanagari; Muslims, Arabic)
Egyptian/Coptic (Pre-Christian, Hieroglyphic and Demotic; Christians, Greek)
Mongolian (Soviets--Cyrillic; before and after Soviet period, Mongol script)
Tajik, Uzbek, Kazakh, Azeri (Cyrillic, Latin, Arabic)
Turkish (Arabic, Latin)
Swahili (Arabic, Latin)
Pali (Sinhala, Devanagari, Thai)
Buddhist Hybrid Sanskrit (Devanagari, Tibetan, Chinese)
Japanese (Chinese, Latin, Hiragana, Katakana *mixed*)
Korean (Chinese, Hangul, Latin *mixed*)
APL (proprietary APL/ASCII character encodings; Unicode APL (U+2336-237A),
math, ASCII *mixed*)

And that is not counting Romanizations, Cyrillicizations (?), and other
transcriptions, Shavian, or Bertrand Russell's diary (English language in
Greek script).

>--
>Michael Everson * Everson Gunn Teoranta * http://www.indigo.ie/egt
>15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
>Guthán: +353 1 478 2597 ** Facsa: +353 1 478 2597 (by arrangement)
>27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire

--
Edward Cherlin
edward.cherlin.sy.67@aya.yale.edu
"It isn't what you don't know that hurts you, it's what you know that
ain't so."--Mark Twain, or else some other prominent 19th century
humorist and wit