From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Nov 22 2010 - 12:55:58 CST
On 11/22/2010 4:15 AM, Michael Everson wrote:
>> It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system,
> Yes there are. Sorting multilingual text including Greek and IPA transcriptions, for one. The glyph shape for IPA beta is practically unknown in Greek. Latin capital Chi is not the same as Greek capital chi.
>
>> so also there are no technical or usability reasons I’m aware of why it is problematic to represent this historic Janalif orthography using two Cyrillic characters.
> They are the same technical and usability reasons which led to the disunification of Cyrillic Ԛ and Ԝ from Latin Q and W.
The sorting problem I think I understand.
Because scripts are kept together in sorting, when you have a
mixed-script list you normally override only the sorting for the script to
which the (sort-)language belongs. A mixed French-Russian list would use
French ordering for the Latin characters, but the Russian words would
all appear together (and be sorted according to some generic sort order
for Cyrillic characters, although for a bilingual list, sorting the
Cyrillic according to Russian rules might also make sense).
The same holds for a French-Greek list. The Greek characters will be together and
sorted either by a generic Greek (script) sort, or a specific Greek
(language) sort.
When you sort a mixed list of IPA and Greek, however, the beta
and chi will now sort with the Latin characters, in whatever sort order
applies for IPA. That means the order of the Greek words in the list
will get messed up. It will neither be a generic Greek (script) sort,
nor a specific Greek (language) sort, because you can't tailor the same
characters two different ways in the same sort.
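
To make the conflict concrete, here is a toy sketch in Python. It is not the Unicode Collation Algorithm or any real library's API: the weight scheme, the `IPA_TAILORING` table, and the helper names are all made up for illustration, and the Greek sample words are written without accents to keep the weights simple. Each character gets exactly one weight, which is the whole point.

```python
import unicodedata

# Rough script detection from the character name (illustrative only;
# Python's unicodedata module does not expose the Script property).
def script_of(ch):
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith("GREEK"):
        return "Greek"
    return "Other"

# Scripts are kept together: all Latin first, then all Greek.
SCRIPT_RANK = {"Latin": 0, "Greek": 1, "Other": 2}

def default_weight(ch):
    # Within a script, fall back to code point order (a stand-in for
    # a generic per-script order).
    return (SCRIPT_RANK[script_of(ch)], ord(ch))

def make_key(tailoring):
    # A tailoring maps a character to ONE overriding weight.
    def key(word):
        return [tailoring.get(ch, default_weight(ch)) for ch in word.lower()]
    return key

# Hypothetical IPA tailoring: beta sorts just after Latin b,
# chi just after Latin x.
IPA_TAILORING = {
    "\u03b2": (0, ord("b") + 0.5),  # GREEK SMALL LETTER BETA
    "\u03c7": (0, ord("x") + 0.5),  # GREEK SMALL LETTER CHI
}

words = ["alfa", "zulu", "αλφα", "βητα", "γαμμα", "χι", "ψι"]

untailored = sorted(words, key=make_key({}))
tailored = sorted(words, key=make_key(IPA_TAILORING))

print(untailored)  # Greek words stay together
print(tailored)    # the beta and chi words move in among the Latin ones
```

With the tailoring applied, the words beginning with beta and chi land between "alfa" and "zulu", while the remaining Greek words stay in their own group. Nothing in the key function can tell an IPA beta from a Greek beta, since both are the same code point.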
That, as I understand it, is the problem behind the issue with the Kurdish Q
and W, and with the character pair proposed for disunification for Janalif.
Perhaps, it seems, there are some technical problems that would make the 
support for such "mixed-script" orthographies not as seamless as for 
regular orthographies after all.
In that case, a decision would boil down to whether these technical 
issues are significant enough (given the usage).
In other words, it becomes a cost-benefit analysis. Duplication of 
characters (except where their glyphs have acquired a different 
appearance in the other context) always has a cost in added 
confusability. Users can select the wrong character accidentally, 
spoofers can do so intentionally to try to cause harm. But Unicode was 
never just a list of distinct glyphs, so duplication between Latin and 
Greek, or Latin and Cyrillic is already widespread, especially among the 
capitals.
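
As a small illustration of that existing duplication, the following Python sketch prints three look-alike capitals and flags a string that mixes scripts. It uses only the standard `unicodedata` module; taking the first word of the character name as the script is a rough stand-in I am assuming for the real Script property, and `scripts_used` and `is_mixed_script` are hypothetical helper names, not part of any security library.

```python
import unicodedata

def scripts_used(text):
    """Return rough script labels (first word of the character name)
    for every letter in text."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }

def is_mixed_script(text):
    return len(scripts_used(text)) > 1

# Three capitals that usually render identically.
for ch in ["A", "\u0391", "\u0410"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

genuine = "CAB"        # all Latin
spoofed = "C\u0410B"   # middle letter is CYRILLIC CAPITAL LETTER A

print(genuine == spoofed)        # the strings differ despite looking alike
print(is_mixed_script(genuine))
print(is_mixed_script(spoofed))
```

A check of this kind catches the intentional case only when scripts are actually mixed within one string; it says nothing about a user who accidentally picks the wrong character throughout.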
Unlike what Michael claims for IPA, the Janalif characters don't seem to
have a very different appearance, so there would not be any technical or
usability issue there. Minor glyph variations can be handled by standard
technologies, like OpenType, as long as the text remains legible
even if its language binding has been lost.
That seems to be true for IPA as well - because already, if you use the 
font binding for IPA, your a's and g's will not come out right, which 
means you don't even have to worry about betas and chis.
IPA being a notation, I would not be surprised to learn that mixed lists 
with both IPA and other terms are a rare thing. But for Janalif it would 
seem that mixed Janalif/Cyrillic lists would be rather common, relative 
to the size of the corpus, even if it's a dead (or currently out of use)
orthography.
I'd like to see this addressed a bit more in detail by those who support 
the decision to keep the borrowed characters unified.
A./