> We have said this many times before. Kurdish is written in
> Cyrillic in some places. Kurdish is written in Latin in other
> places. Same language. You can't sort a multiscript list of
> Kurdish words if you are making Q and W serve as letters of two
> different scripts. Never mind English and Kurdish. The example
> is Kurdish and Kurdish.

We all know that there are several languages that are written
using more than one script. As Michael points out, if you're
sorting Kurdish words written in different writing systems, the
Q and W will be ambiguous, even if strings are language tagged
(as was suggested in another message).
What may be less familiar to some is that there are cases where
a language is written with more than one writing system, but
the various writing systems are based on a single script. It is
not that uncommon for a minority linguistic group to have
competing orthographies while in an early-literacy,
pre-standardisation stage. In fact, even if there is a single
effort within a community to establish an orthography, there
may be several proposed orthographies that are being considered
and tested.
In our language software, we have concluded that all strings
need to be tagged not only for language, but also for writing
system. This permits us to handle several different cases:
- multiple standard orthographies based on differing scripts
- multiple pre-standard (prototype) orthographies based on a
single script
- both "practical orthography" (orthography in the true sense)
and "technical orthography" (phonetic/phonemic transcription)
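
To make the idea concrete, here is a minimal sketch of a string tagged for both language and writing system. It is an illustration only, not the data model of the software described above: the `TaggedString` type, the placeholder language code "xx", and every writing-system label are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    text: str
    language: str        # language identifier, e.g. "ku" for Kurdish
    writing_system: str  # hypothetical label for the orthography in use


# Hypothetical labels covering the three cases listed above.
samples = [
    TaggedString("q", "ku", "Latn"),             # standard orthography, Latin script
    TaggedString("q", "ku", "Cyrl"),             # standard orthography, Cyrillic script
    TaggedString("q", "xx", "Latn-proposal-A"),  # prototype orthography under test
    TaggedString("q", "xx", "Latn-proposal-B"),  # competing prototype, same script
    TaggedString("q", "ku", "phonemic"),         # technical orthography (transcription)
]

# The same code point in the same language stays distinguishable by its tag.
same_text = samples[0].text == samples[1].text
distinguishable = samples[0] != samples[1]
```

A language tag alone would make the first two entries identical; the writing-system field is what separates them.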
If strings are all tagged for writing system, then that would
provide a solution to the problem Michael presents. I doubt,
however, that most implementers would want to support all of
the infrastructural mechanisms that we need in our software.
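
As a rough sketch of why the tag resolves the sorting problem: once each string carries a writing-system label, the same code point for Q can be given a different collation position per writing system. Everything below is assumed for illustration. The alphabets are toy orderings (basic Latin and Russian Cyrillic with q and w appended at the end), not real Kurdish collation orders, and the labels "Latn" and "Cyrl" and the single-letter "words" are stand-ins.

```python
# Toy alphabets, for illustration only; real Kurdish collation orders differ.
ALPHABETS = {
    "Latn": "abcdefghijklmnopqrstuvwxyz",
    # Assumption: q and w are placed after the Cyrillic letters.
    "Cyrl": "абвгдежзийклмнопрстуфхцчшщъыьэюя" + "qw",
}
# Assumption: Latin-orthography entries are listed before Cyrillic ones.
WRITING_SYSTEM_RANK = {"Latn": 0, "Cyrl": 1}


def sort_key(item):
    """Build a sort key from a (text, writing_system) pair."""
    text, writing_system = item
    alphabet = ALPHABETS[writing_system]
    # Characters outside the alphabet sort after every known letter.
    weights = [
        alphabet.index(ch) if ch in alphabet else len(alphabet)
        for ch in text.lower()
    ]
    return (WRITING_SYSTEM_RANK[writing_system], weights)


# Single-letter stand-ins for words; "q" is U+0071 in both writing systems.
words = [("q", "Cyrl"), ("б", "Cyrl"), ("q", "Latn"), ("b", "Latn")]

# Without tags, a plain code point sort puts both q entries before Cyrillic б.
untagged = sorted(text for text, _ in words)

# With tags, q takes its place within each writing system's own alphabet.
tagged = sorted(words, key=sort_key)
```

In the untagged result the two q entries are indistinguishable and both precede б. In the tagged result the Cyrillic-orthography q follows б, as the toy Cyrillic ordering requires.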
The only alternatives to tagging for writing system are:
- dis-unify Cyrillic and Latin Q, W
- live with the current ambiguity for Kurdish (and potentially
for other languages, now or in the future)
Peter
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:58 EDT