Re: languages that need Unicode

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Fri Mar 10 2000 - 07:58:20 EST


Edmund GRIMLEY EVANS wrote on 2000-03-10 11:20 UTC:
> Does anyone have a list of languages for which the only reasonably
> adequate and reasonably well-established encoding is Unicode?

Ethiopian is probably on that list, as are many other African languages
covered by Latin Extended-B (unless you count ISO 6438 as
well-established ;-) and all the languages for which support was added
with Unicode 3.0. Also Tibetan and Lao, as well as Russian before the
20th century. Some will also argue that Korean is far better represented
by Unicode than by the old KS C 5601. Add to that the International
Phonetic Alphabet, APL, mathematics, and the many Persian languages that
are not covered by ISO 8859-6. Polytonic Greek is likewise widely
encoded only in Unicode.

You can also argue quite convincingly that for any language not
supported by CP1252, ISO 8859, JIS X 0208, KS C 5601, GB 2312-80, Big5,
and ISCII there is no "well-established" encoding apart from Unicode,
because any other formally existing encodings are very rarely supported
by standard products today. For instance, there are over a dozen "ISO
coded character sets for bibliographic information exchange" that
together also cover most languages that Unicode covers, but these are
only implemented in highly specialized systems and are quickly being
replaced by Unicode.

"Reasonably adequate" is of course a matter of personal preference.
There are surely fanatics who will tell you that German and English are
only "reasonably adequately" representable in Unicode, as other coded
character sets lack the long s, which was rather widely used in Isaac
Newton's time, while most others consider CP1252 to be quite appropriate
for both languages.
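
The long s case is easy to check mechanically. Below is a minimal
sketch in Python, assuming that Python's bundled codecs are acceptable
stand-ins for the legacy sets named above ("shift_jis" for JIS X 0208,
"euc_kr" for KS C 5601). ISCII is left out because, as far as I know,
Python ships no codec for it. The names LEGACY_CODECS and encodable_in
are made up for this illustration.

```python
# Hypothetical helper: which legacy codecs can represent a given string?
# The codec names are Python's, chosen here as approximations of the
# coded character sets discussed in the message.
LEGACY_CODECS = ["cp1252", "iso8859_1", "shift_jis", "euc_kr", "gb2312", "big5"]


def encodable_in(text, codecs=LEGACY_CODECS):
    """Return the codecs from `codecs` that can encode every character of `text`."""
    usable = []
    for name in codecs:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # At least one character has no code point in this set.
            continue
        usable.append(name)
    return usable


if __name__ == "__main__":
    samples = {
        "plain ASCII": "Newton",
        "long s (U+017F)": "\u017f",
        "Ethiopic syllables": "\u12a0\u121b\u122d\u129b",
    }
    for label, text in samples.items():
        print(label, "->", encodable_in(text))
```

Any string that comes back with an empty list has, by the argument
above, no "well-established" home outside Unicode, at least among the
codecs tried.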

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


