Re: Names of languages each expressed in their own language

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Aug 07 2001 - 20:24:06 EDT


William Wolverington suggested:

> I wonder if there already exists, or could we devise, a list of the names of
> languages each expressed in their own language please.
>
> It would be helpful if the Unicode Consortium might kindly include such a
> list on its website, as that would then give the list considerable
> provenance for accuracy.

While it would be nice to have such a list easily available, I am sure
that the Unicode Consortium is not the right body to develop it, nor
is its website the right place to post it.

Perhaps a very short subset of the potential list might be a useful
adjunct to the website of one of the major internationalization/
localization services companies, listing those languages that might
be most likely to be of widespread commercial significance for
translation, and hence for language menu choices.

> I am aware that there may be some international standard way of denoting
> languages using perhaps three latin characters. I am unaware of the
> details. It would be helpful to include such information in the entries in
> the list.

There is such an international standard: ISO 639. It is woefully
incomplete, but it *is* a standard. (Or rather a couple of standards
in uneasy coexistence.) A list of the ISO 639 two-letter codes, together
with their crossmappings to Microsoft Windows and Macintosh language
codes, has recently been updated, and is available on the Unicode website:

http://www.unicode.org/unicode/onlinedat/languages.html

The ISO 639 three-letter codes are also easily accessible online now:

http://www.loc.gov/standards/iso639-2/

That listing, by the way, gives both English and French names of
all the languages (but not Chinese, Japanese, Amharic, Albanian,
Zulu, ... etc., names of all the languages ;-) )
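
To make the shape of such a list concrete, here is a minimal sketch of what
one entry might look like: an ISO 639-2 three-letter code, the English and
French names that the listing above provides, and the language's own name
for itself. The `LanguageEntry` type, the `autonym` helper, and the choice
of four sample rows are all hypothetical illustrations, not part of any
standard or of any list that actually exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageEntry:
    """One row of a hypothetical 'language names in their own language' list."""
    code: str      # ISO 639-2 three-letter code (terminology form)
    english: str   # English name
    french: str    # French name
    autonym: str   # the language's name in the language itself


# A handful of sample rows only; a real list would be vastly longer.
ENTRIES = [
    LanguageEntry("fra", "French", "français", "français"),
    LanguageEntry("deu", "German", "allemand", "Deutsch"),
    LanguageEntry("jpn", "Japanese", "japonais", "日本語"),
    LanguageEntry("zul", "Zulu", "zoulou", "isiZulu"),
]

# Index the rows by code for direct lookup.
BY_CODE = {entry.code: entry for entry in ENTRIES}


def autonym(code: str) -> Optional[str]:
    """Return the autonym for a three-letter code, or None if it is not listed."""
    entry = BY_CODE.get(code.lower())
    return entry.autonym if entry else None
```

Even this toy version glosses over the hard parts discussed below: it
assumes every language has exactly one code and exactly one name for itself.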

See The Ethnologue online for the best current listing of *all* the
living languages of the world, together with another set of three-letter
codes:

http://www.ethnologue.com/

Peter Constable suggested somewhat tongue-in-cheek:

> >If you want to have a list of all languages in all languages you might
> also
> >consider all countries in all languages as well if you are picking
> locales.
>
> Don't you really want all language names in all writing systems? The number
> of known living languages is 6800+. Fortunately for you, less than half are
> written, but some languages have multiple writing systems corresponding to
> different scripts. Let's say there are 2500 writing systems (probably not
> far off). That's 6800 x 2500 = 17,000,000. Are you really sure you want a
> list that long? The page would take a while to load. :-)

The numerosity is not as bad as all that, since, of course, most languages
don't have terms for most other languages.
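
As a rough sketch of the difference, the snippet below compares Peter's
worst-case product with a sparse count in which each writing system has
names for only a limited number of languages. The figure of 50 names per
writing system is purely an assumption picked for illustration, not an
estimate from any source, and the function name is hypothetical.

```python
def name_count(writing_systems: int, names_per_system: int) -> int:
    """Total entries if each writing system names a fixed number of languages."""
    return writing_systems * names_per_system


LIVING_LANGUAGES = 6800   # figure quoted by Peter
WRITING_SYSTEMS = 2500    # Peter's rough guess

# Worst case: every writing system has a name for every living language.
worst_case = name_count(WRITING_SYSTEMS, LIVING_LANGUAGES)

# Sparse case: ASSUMED 50 language names per writing system (illustrative only).
sparse_case = name_count(WRITING_SYSTEMS, 50)

print(f"{worst_case:,}")   # 17,000,000
print(f"{sparse_case:,}")  # 125,000
```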

However, the problem is enormously complex. In addition to the 6800+
living languages Peter mentions, there are also all the major and
minor extinct languages, each of which has at least some technical name
and maybe many other names in many other languages, some of which themselves
are extinct, of course. Among those we know, for example, is
phrúgios (written in Greek, of course), which is the name in
Classical Greek (extinct) of Phrygian (also extinct).

Then, no one can really tell "dialect" apart from "language", so you
end up with all the dialect names as well, and would have to sift
through that mass to figure out what to list.

Then there is the problem of just what a "language name" is in the
first place. This is anthropologically and sociologically tricky.
Many small aboriginal groups may not have had a "name" for their
language in the same sense that taxonomizing Europeans tended to
favor. What I speak may just be known as the "speech of the people"
or some such, as opposed to the "speech of hot-springs-village"
and the "speech of river-fork-village" and so on, referring to
groups around you by their village names or other geographic
references. Are those "language names"? Often such terms or pieces
of them get picked up by an anthropologist and are then asserted
to be the "name" of the language. Example: Wintu, a language
in North Central California: the word "wintu" just means "a person"
in Wintu, and wasn't used in the way we would use "English" for
designating a language, although you could translate "what people
speak" using it. In other instances a language name picked up by
somebody is actually somebody else's pejorative name for some
other group. The name sticks, but it isn't what the group itself
might use for itself (its autonym).

In any case, as for nearly everything having to do with *language*
classification, as opposed to *character* classification, this whole
area is a black hole of effort that is essentially outside the
charter of the Unicode Consortium, in my opinion.

--Ken


