Re: the Ethnologue

From: Doug Ewell
Date: Thu Sep 21 2000 - 11:52:22 EDT

Hi Peter,

> The records in the text file you looked at are language-countries. It
> is important to understand that the categorization is not reflected
> by the records in that file, but by the three-letter codes. The
> reason for codes being duplicated is because the languages in
> question are spoken in more than one country.

I definitely would not have guessed that. Most languages carry no
country indicator (creoles and pidgins being a noteworthy exception).
And although quite possibly no languages are spoken in as many
countries as English and Spanish -- each with considerable
country-specific differences -- those two have only a few separate
entries (cf. Mixteco and Zapoteco).
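
To illustrate the language-country layout as I now understand it, here
is a minimal sketch in Python. The record fields, codes, and names are
made-up placeholders, not the actual Ethnologue file format. The point
is only that the three-letter code, not the individual record, defines
the language, so grouping by code collapses the duplicates.

```python
from collections import defaultdict

# Hypothetical records: (three-letter code, language name, country).
# These values are placeholders, not real Ethnologue data.
RECORDS = [
    ("AAA", "Language A", "Country 1"),
    ("AAA", "Language A", "Country 2"),
    ("BBB", "Language B", "Country 1"),
]


def countries_by_code(records):
    """Group language-country records by their three-letter code."""
    grouped = defaultdict(list)
    for code, _name, country in records:
        grouped[code].append(country)
    return dict(grouped)
```

Here countries_by_code(RECORDS) yields one entry per code, each listing
the countries in which that language has a record.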

(Yes, I know the Ethnologue's emphasis is on categorizing and
documenting minority languages. I know one of the main criticisms of
ISO 639 is that it provides support only for relatively major languages
at the expense of minority languages, but it is possible to err in the
other direction as well.)

> A flat-file database was originally used because the database dates
> back to before the advent of relational databases. Work has begun to
> get the data into a relational structure. Once that is done, it will
> be possible to view the data in other ways, including directly by
> language.

That will certainly make it easier for non-SILers like me to figure out
what is intended, and will reduce misunderstandings.

> There is no single right way to "tile the plane".
(repeated several times in different messages)

Agreed. This is a refreshing departure from the position I perceived
earlier, that ISO 639 was severely broken and the Ethnologue approach
was inherently superior. The truth, of course, is that each approach
has its advantages and drawbacks for language tagging. 639 needs more
codes (and we know the MAs are working on this), and Ethnologue needs,
if not fixing, at least clarifying.

> A universally "politically correct" name in every case is insoluble.
> Simply picking one as a default *for the purposes of implementation of
> the system of identifiers* is reasonable, and is a problem we have to
> be able to solve if we are going to present a view of the data that
> is organised first by language - at the least, you have to list one
> name first. This is certainly going to happen.

That is all I was asking for. I apologize if it sounded otherwise.

> Ethnologue can supplement ISO codes, but we're not suggesting simply
> adding all the Ethnologue codes to the same namespace. That would not
> work. On the other hand, "i-sil-xxx" would. It is also necessary to
> ensure that, if the category denoted by an instance of "i-sil-xxx"
> matches that of some ISO code, then only the ISO code should be used.
> To deal with this, a mapping between ISO and Ethnologue is needed,
> and that is being worked on.

That is a real solution, one that builds on ISO 639 instead of bashing
it.
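
As a rough sketch of the rule described above: prefer the ISO code when
one denotes the same category, and fall back to an "i-sil-xxx" tag
otherwise. The mapping table, its entries, and the function name are my
own placeholders for illustration, since the real ISO-to-Ethnologue
mapping is still being worked on.

```python
# Hypothetical mapping from Ethnologue three-letter codes to ISO 639
# codes. The entries are illustrative only, not the real mapping.
SIL_TO_ISO = {"ENG": "en", "SPN": "es"}


def language_tag(sil_code, sil_to_iso=SIL_TO_ISO):
    """Return the ISO 639 code if one matches, else an "i-sil-xxx" tag."""
    code = sil_code.strip().upper()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected a three-letter code: %r" % sil_code)
    iso = sil_to_iso.get(code)
    if iso is not None:
        # A matching ISO code exists, so only the ISO code is used.
        return iso
    # No ISO equivalent: supplement with the Ethnologue code.
    return "i-sil-" + code.lower()
```

With this table, language_tag("ENG") gives the ISO code, while a code
that has no entry in the mapping comes back in the "i-sil-xxx" form.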


-Doug Ewell
 Fullerton, California
