Re: the Ethnologue

Date: Wed Sep 13 2000 - 17:11:23 EDT

On 09/13/2000 10:25:21 AM Antoine Leca wrote:

>While I agree with you, there are anyway problems with the way languages
>are distinguished...

Some comments in response:

- This is not primarily about major languages. They generally already have
the identifiers they need. In addition, because of their history of
literary tradition together with subsequent sociolinguistic change and
diversification, they present complications that are not the norm when
considered in relation to thousands of lesser-known languages. The aim of
adding thousands of new language identifiers to some standard system is to
cover the thousands of languages that currently have nothing, not to
replace what is already there for the few hundred that are already covered.

- There is no question that some processes require distinctions based on
one or another type of *paralinguistic* notion, such as writing system or
orthographic convention. My guess is that these distinctions differ most
from a simple enumeration of languages (based on a given operational
definition) in exactly the cases mentioned above. We need a better
understanding of which processes depend upon distinctions based on which
paralinguistic notions, but that is likely to take quite some time yet. In
the meantime, the needs of those interested in the thousands of
lesser-known languages that have *nothing* in the way of identifiers
shouldn't be neglected. We can improve our systems as we come to
understand the needs of different processes better. When we get to that
point, it is likely that a comprehensive enumeration of languages will be
much more of an assistance than a hindrance.

>I have no firm idea for what should be the form of a list of languages.
>But I am _sure_ that any list will lead to problems, due to the fuzziness
>of the borders between languages.

That is precisely because there is no *one, perfect* enumeration of
languages: alternate categorizations based on different operational
definitions may be valid for different purposes. (All points that Gary and
I have made in our paper.) The challenge, then, is to find a way to provide
users who have differing purposes with solutions that suit those purposes.
Our suggestion of alternate namespaces of identifiers permits exactly this.

>And while this problem is more or less
>possible to deal with when it comes to the major languages with abundant
>literature and standardized spelling, at the very time it narrows to
>used languages, problems will arise.

Actually, in some respects it is major languages that create complications
that don't apply to lesser-known languages (hence some of your comments).
On the other hand, it is not clear that an attempt to adopt a
comprehensive enumeration of languages will lead to many more problems.
There will *always* be somebody who says they need something different.
But if we use the Ethnologue to add coverage for lesser-known
languages to existing systems, many users interested in modern languages
will feel they are a lot closer to what they need. (Those interested in
ancient languages will not have their needs met, but that is beyond SIL's
expertise.) There will still be occasional dissatisfaction, but not the
wholesale frustration that currently exists.

>The problem you mentioned with the incorrect tagging of Hopi is inherent
>in any persistent use of information that relies on a varying database.
>If Ethnologue is merged with (or into) ISO 639, this problem won't fade,
>because the linguistic map of the planet is alive (not to mention
>pressures like the one I spoke about for Valencian above). So if CLN (I am
>sorry, I do not know Hopi's situation, so I cannot comment on your specific
>case) is split, with a special code for Valencian created, then from that
>very day all literature in Valencian would *now* be incorrectly tagged.
>Exactly the same case as you described above. The same, except for one
>point: the number of documents that might be affected...

This is precisely my point: people object to the Ethnologue because the
information is incomplete and therefore subject to change, but they assume
that ISO 639 is free of criticism in this regard. That is not true, since
ISO 639 is subject to the same problems. In fact, there would be much less
of a problem if a comprehensive list of identifiers based on the Ethnologue
were available, for two reasons:

1. The Ethnologue will record change history, and any changes would be from
one *known* quantity to another. Hypothetical example: data is tagged as
"Lahu Shi", but three years after the data was created, it is learned that
this name actually corresponds to two distinct languages. The data becomes
sub-optimally tagged, not completely incorrectly tagged.
Furthermore, even though we may not know precisely how the data should be
tagged based on the new knowledge, we are able to determine the maximal
extent to which it is sub-optimally tagged.

In contrast, with ISO 639, the data is tagged as a largely unknown quantity,
in the example "Sino-Tibetan (other)". When the system is updated to add a
specific tag based on new knowledge, the existing data becomes incorrectly
tagged, and is still tagged as a largely unknown quantity. Not only do we
have no way to know to what extent it is incorrectly tagged, we in fact
don't even have any way to determine that it *is* incorrectly tagged.
(I'm discovering that the problem is worse than I realised every time I
explain it.)

2. The amount of data that can be affected by changes in the Ethnologue is
relatively small. E.g. a change in knowledge about Lahu Shi only affects
documents for that one speech variety. In contrast, changes related to the
ISO 639 code "sit" ("Sino-Tibetan (other)") may potentially affect a much
larger volume of data, because many languages are involved.
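To make points 1 and 2 concrete, here is a toy sketch in Python, assuming a
registry that either records a split in its change history or simply adds a
new specific code next to a collective one. The `Registry` class and the
identifiers "lahu-shi", "lahu-shi-a", "lahu-shi-b" and "sino-tibetan-other"
are hypothetical, for illustration only; they are not real Ethnologue or
ISO 639 codes, and this is not how either system is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy identifier registry (hypothetical) with optional change history."""

    active: set[str] = field(default_factory=set)
    # Retired identifier -> identifiers it was split into.
    history: dict[str, set[str]] = field(default_factory=dict)

    def add(self, code: str) -> None:
        """Add a new identifier without recording what it was carved out of."""
        self.active.add(code)

    def split(self, old: str, successors: set[str]) -> None:
        """Retire `old` and record the identifiers that replaced it."""
        self.active.discard(old)
        self.active |= successors
        self.history[old] = set(successors)

    def resolve(self, tag: str) -> set[str]:
        """Return the active identifiers that data tagged `tag` may belong to.

        An active tag resolves to itself; a retired tag resolves, recursively,
        to the successors recorded in `history`. The result bounds how far the
        existing tag can be sub-optimal.
        """
        if tag in self.active:
            return {tag}
        if tag not in self.history:
            raise KeyError(f"unknown identifier: {tag}")
        result: set[str] = set()
        for successor in self.history[tag]:
            result |= self.resolve(successor)
        return result


# Enumeration with change history: the split is recorded.
ethno = Registry(active={"lahu-shi"})
ethno.split("lahu-shi", {"lahu-shi-a", "lahu-shi-b"})
print(sorted(ethno.resolve("lahu-shi")))  # ['lahu-shi-a', 'lahu-shi-b']

# Collective code: a specific code is added, but nothing records that some
# data tagged "sino-tibetan-other" now belongs under it.
iso = Registry(active={"sino-tibetan-other"})
iso.add("lahu-shi")
print(sorted(iso.resolve("sino-tibetan-other")))  # ['sino-tibetan-other']
```

With a recorded history, a retired tag resolves to a bounded set of
successors, which is the maximal extent of the sub-optimal tagging. With the
collective code, the old tag still resolves to itself after the new code is
added, so nothing signals that some of that data is now incorrectly tagged.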

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
