Re: TLDs in non-ASCII

Date: Fri May 03 2002 - 01:53:23 EDT

On 04/20/2002 07:07:11 AM Roozbeh Pournader wrote:

>ICANN IDN committee has just published a discussion paper on non-ASCII
>top-level domains. As it appears, it has problems in what is a script,
>what is a language, and what kind of standards are available. I will write
>a thorough reply to the paper, but just wanted to ask people here to help
>recognize their mistakes, both political ones and standard-related ones.

Some comments:

Re geography: An additional consideration / concern in relation to ISO
3166-1 is that these codes are, unfortunately, not as stable as many would

Re languages:

If the objective of IDNs is to enable users to easily type domain names in
familiar, non-ASCII scripts (while preserving universal uniqueness and
resolvability), it might be easiest to simply create a single TLD for each
non-ASCII script, allowing the registry operator to make decisions about
lower-level naming conventions.

As you mentioned, this seems to confuse language identification with script

There does not appear to be a recognized list of all human languages
analogous to the ISO-3166-1 table...

1. Is there any recognized, authoritative reference list of languages
(analogous to ISO 3166-1) that could be employed as a reference against
which to judge proposals for language-associated non-ASCII TLDs?

Depends upon what is meant by "recognised". Several agencies have adopted
the Ethnologue list for similar purposes (cf. There
are some identifier conflicts with ISO 639-2, which could certainly lead to
confusion, though. But the committee should be aware that ISO/TC 37/SC 2
has initiated a process that is intended to lead to extension of the ISO
639 family of standards, and one of the intents is to provide a
comprehensive list of identifiers comparable to that provided by the

Language communities cross sovereign national boundaries. The problem of
identifying and achieving consensus among the stakeholders of a given set of
language communities may be extremely difficult. ICANN/IANA might be left
with competing claims backed by different stakeholders, or, worse, different
national governments. ICANN/IANA is not well-suited to resolve those kinds
of disputes.


These points are all valid. It definitely would not be a good idea for
ICANN/IANA to introduce yet another list of language IDs. On the other
hand, that probably isn't necessary -- see above.

2. The relevant community of interest for a given language is the set of
all speakers of that language. Is it likewise correct that the relevant
community to be served by a given language-associated non-ASCII TLD would
be the set of all speakers of the languages that utilize the characters
that comprise the proposed language-associated non-ASCII TLD string? If
not, how should ICANN define the community to be served by the manager of a
language-associated TLD?

Yes, though there is an additional complication to be aware of: they are
talking in terms of non-ASCII TLDs corresponding to languages, yet many of the
world's languages are written by subsets of the overall language community
using different scripts. It will most generally be true that a given person
will be familiar with only one of these ways of writing the language
(though certainly not always, especially when transitions between one or
the other are occurring). Thus, whereas they are talking of TLDs
corresponding to languages, strictly speaking it probably should be TLDs
that correspond to different writing systems (where a writing system = a
particular system for writing a particular language, roughly language x

Note: the very useful glossary by Paul Hoffman referenced in the doc has a
small but, I think, significant weakness in that it fails to include the
notion of writing system. The distinction between language and script cannot
be properly understood, I think, apart from the intervening notion of
writing system. (Some use orthography for the notion I am describing as
"writing system", which is good in that it makes the important distinction
missing in the TLD doc, though I think there is yet another distinction of
interest for general IT purposes that can reasonably be called

3. Is it correct that a non-ASCII TLD semantically associated with the name
of a language is essentially redundant, given that the domain is by its
very nature an expression of that language?

Not in general. In some cases, there may not be significant ambiguity for a
majority of users; e.g. the likelihood of someone mistaking the string
"brother" as being Navajo / Indonesian / whatever rather than English is
pretty slim. But the statement cannot be generalised. A single string can
be used in different languages with different meanings. Thus a TLD of
"chat" wouldn't determine a particular language. Also, some strings
with a particular semantic, particularly proper names, might not inherently
belong to any single language. Consider that many TLDs involve names of
businesses that would be invariant from language to language (e.g. IBM goes
by "IBM" universally, even if a local office happened to provide content in
one language only).

4. Languages are often spoken across territorial boundaries and under
different and sometimes hostile governments, making the usual requirement
of community consensus extremely difficult to establish or document.

This is a significant consideration. In some cases (I have no information
regarding just how common this is), different portions of a language
community may refer to the language using different names, or may use
different spellings for the language name. Where alternate names exist,
there is potential that the competing names are associated with competing
social / political / religious / ... factions, making the names somewhat
contentious. Moreover, naming preferences can potentially change over time.

That may not imply that such names couldn't be used in TLDs, although using
one of them might imply a potential need to use others as well. Given the
possibility that a group may have interest in resources for domains that can
be identified as related to their faction and not in others, some might
even desire that. But there is certainly potential for rather greater
management overhead.

5. What role, if any, should be played by governments in the selection of
non-ASCII TLDs semantically associated with languages?

This is also a very sensitive issue. Certainly, governments should have a
role in selecting names for official languages (though there is potential
for conflict given the fact cited in the document that languages span
borders, and given that different names or spellings may be used in
different countries). Beyond that, there are many who would argue that in
many cases governments should specifically *not* be given a role, on the
basis that many governments severely limit or oppose the interests of
non-dominant language communities within their borders. It is probably safe
to make the claim that some governments currently limit the linguistic
freedoms of language communities living within their borders (in some cases,
even citizens).

6. What role, if any, should be played by recognized language authorities
(for example, l'Académie française) in the selection of non-ASCII TLDs
semantically associated with languages?

To the extent that such language authorities are often affiliated with
national governments (which provide the source for their authority within a
given country), the statements made under the previous point more or less
apply here. I would think it reasonably non-controversial for such an
agency to make decisions affecting the language for which they are
recognised as authority (though not necessarily: cf. resistance to the
German orthography changes of 1996).

Re cultural groups: the points cited as disadvantages are valid.

1. Is there any recognized, authoritative reference list of cultures and
ethnicities (analogous to ISO 3166-1) that could be employed as a reference
against which to evaluate proposals for non-ASCII TLD strings semantically
linked to them?

No. There are probably attempts at such a list, but the difficulties in
enumerating such distinctions are quite a bit greater than those for
languages, as noted in the doc. I am not aware of any lists that I would
consider candidates in this regard.

I hope this is useful feedback.

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
