From: Erik van der Poel (erik@vanderpoel.org)
Date: Fri Feb 18 2005 - 07:41:45 CST
All,
This email is being sent to both the Unicode and IDN mailing lists. I'm
wondering how we can move forward with the IDN spoofing issue. Let me
take a stab at it.
Regarding the proposal to unify (or map) all the homographs, Doug Ewell
wrote a humorous email illustrating how difficult such an effort would be:
http://ops.ietf.org/lists/idn/idn.2002/msg00498.html
John Klensin says that a "one label, one language" rule has been
suggested to combat look-alike confusion. See section 1.5.1 in:
http://www.ietf.org/internet-drafts/draft-klensin-reg-guidelines-06.txt
Indeed, this label-based idea makes sense because DNS is
administratively divided into labels. For example, the .com operator
might be able to impose some restrictions on the 2nd level domain label,
but if someone registers foo.com, then it's up to them to decide what
will be allowed at the 3rd level (e.g. bar.foo.com). No?
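
To make the administrative split concrete, here is a minimal sketch that pairs each label of a name with the zone in which it would be registered. It assumes, purely for illustration, that every dot is a delegation point (real zone cuts need not fall at every label), and the function name policy_owners is made up for this example.

```python
def policy_owners(name):
    """Pair each label with the parent zone whose operator would set its policy.

    Simplifying assumption: every dot in the name is a delegation point.
    """
    labels = name.rstrip(".").split(".")
    owners = []
    for i, label in enumerate(labels):
        # The parent zone is everything to the right of this label;
        # "." stands for the root zone.
        parent = ".".join(labels[i + 1:]) or "."
        owners.append((label, parent))
    return owners


for label, parent in policy_owners("bar.foo.com"):
    print(f"label {label!r} is governed by the operator of {parent!r}")
```

Under that assumption, the .com operator can only constrain "foo", and whoever holds foo.com decides what "bar" may contain.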
Recent discussion on the IDN mailing list has suggested that we might
want to think more in terms of *script* than language. However, I note
that there is a very diverse history of mixing scripts:
http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_mixing.html
But do we really need to allow for such rich script mixing in DNS? Some
of the script mixing described in the document above is scholarly
transliteration or "one-offs".
So, instead, I propose that we start thinking of a "one label, one
writing system" rule. The Unicode book defines "writing system" as "a
set of rules for using one or more scripts to write a particular language".
This makes a lot of sense for some of the ccTLDs. For example, the .jp
domain could choose to allow the Japanese writing system in the 2nd
level domain label.
But what can we do about .com? It's clearly a worldwide TLD now. It
should probably allow multiple writing systems. Perhaps the .com
operator could specify that 2nd level domain labels must stick to one
writing system, and that that writing system must be indicated in the
RRP (Registry Registrar Protocol) in order to validate the 2nd level
name against the table of characters allowed in that writing system.
This would probably require a (new?) set of names for writing systems,
somewhat similar to the language codes of ISO 639.
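
A rough sketch of what such a validation might look like follows. Everything in it is an assumption for illustration: the writing-system names ("english", "russian", "japanese"), the contents of the tables, and the shortcut of guessing a character's script from the first word of its Unicode character name. A real registry would publish explicit per-character tables rather than rely on this heuristic.

```python
import unicodedata

# Hypothetical writing-system tables. Each made-up name maps to the script
# keywords (first word of the Unicode character name) that it permits.
WRITING_SYSTEMS = {
    "english": {"LATIN"},
    "russian": {"CYRILLIC"},
    "japanese": {"HIRAGANA", "KATAKANA", "CJK"},
}

# Digits and the hyphen are treated as script-neutral in this sketch.
ALWAYS_ALLOWED = set("0123456789-")


def script_keyword(ch):
    """Crude script guess: the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def label_allowed(label, writing_system):
    """Return True if every character fits the declared writing system."""
    allowed = WRITING_SYSTEMS[writing_system]
    return all(
        ch in ALWAYS_ALLOWED or script_keyword(ch) in allowed
        for ch in label
    )


# A label that mixes Latin letters with CYRILLIC SMALL LETTER A (U+0430)
# fails when the registrant declares the "english" writing system.
print(label_allowed("paypal", "english"))
print(label_allowed("p\u0430ypal", "english"))
print(label_allowed("\u65e5\u672c\u8a9e", "japanese"))
```

Note that this only catches mixing across tables. Homographs that live inside a single table would still get through, so the contents of each table matter as much as the rule itself.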
Some people might point out that it is unfair to impose a writing system
rule on domain labels since DNS has not had such restrictions in the
past. Or has it? The DNS spec itself may allow arbitrary octet values, but
the infrastructure and conventions appear to be restricted to a subset of
ASCII (letters, digits and the hyphen), which I guess you could just call
the English writing system, no?
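
For comparison, the existing ASCII convention for host name labels (letters, digits and hyphen, with no hyphen at either end and at most 63 characters) can be written as a short check. This is a sketch of the conventional rule as I understand it, not something the DNS protocol itself enforces, and the name is_ldh_label is made up for this example.

```python
import re

# Conventional host name label: 1 to 63 characters drawn from ASCII
# letters, digits and hyphen, not starting or ending with a hyphen.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label):
    """Return True if the label follows the letters-digits-hyphen convention."""
    return LDH_LABEL.fullmatch(label) is not None


print(is_ldh_label("foo"))
print(is_ldh_label("b\u00fccher"))  # contains U+00FC, so not LDH
```

Seen this way, a "one label, one writing system" rule generalizes a restriction that is already in place for a single writing system.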
Also, I'm guessing that any "one label, one writing system" rule cannot
really be mandated, since TLD operators have historically been free to
do whatever they want, to make as much money as they want. So this rule
would just be a guideline (Klensin's document is titled "Suggested
Practices ...") and the TLD operators could follow it, if they wish to
combat the IDN spoofing problem more than they wish to make money (in
the short term :-)
Erik