From: Hans Aberg (haberg@math.su.se)
Date: Tue Feb 22 2005 - 12:44:48 CST
At 13:33 +1100 2005/02/22, George W Gerrity wrote:
>it doesn't make sense for these rules to be part of a standard on how to extend
>Domain names to use scripts other than Latin: they are much better handled as
>(algorithmic where possible) regulations specified by the authority for a given
>TLD, or set of TLDs, in the case of the universal TLDs.
It seems simplest to merely require the names to be 8-bit bytes, UTF-8
encoded.
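As a minimal sketch of what "8-bit bytes, UTF-8 encoded" would mean for a single
label: the helper name label_to_bytes is hypothetical, and the 63-octet bound is
the classic DNS label limit, applied here to bytes rather than characters.

```python
def label_to_bytes(label: str) -> bytes:
    """Encode one domain label as raw UTF-8 octets (hypothetical helper)."""
    data = label.encode("utf-8")
    # The classic DNS label limit is 63 octets; it counts bytes, not characters.
    if not 1 <= len(data) <= 63:
        raise ValueError("label must be 1..63 octets after UTF-8 encoding")
    return data


ascii_label = label_to_bytes("example")       # ASCII is unchanged by UTF-8
latin_label = label_to_bytes("b\u00fccher")   # u-umlaut takes two octets
```

An ASCII-only name yields exactly the bytes it does today, so existing names
would be unaffected under this assumption.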
>At the TLD itself, one can allow a limited, but finite number of character
>strings to be equivalent, including the rule that script mixtures are
>inadmissible, but maybe case folding will be allowed.
Then if DNS name lookup software is not updated, only ASCII letter cases will
be identified (treated as equivalent), as before, but no other casings, not
even for Latin script letters with diacritical marks. (In retrospect, when
facing the full Unicode set, it might have been better to identify ASCII
letter cases.)
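A rough illustration of that behaviour, assuming lookup software that folds only
the bytes A-Z and compares everything else octet by octet (ascii_fold and
same_name are names made up for this sketch):

```python
def ascii_fold(name: bytes) -> bytes:
    # Map only bytes 0x41..0x5A (A-Z) to 0x61..0x7A (a-z); leave all others alone.
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in name)


def same_name(a: str, b: str) -> bool:
    # Compare two names the way un-updated, ASCII-only software would.
    return ascii_fold(a.encode("utf-8")) == ascii_fold(b.encode("utf-8"))


plain = same_name("Example", "example")            # ASCII cases are identified
accented = same_name("\u00c9cole", "\u00e9cole")   # E-acute vs e-acute are not
```

The UTF-8 bytes of a non-ASCII letter are all above 0x7F, so the ASCII fold
never touches them, and the two accented spellings stay distinct.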
>...it doesn't make sense for these rules to be part of a standard on how to
>extend Domain names to use scripts other than Latin: they are much better
>handled as (algorithmic where possible) regulations specified by the authority
>for a given TLD, or set of TLDs, in the case of the universal TLDs.
Then all confusable problems will be handled at the registry.
>By using this approach, and starting off with a set of rules that disallow most
>forms of script mixes, then where appeals to common sense and the wishes of a
>reasonable number of potential clients suggest a loosening of the rules, this
>can be done with little disruption to the existing state of affairs.
If one uses the method I indicated to define equivalences, then script mixes
can be allowed. If cases are not identified in scripts, then these
equivalences will be between characters of different scripts. Thus, they
should not cut down on single-script names. (I want to avoid throwing in
general equivalences such as that of casings, as different equivalences can
combine to generate unwanted equivalence chains.)
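To make the chain problem concrete, here is a small sketch using a union-find
structure. The pairs are chosen only for illustration: Latin B and Cyrillic
U+0412 look alike, while the other two pairs are ordinary case pairs. The
function name build_classes is hypothetical.

```python
def build_classes(pairs):
    """Return a find() function for the equivalence generated by the pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two classes
    return find


homograph = [("B", "\u0412")]                    # Latin B ~ Cyrillic capital Ve
casing = [("B", "b"), ("\u0412", "\u0432")]      # ordinary case pairs

only_homographs = build_classes(homograph)
with_casing = build_classes(homograph + casing)

# Latin b and Cyrillic small ve do not look alike, yet casing chains them.
chained = with_casing("b") == with_casing("\u0432")
separate = only_homographs("b") != only_homographs("\u0432")
```

With the homograph pair alone the two lower-case letters stay in separate
classes; once casing is thrown in, transitivity puts them in the same class.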
>The problems for universal TLDs (<.com>, <.net>) are far more complex, because
>they are required to accept all language scripts.
If all language scripts are already deemed admissible at these levels,
these will be the battleground for confusables. So there might not be a
point in restricting other levels. One should also note that the country
codes are not language codes or script codes, and most nations are
multilingual today. It might be controversial to restrict country codes to
just certain scripts.
>c) At this point, the <.com.ru> registrar will need to exercise some common
>sense. For instance, it seems unreasonable that this domain should accept codes
>outside the Latin and Cyrillic code blocks, and if they do, then mixes should
>be strongly discouraged. Certainly, the use of, say, Hebrew vowel pointing with
>Latin Codes, while perhaps acceptable in Israel TLD, should be unacceptable in
>the Russia TLD. In fact, as a general rule, mixes of diacritics from one code
>block with code points from another, should never be allowed.
So this assumes that there are no Hebrew speakers in Russia. This restriction
might be interpreted politically as saying that those speaking Hebrew in
Russia should go to Israel, at least as far as defining their Internet domain
names goes. It might be wise to avoid this kind of political controversy. :-)
I think one can define a lot of homograph equivalences, which are then used
only for an automated first check when attempting to register a new name.
The cases that fail to register automatically will be reviewed by a
human. One will then discover if one has defined too many equivalences. It
might be wise to set up a report system, where the public can report
confusable names. Then a committee will have to review those cases, and
decide what to do about them.
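The automated first check could look roughly like the sketch below. Everything
in it is an assumption made for illustration: the tiny HOMOGRAPHS table (five
Cyrillic letters mapped to their Latin look-alikes), the Registry class, and
the status strings. A real registry would need a far larger table.

```python
# Hypothetical homograph table: Cyrillic a, e, o, er, es -> Latin look-alikes.
HOMOGRAPHS = {
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
}


def skeleton(name: str) -> str:
    # Replace each character by its class representative, if it has one.
    return "".join(HOMOGRAPHS.get(ch, ch) for ch in name)


class Registry:
    def __init__(self):
        self.by_skeleton = {}    # skeleton -> name already registered
        self.review_queue = []   # (requested, existing) pairs for a human

    def request(self, name: str) -> str:
        existing = self.by_skeleton.get(skeleton(name))
        if existing is None:
            self.by_skeleton[skeleton(name)] = name
            return "registered"
        if existing == name:
            return "taken"
        # Same skeleton, different name: do not decide automatically.
        self.review_queue.append((name, existing))
        return "needs human review"


reg = Registry()
first = reg.request("paypal")
second = reg.request("p\u0430yp\u0430l")   # Cyrillic a in place of Latin a
```

The length of review_queue gives a crude signal of whether too many
equivalences have been defined, since every entry costs a human decision.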
(I also like the idea that sites that use a non-ASCII name must register a
parallel ASCII name, for international access: it might be difficult to
exercise proper control over sites if one has to be an expert on
international scripts in order to access them. One easy way for a criminal to
"hide away" a site might otherwise be to give it a strange name.)
Hans Aberg