All,

these are my comments on the Unicode Consortium's Draft UTR#36 document, Revision 1.16 (2005/05/09). They are posted here since this document is on the review radar of the i18n-core wg and apparently I'm not allowed to post to public-i18n-core@w3.org.

All in all it is a good document (modulo certain recommendations, IMHO, I'll address that later), but its structure is sometimes not respected: for instance, though there's a whole section on IDNs (2.1), IDN issues keep popping up throughout the rest of the doc. I am sorry if the following list is something of a mixture of core issues and editorial nits:

Section 1, 4th paragraph: "; and according to what you see it is". Is part of the sentence missing there?

Section 1, 8th paragraph: "While some browsers prevent this spoof by lowercasing domain names, but others don't". I am not a native speaker, but I guess it should be "domain names, others don't".

Section 2.1, 2nd paragraph: It's not actually about IDNs, so it shouldn't be placed here. Maybe directly under Section 2.

Section 2.1, 3rd paragraph: "using a process called compatibility normalization (NFKC)". I guess that a direct reference to RFC 3491 (Nameprep) would be better placed here, since Nameprep = NFKC + a little bit of something else.

Section 2.1, 4th paragraph: ", while the IDNA column shows the IDNA format used to represent the string internally in International Domain Names". First, the term IDNA is introduced here for the first time without further explanation. Second, the column is actually called "IDN Internal", which is an unfortunate name; I was expecting the term ACE ("ASCII Compatible Encoding") to appear somewhere here. The term "International Domain Names" is somewhat unfortunate as well (all domain names are an international good ;-); the correct term is "Internationalized".
My proposal for this whole sentence is thus: ", while the ACE ("ASCII Compatible Encoding") column shows the result of applying the ToASCII() operation (cf RFC 3490) to the original IDN, which is the way this IDN is stored and queried in the DNS".

Section 2.1, 7th paragraph: "The IDN processing also removes case distinctions by performing a case folding to reduce characters to a lowercase form. [...] That means that we can focus on just the lowercase characters". While I don't know whether it will be relevant for the conclusion "we can focus on just lowercase", there are two remarks that need to be made:

* First, the IDNA operation ToASCII() will map to lowercase iff the label contains some non-ASCII character. Thus ToASCII("DENIC.DE") = "DENIC.DE", because it is all ASCII. The IDN processing has left the string unchanged.

* Second, domain names are case insensitive, but RFC 1034 and 1035, as clarified by http://www.ietf.org/internet-drafts/draft-ietf-dnsext-insensitive-05.txt, introduce the concept of case preservation. To put it plainly: if I query the DNS for "WWW.DENIC.DE", and the DNS contains information for "www.denic.de", I will get exactly that information delivered, but the answer will be titled "WWW.DENIC.DE".

Section 2.1, 9th paragraph: "two domain names would need to be registered". It's a little bit unclear what is meant: Why would that be needed? By whom should they be registered? Since this is not a technical issue, I'd leave this note to the recommendations for the user (where it can already be found: 2.10.1.B).

Section 2.1, 9th paragraph: The word "registry" appears for the first time without further introduction. For somebody unfamiliar with domain names and the ICANN terminology, its meaning may be unclear. I'd drop the sentence anyway, because the statement "a registry may want to pay attention to this" is more confusing than clarifying.
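
To illustrate the first remark on Section 2.1, 7th paragraph above (ToASCII() leaving all-ASCII labels untouched), here is a minimal sketch. It assumes Python's built-in "idna" codec, which to my knowledge implements the IDNA of RFC 3490 and applies ToASCII() label by label. The helper name to_ace() is made up for this example and is not part of any standard.

```python
def to_ace(domain: str) -> str:
    """Convert a domain name to its ACE form.

    Assumption: Python's built-in "idna" codec, which follows RFC 3490
    and processes the name one label at a time.
    """
    return domain.encode("idna").decode("ascii")


# All-ASCII name: no Nameprep is applied, so the case is left untouched.
print(to_ace("DENIC.DE"))           # DENIC.DE

# "M\u00dcLLER.DE" is "MÜLLER.DE": only the non-ASCII label goes through
# Nameprep (and is lowercased); the ASCII label "DE" keeps its case.
print(to_ace("M\u00dcLLER.DE"))     # xn--mller-kva.DE
```

The second call shows that the lowercasing happens per label, not per domain name, so a mixed name can come out of the IDN processing with some labels still in uppercase.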
Section 2.1, 10th paragraph: s/international domain names/internationalized domain names/

Section 2.1, 10th paragraph: "the registry can easily determine if a proposed registration conflicts". I'd gently drop the qualifier "easily": given an input label of 63 characters (the maximal length of a domain label), each of which could be the source of an entry in the "confusables" table, and assuming that there's always only a single target for the same input (is that always the case?), there are potentially 2^63 lookups in the registration database to be done in real time in order to work out a possible conflict. That requires more computing power than most of the world's domain registries can afford today.

Section 2.1, 11th paragraph: I'd add a fourth bullet "Due to the decentralized nature of DNS, registries do not control subdomains being established beyond the domain name registered". This fact is relevant. Together with problems like the one described in RFC 1535 (and God knows which others are yet to come), this issue could open the door to a new kind of scam.

Section 2.5, 1st example: "to pretend to be a subdomain in" is not correct. Better: "to pretend to be a URL under the domain"

Section 2.5, 1st paragraph after the example: "are disallowed by StringPrep". Stringprep (no capital P) is introduced for the first time without explaining in which way it is relevant to the IDNA standard. I'd actually prefer to stick to a reference to Nameprep (as introduced before), which (although just a profile of Stringprep) is directly relevant to domain names.

Section 2.5, last but one paragraph: "to always visually distinguish the second-level domain". That's a common gotcha: some registries actually register at the third level (greetings to my nominet.org.uk colleagues from here :-), and there's no rule that forbids a TLD from registering at the fourth, fifth, ... level. You just can't carve the second level in stone.

Section 2.8: This section is actually very difficult to understand for a non-native speaker.
But since I didn't get it, I can't make any suggestion for improvement. There are a lot of pronouns ("this", "both", ...) whose referent I can't unambiguously identify.

Section 2.9: The security levels are a good idea, the names are problematic though. I wouldn't like to claim that my registry assigns domain names at Unicode's "security level minimal", though it's supposed to be the second highest in the rank :-). Further: what is the "minimal +" or "moderate +" supposed to mean? Please clarify.

Section 2.9, 1st paragraph after the security levels: "characters outside of XID_Continue". This can't be understood by non-insiders. Please clarify.

Section 2.9, 2nd paragraph after the security levels: That is probably well-meant, but I wonder whether those suggestions wouldn't be better left to usability experts.

Section 2.10: The recommendations are too domain-centric; I would have expected to see recommendations for identifiers here.

Section 2.10.1, point A: s/browsers/browsers, mail clients and software in general/

Section 2.10.1, point B: "Use the same IP address for both". This recommendation is based on the belief that a registered domain name always has an IP address (and propagates the idea that the Internet is the web), but that's not always the case: it could be a domain with only MX records (for mail exchange), it could even be a domain which is blocked at the registry (and thus can't be found in DNS). But even if all domains had an IP address and a webserver running, I find this a bad recommendation: maybe I'd like my, let's call them, whole-script confusable domains to point to another website with a different message from the original one.

Section labelled "General Programmer Recommendations": incorrectly numbered as 2.10.1. Please correct the numbering of the following sections, too.

Section 2.10.2, point B.3: "display the domain name with a visually highlighted domain name". Unintelligible.

Section 2.10.2, point C.1: "excluding the TLD".
Please don't carve in stone that TLDs won't contain characters beyond ASCII in the future.

Section 2.10.2, point D.2: "If the domain has a whole-script confusable, verify that both point to the same IP address". While moving this requirement from the registry to the user agent would be an improvement towards leveraging the end-to-end design principle of the Internet, how should that be performed in practice? Should the client calculate 2^63 label permutations and afterwards issue that many DNS queries? That is not practicable; also consider the previous comments on 2.10.1.B. Please drop this.

Section 2.10.3: Strange. The "User recommendations" in section 2.10.1 give the impression that this document is encouraging the user (here: the domain name registrant) to take responsibility for the protection of their trademark rights/IPR/security of their domains/etc. I would embrace that. The previous version 2 of UTR#36 took the same line. But suddenly this new draft takes a turn that is inconsistent with itself and includes these new points B.2 and B.3. Frankly: I don't think it's the task of a domain registry to check whether certain domain names belong to the same registrants. Rules which recommend that the domains "111.com" and "lll.com" (and "11l.com", and "1l1.com", etc.) should belong to the same person haven't been followed in the ASCII times and are not destined to succeed with the advent of IDN. More input from the TLD registry community would be needed here.

My 0.02 Euros.

Marcos Sanz
DENIC eG