From: Mark Davis ☕ (mark@macchiato.com)
Date: Sun Sep 19 2010 - 18:26:58 CDT
Thanks for checking the data. I'm sorry for not responding earlier; I was on
vacation, and am now working through my backlog of email.
Some of the differences are because UTS#46 provides a compatibility 'bridge'
between IDNA2003 and IDNA2008. For details of these particular cases, see
below.
Note that the current tests do not attempt to be exhaustive, e.g., by including
a line for every character with its status (valid or not).
Such a test can be written using the main data file at
http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt.
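As a rough illustration of how such a test could be generated, the Python sketch below expands one data line into one entry per code point. It assumes each data line has the form `code point or range ; status ; mapping # comment`, as in `2461 ; mapped ; 0032 # 1.1 CIRCLED DIGIT TWO`. The helper name `parse_mapping_line` is hypothetical and not part of any Unicode tooling.

```python
def parse_mapping_line(line):
    """Expand one data line into (code_point, status, mapping) tuples.

    Hypothetical helper: assumes "cp_or_range ; status ; mapping # comment".
    Blank lines and comment-only lines yield nothing.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return
    fields = [f.strip() for f in data.split(";")]
    cps, status = fields[0], fields[1]
    if ".." in cps:
        first, last = (int(x, 16) for x in cps.split(".."))
    else:
        first = last = int(cps, 16)
    # The mapping field, when present, is a space-separated list of hex code points.
    mapping = fields[2] if len(fields) > 2 else ""
    mapped = "".join(chr(int(h, 16)) for h in mapping.split()) if mapping else None
    for cp in range(first, last + 1):
        yield cp, status, mapped


print(list(parse_mapping_line("2461 ; mapped ; 0032 # 1.1 CIRCLED DIGIT TWO")))
# [(9313, 'mapped', '2')]
```

Each tuple could then become one test line: the code point as input, and the status and mapping as the expected result.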
Other test cases can be added in the future; if you (or others) have
suggestions for good test lines, please let us know.
Mark
*Il meglio è l’inimico del bene* (The best is the enemy of the good.)
On Thu, Sep 16, 2010 at 14:59, Colosi, John <jcolosi@verisign.com> wrote:
> Hello all,
>
> I represent the VeriSign Domain Name Registry as an implementer of the
> latest IDNA specifications. The following four (4) questions arose during
> our implementation of the conformance test.
>
> *Question 1 of 4*
>
> *Line* 204
>
> *Input* \u0646 \u0627 \u0645 \u0647 \u200C \u0627 \u06CC
>
> *Reference* Appendix A.1 of *RFC 5892 (Tables)* <https://trac.tools.ietf.org/html/rfc5892>
>
> *Issue* Per the reference, the ZWNJ (\u200C) must meet one of
> two conditions: either it is preceded by a character whose canonical
> combining class is Virama, or the characters in the label match a certain
> pattern of joining types. This input does not meet either of these
> criteria, and appears to be an invalid IDN label with respect to the
> IDNA2008 standards. There are ten (10) such lines in the input file.
>
This is by design. UTS#46 does not have the contextual checks for ZWJ and
ZWNJ.
Background: While those are excellent checks to have, and are recommended,
they only prevent a small fraction of homoglyph exploits, so they are
not required by UTS#46 and are not tested for in the file. (If you disagree
with that approach, you should bring it up with the UTC for the next version
of UTS#46.) UTS#46 does allow implementations to be stricter if desired,
so any implementation can apply those IDNA2008 checks.
Note that we could add a field in the test file that indicated whether the
input (or mapped input [see below]) was valid under IDNA2008. Do people
think that would be helpful?
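For implementations that choose to be stricter, here is a rough Python sketch of the IDNA2008 ZWNJ rule from RFC 5892, Appendix A.1. The ZWNJ passes if the preceding character has canonical combining class Virama (9), or if the surrounding characters match (L|D) T* ZWNJ T* (R|D) on Joining_Type. The `JOINING_TYPE` table is a hypothetical hand-copied subset of ArabicShaping.txt for illustration only; a real check must load the full data. The function names are made up for this example.

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA_CCC = 9  # canonical combining class value for Virama

# Hypothetical subset of Joining_Type values; a real implementation must load
# the full ArabicShaping.txt data. Unlisted code points are treated as "U".
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0646": "D",  # ARABIC LETTER NOON
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u06cc": "D",  # ARABIC LETTER FARSI YEH
    "\u064e": "T",  # ARABIC FATHA (transparent)
}


def joining_type(ch):
    return JOINING_TYPE.get(ch, "U")


def zwnj_context_ok(label, i):
    """Check the ZWNJ at label[i] against the two conditions of the rule."""
    # Condition 1: preceded by a character whose combining class is Virama.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC:
        return True
    # Condition 2: (L|D) T* ZWNJ T* (R|D), skipping transparent characters.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")


def label_passes_zwnj_checks(label):
    return all(zwnj_context_ok(label, i) for i, ch in enumerate(label) if ch == ZWNJ)


print(label_passes_zwnj_checks("a\u200cb"))  # False: Latin letters are non-joining
```

Whether a particular label passes depends entirely on the Joining_Type data, which is why the abbreviated table above should not be used for real validation.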
>
> *Question 2 of 4*
>
> *Line* 319
>
> *Input* …
> 1234567890123456789012345678901234567890123456789012345678901234…
>
> *Reference* Sections 3.1 and 3.5 of *RFC 1034* <http://www.ietf.org/rfc/rfc1034.txt>
>
> *Issue* Per the reference, DNS labels cannot contain more than
> 63 octets. It appears that this is a purposeful test, since the first label
> is exactly 63 octets, and the second label is 64 octets. This limit may not
> apply to other applications, but these lines of input are not valid for
> DNS. There are three (3) such lines in the input file.
>
This appears to be a mistake in the conformance file generation. I'll look
at it to see what is happening.
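A minimal Python sketch of the RFC 1034 label-length check discussed above, assuming the domain name is already in ASCII (Punycode) form; `overlong_labels` is a hypothetical helper name, and the check covers only the per-label limit, not the overall name length.

```python
MAX_LABEL_OCTETS = 63  # per-label limit from RFC 1034, section 3.1


def overlong_labels(domain):
    """Return the labels of an ASCII domain name that exceed 63 octets.

    Assumes the name is already in ASCII form (e.g. after Punycode encoding);
    a trailing dot for the root is ignored.
    """
    labels = domain.rstrip(".").split(".")
    return [label for label in labels if len(label.encode("ascii")) > MAX_LABEL_OCTETS]


name = "a" * 63 + "." + "b" * 64  # first label exactly 63 octets, second 64
print([len(label) for label in overlong_labels(name)])  # [64]
```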
>
> *Question 3 of 4*
>
> *Line* 319
>
> *Input* U \u0308 . xn--tda
>
> *Reference* Section 4.1 of *RFC 5891 (Protocol)* <https://trac.tools.ietf.org/html/rfc5891>
>
> *Issue* Per the reference, input into the IDNA Registration
> process “MUST be… in Normalization Form C”. This input does not meet that
> requirement: the first label is not properly normalized. Implementations of
> IDNA2008 for registration should expect an exception. There are four (4)
> such lines in the input file.
>
Here is the situation:
- IDNA2003 accepts denormalized text as input; it requires that the text be
normalized (and case-folded) in the process of generating the Punycode.
- IDNA2008 disallows denormalized text per se; however, it allows a
mapping phase for the input, which can perform normalization and case folding
for consistency with IDNA2003.
UTS#46 provides for a mapping that is consistent with IDNA2003 and allowed
by IDNA2008. That mapping normalizes U\u0308 to a lowercase U-umlaut, which
is valid.
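A minimal Python sketch of this mapping step, assuming case folding followed by NFC as a stand-in for the full UTS #46 table. The helper `approx_uts46_map` is hypothetical and only approximates the real mapping for inputs like this one. It uses the standard `unicodedata` module and Python's built-in "punycode" codec.

```python
import unicodedata


def approx_uts46_map(label):
    # Hypothetical stand-in: case-fold, then normalize to NFC. The real UTS #46
    # mapping is defined by IdnaMappingTable.txt, not by this two-step shortcut.
    return unicodedata.normalize("NFC", label.casefold())


mapped = approx_uts46_map("U\u0308")  # "U" + COMBINING DIAERESIS
print(mapped == "\u00fc")  # True (LATIN SMALL LETTER U WITH DIAERESIS)
print("xn--" + mapped.encode("punycode").decode("ascii"))  # xn--tda
```

With this approximation, the first label maps to the same lowercase u-umlaut whose Punycode form is the second label, xn--tda.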
>
> *Question 4 of 4*
>
> *Line* 276
>
> *Input* xn--53h
>
> *Reference* Appendix B.1 of *RFC 5892 (Tables)* <https://trac.tools.ietf.org/html/rfc5892>
>
> *Issue* Per the reference, the character \u2615 is disallowed.
>
> 2460..26CD ; DISALLOWED # CIRCLED DIGIT ONE..DISABLED CAR
>
> Implementations should expect an exception. There are twenty (20) such
> lines in the input file.
>
This is another instance where UTS#46 applies a mapping. See the following
line of http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt; such a
mapping is permitted by IDNA2008.
2461 ; mapped ; 0032 # 1.1 CIRCLED DIGIT TWO
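A small Python sketch, offered as an illustration rather than as part of UTS #46 or its test suite. It decodes the ACE label from the question and shows, for CIRCLED DIGIT TWO, that Python's NFKC normalization yields the same "2" that the table line lists. It uses only the standard `unicodedata` module and the built-in "punycode" codec; NFKC is merely a stand-in here, since UTS #46 itself is defined by the data file.

```python
import unicodedata

# Strip the ACE prefix and decode with Python's built-in "punycode" codec.
ace = "xn--53h"
label = ace[len("xn--"):].encode("ascii").decode("punycode")
print(f"U+{ord(label):04X}", unicodedata.name(label))  # U+2615 HOT BEVERAGE

# For CIRCLED DIGIT TWO, NFKC gives the same "2" as the mapping table line.
print(unicodedata.normalize("NFKC", "\u2461"))  # 2
```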
>
> Any input is appreciated,
>
> -- John
>
> John Colosi | Naming Services | VeriSign, Inc.
> 703.948.3211 | 703.967.4062 | 703.421.8233
This archive was generated by hypermail 2.1.5 : Sun Sep 19 2010 - 18:35:38 CDT