Forwarded conversation
Subject: Concerns about the "szett" exception
------------------------
From: Alexander Mayrhofer <alexander.mayrhofer@nic.at>
Date: Mon, Oct 26, 2009 at 05:15
To: idna-update@alvestrand.no
All,
As you probably know, I work for the Austrian Domain Name Registry
(nic.at). I recently prepared a presentation to our board regarding the
changes to expect from IDNAbis deployment, and our board has asked me to
voice our concerns about the "szett" (U+00DF) exception in the current
document set. I understand that the documents have progressed very far,
and that we should have voiced our concerns earlier. However, I think
that the information below is still valuable to the group.
Obviously, the DNS is an extremely important identity and naming system
that is crucial to the operation of nearly all internet applications.
Therefore, any changes to that structure are delicate operations. This
is important for the creation of new portions of namespace, but
particularly important when the semantics of a namespace (portion) are
changed. The introduction of IDNA2003 was an extension of the namespace,
at least from the application perspective (technically, it was changing
the definition of an awkward-enough portion of the namespace, namely
labels with "xn--").
Changing the semantics of a certain namespace is *really bad*, and I
agree with what Marcos said a long time ago: "Breaking backwards
compatibility is to my eyes the big stigma of IDNA2008".
I understand and welcome the introduction of rigid rules in IDNAbis as
the primary mechanism to identify codepoint classification and protocol
validity. Independence from a certain Unicode revision ensures a stable
specification, and should create few "surprises" (essentially, it shifts
responsibility of character classification from the IETF to Unicode). I
also understand and welcome the 1:1 relation on the protocol level
between A-label and U-label.
However, the introduction of *exceptions* that work around those rigid
rules, and particularly changing the semantics of a part of a deployed,
used namespace is *really really bad* - particularly if the exception
concerns such a "weird" character as the "szett" (Unicode folding-wise).
Such changes generally have the potential to change the resolved
destination for a certain domain name, which in turn creates *major*
security issues and hurts interoperability badly. Unlike the
introduction of IDNA2003, where a label would either work or not, those
exceptions now create a situation where such a label would resolve to
either destination A (old application) or destination B (new
application).
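The "destination A or destination B" split can be illustrated with a
small sketch. It assumes Python 3.7 or later, whose built-in "idna"
codec follows IDNA2003 and therefore maps U+00DF to "ss". The IDNA2008
side is only approximated by Punycode-encoding the label unchanged; the
helper names are hypothetical and no IDNA2008 validity checks are done.

```python
# Sketch only: shows how one label can yield two different names on the wire.

def idna2003_label(label: str) -> str:
    """Old behaviour: Python's "idna" codec applies the IDNA2003 mapping."""
    return label.encode("idna").decode("ascii")

def idna2008_style_label(label: str) -> str:
    """Approximation of the new behaviour: no mapping, just Punycode.

    Assumption: a real IDNA2008 implementation would also validate the
    label; that step is left out here.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

label = "stra\u00dfe"  # "strasse" written with U+00DF
old_name = idna2003_label(label)
new_name = idna2008_style_label(label)

print(old_name)              # strasse
print(new_name)              # xn--strae-oqa
print(old_name == new_name)  # False: two different DNS names
```

An old application would query the first name and a new application the
second, which is the situation described above.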
I understand that the Rationale document proposes sensible approaches in
Section 7.2 - however, I think the security considerations could discuss
the problems more explicitly, rather than just referring to the
Rationale document (which is informational anyway).
I think that the sentence
"...a few characters that were mapped to others in the earlier version;
zone administrators should be aware of the problems that might raise
and take appropriate measures"
in the definitions document could easily be overlooked by implementors.
Another issue makes it even harder for zone administrators to deal with
the problem: actually *encouraging* application developers to create
their own fancy mapping definitions, beyond the mappings that were
included in IDNA2003, allows for even more "variations" and is bound to
hurt interoperability badly. One example of this is the Unicode TR46,
particularly the proposal of "dual lookups" and "trusted registries" for
"Deviations", which I believe to be a really really bad idea - but what
are the other options?
Shifting the responsibility for mapping to application developers, and
thereby allowing a myriad of mapping options, seems risky to me,
particularly for the Exception codepoints for which protocol definitions
have changed between the two versions. From my point of view, it makes
such codepoints unusable - the "mapping du jour" of application X could
be entirely different from that of application Y.
The Mapping draft says that it's "unusual" for the IETF to discuss user
input processing steps - but on the other hand, Section 2.1 of RFC 3761
(the ENUM base specification) clearly provides normative text about how
user input should be prepared for a protocol (and I'm sure there are
many other examples). So it seems the IETF *is* concerned about how user
input is mapped to protocol elements.
To sum up, we would have preferred the "szett" (U+00DF) to be kept
"DISALLOWED", and to have the IETF describe the mapping procedures as
more than just "Informational" (the contents of the mapping document
itself are perfectly fine). We also hope that the IETF liaises with
application developers, particularly browser vendors, to establish one
single "de facto" mapping procedure, so that at least the szett does not
become a moving target.
Thanks,
Alex Mayrhofer
_______________________________________________
Idna-update mailing list
Idna-update@alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update
----------
From: Mark Davis ☕ <mark@macchiato.com>
Date: Mon, Oct 26, 2009 at 07:15
To: Alexander Mayrhofer <alexander.mayrhofer@nic.at>
Cc: idna-update@alvestrand.no, Marcos Sanz/Denic <sanz@denic.de>
The Unicode Consortium shares your concerns about
the treatment of deviations, and the security and interoperability
issues resulting from that and custom mappings. Unfortunately, while
those points were raised consistently during the development of IDNA2008
(some would say too persistently), the working group decided on its
current course.
We have been consulting with browser and search
engine vendors (many of whom are members of the consortium), and I would
anticipate that most will not end up implementing IDNA2008 lookup as is
because of the problems it has. TR46 is designed by those needing to
implement IDNA lookup so as to provide a bridge specification, whereby
implementations can maximize compatibility with IDNA2003 and IDNA2008 on
the lookup side, and avoid these problems. On the "dual lookup" and
"trusted registries" point that you mention: the text of TR46 is
insufficiently clear. That section is discussing alternative approaches
that were considered, but discarded (because they don't work well, as
Marcos pointed out in detail). I'll make sure that that feedback is
brought into the committee.
TR46 is not really aimed at the
registry side. It is feasible for registries to implement IDNA2008 if
they additionally DISALLOW the four deviations (including es-zett). This
can be done while being conformant to IDNA2008, because registries can
further limit the characters they support.
Mark
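The registry-side restriction Mark describes could be sketched roughly
as below. The four code points are the ones TR46 lists as deviations
(U+00DF, U+03C2, U+200C, U+200D). The function names are hypothetical,
and the check is only an extra filter a registry might apply on top of
IDNA2008 validation, not a replacement for it.

```python
# Sketch only: reject registration requests containing a TR46 deviation.

DEVIATIONS = {
    "\u00df": "LATIN SMALL LETTER SHARP S",
    "\u03c2": "GREEK SMALL LETTER FINAL SIGMA",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
}

def find_deviations(label: str) -> list[str]:
    """Return a description of each deviation character found in the label."""
    return [
        f"U+{ord(ch):04X} {DEVIATIONS[ch]}"
        for ch in label
        if ch in DEVIATIONS
    ]

def registry_accepts(label: str) -> bool:
    """Hypothetical registry policy: refuse any label with a deviation."""
    return not find_deviations(label)

print(registry_accepts("strasse"))       # True
print(registry_accepts("stra\u00dfe"))   # False
print(find_deviations("stra\u00dfe"))    # ['U+00DF LATIN SMALL LETTER SHARP S']
```

Under such a policy, a name can only be registered in the form that old
and new lookup implementations resolve identically.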