RE: TC/SC mapping

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Thu Jan 24 2002 - 14:04:58 EST


Doug Ewell wrote:
> Currently on the IDN mailing list there is a big debate over
> this topic. It is well known that ASCII-based domain names
> are matched in the DNS in a case-insensitive manner. Many
> people recognize that Chinese readers who are familiar with
> both TC and SC consider text written in the two sub-scripts
> to be interchangeable, in roughly the same way that
> uppercase and lowercase Latin are interchangeable.

Converting TC to SC is difficult, and the opposite is nearly impossible. But
a simple "loose match" like the one you describe does not seem so difficult.

Then again, outside the English-only realm, converting uppercase to
lowercase is also difficult, and the opposite is nearly impossible. But
simple case folding is not so difficult.
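
To make the case analogy concrete, here is a minimal sketch in Python (my choice of language for illustration; nothing in the IDN discussion prescribes it). The helper name case_loose_match is hypothetical. It relies on the built-in str.casefold(), and uses German sharp s to show why round-trip case conversion is lossy while folding both sides to a common key is easy.

```python
def case_loose_match(a: str, b: str) -> bool:
    """Compare two strings by folding both to a common internal key."""
    return a.casefold() == b.casefold()


# Folding both sides works even where one-to-one case mapping fails.
print(case_loose_match("Straße", "STRASSE"))

# Converting to uppercase and back does not restore the original,
# because the sharp s is uppercased to two letters.
print("straße".upper().lower() == "straße")
```

The folded key is never shown to anyone; it exists only so that the two comparisons agree.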

Here it is simply a matter of putting together all the groups of ideographs
that may be considered variants of each other (not only SC and TC, but also
Japanese simplifications, semantic variants, "specialized semantic
variants", compatibility equivalents, radicals, etc.), and mapping each group
*internally* to a single key (e.g., the lowest code point in the group).
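
The grouping idea above might be sketched as follows. This is only an illustration under stated assumptions: the three groups in VARIANT_GROUPS are a tiny hand-picked sample, not an authoritative variant table, the groups are assumed to be disjoint (overlapping groups would first have to be merged), and the names build_fold_table, fold and loose_match are hypothetical.

```python
# Tiny illustrative sample; a real table would be far larger and would
# have to come from an agreed, published source.
VARIANT_GROUPS = [
    ["國", "国", "圀"],
    ["龍", "龙", "竜"],
    ["學", "学"],
]


def build_fold_table(groups):
    """Map every character in a group to the group's lowest code point."""
    table = {}
    for group in groups:
        key = min(group)  # single characters compare by code point
        for ch in group:
            table[ch] = key
    return table


FOLD_TABLE = build_fold_table(VARIANT_GROUPS)


def fold(text: str) -> str:
    """Replace each character by its group key; others pass through."""
    return "".join(FOLD_TABLE.get(ch, ch) for ch in text)


def loose_match(a: str, b: str) -> bool:
    """Two names match if their internal folded keys are identical."""
    return fold(a) == fold(b)


print(loose_match("中國", "中国"))
```

Characters outside any group are left untouched, so ASCII labels and unrelated ideographs compare exactly as before.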

You need not even care whether the result is TC, SC, or a horrible mix of
the two: nobody is supposed to see it anyway.

Of course there are security concerns. The conversion must be well defined
and must not change over time. And, of course, domain names should be
registered in their "folded" version.
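
As a rough sketch of what registering the folded version could mean, the hypothetical FoldedRegistry class below stores only the folded key and refuses a second registration that folds to the same key. It is not how any actual registry works; the fold function is passed in, and str.casefold is used here merely as a stand-in for whatever ideograph folding is agreed upon.

```python
class FoldedRegistry:
    """Toy registry that keys every name by its folded form."""

    def __init__(self, fold):
        self._fold = fold      # must stay stable once names are registered
        self._owners = {}      # folded key -> owner

    def register(self, name, owner):
        """Register a name; return False if its folded key is taken."""
        key = self._fold(name)
        if key in self._owners:
            return False
        self._owners[key] = owner
        return True

    def lookup(self, name):
        """Return the owner of any variant spelling, or None."""
        return self._owners.get(self._fold(name))


registry = FoldedRegistry(str.casefold)  # stand-in fold for illustration
registry.register("Example", "alice")
print(registry.register("EXAMPLE", "bob"))
print(registry.lookup("eXaMpLe"))
```

If the fold function changed after registration, previously stored keys could stop matching or start colliding, which is why the mapping has to be frozen.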

_ Marco


