From: Erik van der Poel (erik@vanderpoel.org)
Date: Sun Mar 06 2005 - 23:29:44 CST
I'm sorry. Maybe I just confused people by bringing HTML into the
discussion. So let me talk about Nameprep itself. For the sake of
typeability, Nameprep normalizes compatibility characters via Unicode's
Normalization Form KC (NFKC). The example that everyone always seems to
use is the set of "wide" characters used in Japan and elsewhere. The
claim is that those wide characters are very easy to type with Japanese
input methods, and that it would be nice if Nameprep automatically
mapped them to the "real" characters, i.e. the normal-width versions:
wide 'a' becomes regular 'a'.
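As a small illustration of that mapping (not Nameprep itself, which also
case-folds and prohibits characters), Python's standard unicodedata
module can apply NFKC directly:

```python
import unicodedata

# U+FF41 FULLWIDTH LATIN SMALL LETTER A, the "wide" 'a' from Japanese
# input methods, is normalized to the ordinary ASCII 'a' by NFKC.
wide_a = "\uFF41"
print(unicodedata.normalize("NFKC", wide_a))  # prints: a

# The same holds for the whole fullwidth range, e.g. U+FF21 -> 'A'.
print(unicodedata.normalize("NFKC", "\uFF21"))  # prints: A
```

This is only the normalization step; the full Nameprep profile layers
mapping and prohibition tables around it.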
Now, instead of adopting bits and pieces of Unicode and NFKC, Nameprep
decided to keep things simple and adopted the whole process. The result
was that a number of not-so-easily-typed characters, such as
double-struck C, also got included. There is no good typeability reason
to map double-struck C to regular 'c': unlike the Japanese wide
characters, double-struck C is far harder to type at the keyboard than
regular 'c', so nobody would enter it as a convenient shortcut.
So, all I'm saying is that by adopting basically all of Unicode 3.2 and
the whole NFKC process (followed by some prohibitions after those
steps), Nameprep ended up allowing inappropriate characters such as
double-struck C to be fed into the mapping process. I believe this was
unnecessary: Nameprep could instead have returned an error upon
encountering double-struck C before normalization.
Erik
This archive was generated by hypermail 2.1.5 : Sun Mar 06 2005 - 23:31:04 CST