Hello Florian,
Of course, KC/KD-normalization is not sufficient. The problem
already exists in ASCII. I/l/1 and 0/O can easily be confused.
It will always be necessary for people to think a bit when creating
their email addresses.
On the other hand, when identifiers can be written in various
scripts, this will help avoid spelling and transcription errors
by people who are not familiar with the Latin script and the
various transliteration conventions.
Overall, there is a kind of 'natural selection'. The creators
of identifiers will find out one way or another which identifiers
work and which don't.
Of course, normalization (preferably NFC and/or NFKC, to stay in
line with the W3C and the IETF) can help quite a bit.
NFC only eliminates things that are supposed to look exactly
the same. NFKC eliminates quite a bit more than that.
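As a rough illustration of that difference, here is a minimal sketch using
Python's standard unicodedata module. The helper names to_nfc and to_nfkc
and the sample strings are my own choices for illustration; they do not come
from any specification. The last check shows the point made at the top:
neither form merges look-alikes from different scripts.

```python
import unicodedata

def to_nfc(s: str) -> str:
    # Hypothetical helper: canonical composition only.
    return unicodedata.normalize("NFC", s)

def to_nfkc(s: str) -> str:
    # Hypothetical helper: compatibility decomposition, then composition.
    return unicodedata.normalize("NFKC", s)

# "e" + COMBINING ACUTE ACCENT and precomposed U+00E9 are supposed to
# look exactly the same, so NFC already unifies them.
same_under_nfc = to_nfc("e\u0301") == to_nfc("\u00e9")

# The "fi" ligature (U+FB01) is only a compatibility variant:
# NFC leaves it alone, NFKC folds it to the two letters "fi".
ligature_nfc = to_nfc("\ufb01")
ligature_nfkc = to_nfkc("\ufb01")

# Fullwidth "A" (U+FF21) is likewise folded to ASCII "A" by NFKC only.
fullwidth_nfkc = to_nfkc("\uff21")

# Latin "a" and Cyrillic "a" (U+0430) stay distinct under both forms,
# just as ASCII "l" and "1" do.
still_confusable = to_nfkc("a") != to_nfkc("\u0430")

print(same_under_nfc, ligature_nfc == "\ufb01", ligature_nfkc,
      fullwidth_nfkc, still_confusable)
```

A system that compares identifiers would presumably normalize both sides the
same way before comparing; which form to mandate is the policy question
discussed above.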
Regards, Martin.
At 20:10 01/04/15 +0200, Florian Weimer wrote:
>Unicode is finally entering domains which were ASCII-only for decades.
>However, with some kinds of identifiers, new problems occur. Such
>identifiers are interpreted by humans and machines, and they have to
>survive printing and reentering. Furthermore, it might not be
>possible to check identifiers online (in contrast to programming
>language identifiers). Think of local-parts of email addresses for an
>example.
>
>Is it sufficient to mandate that all such identifiers MUST be KC- or
>KD-normalized? Does this guarantee print-and-enter round-trip
>compatibility?