From: Mark E. Shoulson (mark@kli.org)
Date: Mon Feb 14 2005 - 18:55:31 CST
Mark Davis wrote:
>3. The UTR had for some time recommended the development of data on visually
>confusables, and we will be starting to collect data to test the feasibility
>of different approaches. In regards to that, I'll call people's attention to
>the chart on http://www.unicode.org/reports/tr36/idn-chars.html, that shows
>the permissible IDN characters, ordered by script, then whether decomposable
>or not, then according to UCA collation order. (These are characters after
>StringPrep has been performed, so case-folding and normalization have
>already been applied.)
>
>
I recognize this is opening a can of worms... but then, you were the one
who opened it. I'm looking at the idn-chars.html page, and I have a few
questions about (naturally) the Hebrew script, since that's one I'm
familiar with.
Why are the YOD-YOD and VAV-YOD and DOUBLE-VAV digraphs considered
atomic? Typographically they're often realized as two separate
letters, even in Yiddish. On the other hand, the ALEF-LAMED ligature is
more likely to deserve consideration as an atomic character (but not
enough that I'd actually argue for it), and yet it's missing. What gives?
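One way to probe the question is to look at how each of these characters behaves under normalization, since the chart lists characters only after StringPrep has run. The following is a minimal Python sketch, not a statement of how the chart was actually generated. It assumes the normalization in question is NFKC (the form Nameprep specifies), and it uses the Unicode data bundled with the Python interpreter rather than the Unicode version the IDN specification pins. The helper names `survives_nfkc` and `describe` are made up for illustration.

```python
import unicodedata

# Hebrew code points raised in the message:
# DOUBLE VAV, VAV YOD, DOUBLE YOD digraphs, and the ALEF LAMED ligature.
CANDIDATES = ["\u05F0", "\u05F1", "\u05F2", "\uFB4F"]


def survives_nfkc(ch: str) -> bool:
    """Return True if NFKC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFKC", ch) == ch


def describe(ch: str) -> str:
    """One-line report: code point, name, decomposition field, NFKC result."""
    decomposition = unicodedata.decomposition(ch) or "(none)"
    nfkc = " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKC", ch))
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
            f"decomposition={decomposition}, NFKC={nfkc}")


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch))
```

In the Unicode character database the three Yiddish digraphs (U+05F0 through U+05F2) carry no decomposition and pass through NFKC unchanged, whereas U+FB4F has a compatibility decomposition to ALEF followed by LAMED. That would account for the digraphs appearing on the chart as atomic and the ligature being absent, though it says nothing about whether that outcome is typographically sensible.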
Having all the vowels and accents(!) available, in Hebrew and in Arabic
as well, is almost certainly overkill (I can't imagine anyone would want
to complicate a URL so much), but I suppose it's okay for completeness'
sake.
(Braille is an interesting case, since by rights people using Braille
readers would be registering names in the appropriate scripts, and
merely representing them with Braille patterns, but again, I suppose
it's harmless; I can't see anyone actually wanting to use it.)
The dingbats, obviously, are going to be an interesting battleground of
domain buyers...
~mark