From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Apr 01 2005 - 19:13:56 CST
A revised version of the IDN character list is available at:
http://unicode.org/reports/tr36/draft/idn-chars.html
For those who prefer a list by name, there is now also a plaintext version at:
http://unicode.org/reports/tr36/draft/idn-chars.txt
Different approaches are being discussed for internationalized domain
names. Under one approach, whole classes of characters may be removed. If
that approach is taken, we want to be sure not to remove characters that are
required for words in modern languages.
If you search for "WORD CHARACTERS ADDED" in the plaintext list, you'll find,
near the end, a draft set of characters marked with "word-chars". These are
characters that should be added to the set allowed in identifiers, in
situations where words in modern languages could not be constructed without
them.
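For anyone who would rather filter the list than search it by hand, here is a minimal sketch of pulling out the tagged lines from a local copy of the plaintext file. The sample text and its layout are invented for illustration, and the helper name lines_marked is hypothetical; the only thing taken from the file as described above is that the relevant lines carry the marker "word-chars".

```python
# Invented excerpt for illustration only: the real layout of
# idn-chars.txt may differ, and "other-tag" is a placeholder.
SAMPLE = """\
# WORD CHARACTERS ADDED
04C0 ; word-chars # CYRILLIC LETTER PALOCHKA
# SOME OTHER SECTION
XXXX ; other-tag # SOME OTHER CHARACTER
"""


def lines_marked(text, marker="word-chars"):
    """Return every line of text that contains the given marker."""
    return [line for line in text.splitlines() if marker in line]


if __name__ == "__main__":
    # With a saved copy of the file, read it and pass its contents instead:
    #   with open("idn-chars.txt", encoding="utf-8") as f:
    #       text = f.read()
    for line in lines_marked(SAMPLE):
        print(line)
```

The match is a plain case-sensitive substring test, so the "WORD CHARACTERS ADDED" heading itself is not returned.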
The word-chars include characters that are listed in TR29 as possibly
belonging to words, plus U+04C0 (Ӏ) CYRILLIC LETTER PALOCHKA (supplied on
this thread), plus Sk characters with MODIFIER LETTER in their names (the
latter list is somewhat questionable; if anyone has information as to which
are used only for technical purposes, such as IPA, please let me know).
Comments are welcome on any of these that should be removed because they are
not necessary for modern languages.
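To make the third group easier to review, here is a rough sketch that enumerates the characters with General_Category Sk whose names contain MODIFIER LETTER, using Python's standard unicodedata module. It is an approximation, not the procedure used to build the draft list: the function name sk_modifier_letters is hypothetical, and the result depends on the Unicode version bundled with the Python interpreter, which may differ from the version the draft was built against.

```python
import sys
import unicodedata


def sk_modifier_letters():
    """Return (code point, name) pairs for Sk characters whose
    names contain MODIFIER LETTER, per this Python's Unicode data."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Sk is the General_Category value "Symbol, modifier".
        if unicodedata.category(ch) != "Sk":
            continue
        # Unnamed code points yield the empty-string default.
        name = unicodedata.name(ch, "")
        if "MODIFIER LETTER" in name:
            found.append((cp, name))
    return found


if __name__ == "__main__":
    for cp, name in sk_modifier_letters():
        print(f"U+{cp:04X} {name}")
```

U+04C0 does not appear in this output, since it is a letter rather than an Sk character, which is why it is listed separately above.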
Below that in the plaintext file are the "FOR REVIEW" characters. I ask that
people also review that list and respond with any characters that are
needed for words in modern languages, that is, any that should be moved up
into the "WORD CHARACTERS ADDED" list. Unless we hear otherwise, these may end
up being excluded from domain names.
Mark