From: Cary Karp (ck@nic.museum)
Date: Tue Feb 15 2005 - 04:26:07 CST
Quoting Mark E. Shoulson:
> I recognize this is opening a can of worms... but then, it was you that
> opened it. I'm looking at the idn-chars.html page, and I have a few
> questions about (naturally) the Hebrew script (since that's one I'm
> familiar with).
I have another question about the IDN implementation of the Hebrew
script. Given that IDN security concerns stand in direct proportion to
the size of the character repertoire in actual use, I trust that it is
relevant (at least initially) to the present topic heading.
The HEBREW PUNCTUATION GERSHAYIM U+05F4 <״> appears in the penultimate
position in a sequence of Hebrew characters that is not to be read as a
word. Since such things as acronyms are regularly used as domain labels,
it thus appears necessary for any registry supporting Hebrew to include
this code point in the corresponding character table. If so, this is a
good example of a situation where "an exception is appropriate" to the
general stricture on "punctuation characters", stated in the ICANN
Guidelines for the Implementation of Internationalized Domain Names.
The problem is that a standard Hebrew keyboard doesn't include this
character, which is normally replaced by a QUOTATION MARK U+0022. Anyone
entering an IDN including U+05F4 via a keyboard will therefore be likely
to mistype it as U+0022, causing it to fail. It is possible to get an
IDN string containing a quotation mark through ToASCII by leaving the
UseSTD3ASCIIRules flag unset (which is counter to a "should" point in
the ICANN Guidelines). The resulting string contains a literal quotation
mark. Since it is this string that is actually included in the zone
file, the name server will need to load what it is likely to reject as a
malformed name regardless of any IDN considerations.
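The mistyping problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label is a made-up four-character sequence, the helper names `to_ace` and `std3_ok` are hypothetical, and the Nameprep step of ToASCII is left out, so only the Punycode encoding and the STD3 letter-digit-hyphen check are shown.

```python
import unicodedata

GERSHAYIM = "\u05F4"       # HEBREW PUNCTUATION GERSHAYIM
QUOTATION_MARK = "\u0022"  # the character a standard Hebrew keyboard offers instead

# Letters, digits and hyphen: the characters STD3 allows in a host name label.
LDH = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


def to_ace(label):
    """Punycode-encode a label and prepend the ACE prefix.

    Assumption: Nameprep is skipped, so this is not a full ToASCII.
    """
    return "xn--" + label.encode("punycode").decode("ascii")


def std3_ok(ascii_label):
    """Hypothetical helper: the check that UseSTD3ASCIIRules would enforce."""
    return (
        bool(ascii_label)
        and all(ch in LDH for ch in ascii_label)
        and not ascii_label.startswith("-")
        and not ascii_label.endswith("-")
    )


# Made-up acronym-style label: two letters, the mark, one final letter.
intended = "\u05E6\u05D4" + GERSHAYIM + "\u05DC"
typed = "\u05E6\u05D4" + QUOTATION_MARK + "\u05DC"

for description, label in (("intended", intended), ("typed", typed)):
    ace = to_ace(label)
    mark = unicodedata.name(label[2])
    verdict = "passes STD3" if std3_ok(ace) else "violates STD3"
    print(description, mark, ace, verdict)
```

Punycode copies ASCII code points into its output unchanged, which is why the typed variant carries a literal quotation mark into the string that would end up in the zone file.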
Can someone who has detailed understanding of Hebrew orthography please
comment on the necessity of the gershayim in the context described
above? If it cannot comfortably be done without, how can one offset the
confusion that seems inevitable given the alternate orthography on which
the local keyboard is based? Are there other code points listed as
punctuation in the Unicode charts that are similarly necessary for the
IDN support of established orthographic convention in the languages for
which they are used?
/Cary