From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Oct 29 2010 - 16:36:09 CDT
Nagesh Chigurupati asked:
> I have a question regarding some of the contextual rules in RFC5892. For
> example the contextual rule in appendix A.4 Greek Lower Numeral Sign
> (U+0375), states the following:
>
> If Script(After(cp)) .eq. Greek Then True;
>
> If the Greek Lower Numeral Sign (U+0375) is the last code point in the
> IDN, should it be allowed? There are statements in the RFC5892 as
> follows:
>
> Before(FirstChar) evaluates to Undefined.
> After(LastChar) evaluates to Undefined.
>
> Can I assume that "Undefined" is not equal to "Greek", and therefore
> input sequences with a trailing Greek Lower Numeral Sign are always
> disallowed by the specification?
Correct.
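
As an illustration of the point above, here is a minimal sketch of the
A.4 rule in which "Undefined" is modelled as None and therefore never
compares equal to Greek. The helper names (script_of, after,
keraia_allowed) are hypothetical, and script_of is only a rough stand-in
for the Unicode Script property: it is assumed here that the first word
of the character name approximates the script, which does not hold for
every code point. A real implementation would consult Scripts.txt.

```python
import unicodedata

KERAIA = "\u0375"  # GREEK LOWER NUMERAL SIGN


def script_of(ch):
    """Approximate Script(cp); None stands for 'Undefined'.

    Assumption: the first word of the Unicode character name is used
    as the script. This is a simplification, not the real property.
    """
    if ch is None:
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0].capitalize() if name else None


def after(label, i):
    """After(cp): the next code point, or None after the last one."""
    return label[i + 1] if i + 1 < len(label) else None


def keraia_allowed(label, i):
    """RFC 5892 A.4: False unless Script(After(cp)) .eq. Greek."""
    return script_of(after(label, i)) == "Greek"
```

Under these assumptions, a trailing U+0375 is rejected because
after() returns None, and None is not equal to "Greek".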
> The Hebrew Punctuation Geresh (U+05F3), Hebrew Punctuation Gershayim
> (U+05F4), etc. also pose a similar question. The rule set for these
> contextual rules states the following:
>
> If Script(Before(cp)) .eq. Hebrew Then True;
>
> So, if the first code point is U+05F3, then should it be disallowed
Correct.
> as
> there is no code point before this one to assert that it belongs to the
> Hebrew script.
The reasoning there is incorrect, though. The script of U+05F3 and
U+05F4 is already Hebrew, so this is not a matter of lacking a
previous character to assert their script. Rather, the RFC 5892
specification simply states that U+05F3 and U+05F4 are allowed
only immediately following a(nother) Hebrew character in a label.
--Ken
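
The Hebrew rule described in the message can be sketched the same way.
This is only an illustration under stated assumptions: the helper names
(is_hebrew, before, geresh_allowed) are hypothetical, and is_hebrew
approximates the Script property by checking the Unicode character name
prefix rather than reading Scripts.txt. The check looks only at the
preceding code point; the script of U+05F3 or U+05F4 itself plays no
part in it.

```python
import unicodedata

GERESH = "\u05f3"     # HEBREW PUNCTUATION GERESH
GERSHAYIM = "\u05f4"  # HEBREW PUNCTUATION GERSHAYIM


def is_hebrew(ch):
    """Approximate Script(cp) .eq. Hebrew; None ('Undefined') is never Hebrew.

    Assumption: a character name starting with 'HEBREW ' implies Hebrew script.
    """
    return ch is not None and unicodedata.name(ch, "").startswith("HEBREW ")


def before(label, i):
    """Before(cp): the previous code point, or None before the first one.

    The explicit i > 0 test matters in Python, where label[-1] would
    silently return the last character instead.
    """
    return label[i - 1] if i > 0 else None


def geresh_allowed(label, i):
    """RFC 5892 A.5/A.6: False unless Script(Before(cp)) .eq. Hebrew."""
    if label[i] not in (GERESH, GERSHAYIM):
        raise ValueError("rule applies only to U+05F3 and U+05F4")
    return is_hebrew(before(label, i))
```

With this sketch, a label that starts with U+05F3 fails the rule, while
U+05F3 directly after a Hebrew letter such as U+05D0 passes.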