On 4 Jan 2017, at 08:12, Martin J. Dürst <duerst_at_it.aoyama.ac.jp> wrote:
>
> Hello Alastair,
>
> On 2016/12/06 20:51, Alastair Houghton wrote:
>> Hi all,
>>
>> I must be missing something; in IdnaTest.txt, in the BIDI TESTS section, there are examples like (line 74)
>
> Can you tell us where you got IdnaTest.txt from?
Yes, sorry, I should have included that information. It’s here, alongside the IDNA mapping table:
http://www.unicode.org/Public/idna/9.0.0/
which I arrived at from UTS #46 (<http://www.unicode.org/reports/tr46>).
>> B; 0à.\u05D0; ; xn--0-sfa.xn--4db # 0à.א
>>
>> which the file alleges are valid, but I cannot for the life of me see why. First, “0à.א” is clearly a “Bidi domain name” since it has at least one RTL label, “א”. As such, the Bidi Rule (RFC 5893 section 2) should be applied to its labels, and the label “0à” fails [B1], since the first character has Bidi property EN, not L, R or AL.
>
> On first sight, it looks to me as if you're correct.
>
> For the exact interpretation of RFC 5893, you'd better write to the mailing list of the former IDNA(bis) WG at idna-update_at_alvestrand.no.
RFC 5893 seems pretty clear to me, and the problem really is that the test vectors (which come from unicode.org) seem (to me) to be incorrect. I think the Unicode list is, therefore, the right place to raise this issue, but you’re right that it might attract attention from the right people if I also fire off a mail to the IDNA WG list.
>> Similarly (line 93)
>>
>> B; àˇ.\u05D0; ; xn--0ca88g.xn--4db # àˇ.א
>>
>> Again, “àˇ.א” is clearly a “Bidi domain name”, but “àˇ” fails [B6], because “ˇ” has Bidi property ON, not L, EN or NSM.
>>
>> Have I misunderstood something fundamental here? Could someone explain why those examples are valid, in spite of RFC 5893?
As an additional data point, ICU’s IDNA demo web page appears to think these names are OK.
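
A minimal Python sketch of the two checks discussed above, for anyone who wants to reproduce them. It assumes the Bidi classes reported by Python's standard `unicodedata` module match the Unicode version the test file targets. The helper names (`is_rtl_label`, `passes_b1`, `passes_b6`) are made up for illustration, and only conditions [B1] and [B6] of RFC 5893 section 2 are covered, not the full Bidi Rule.

```python
import unicodedata


def bidi_classes(label):
    """Return the Bidi class of each character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_rtl_label(label):
    """RFC 5893: an RTL label contains at least one R, AL or AN character."""
    return any(c in ("R", "AL", "AN") for c in bidi_classes(label))


def passes_b1(label):
    """[B1]: the first character must have Bidi class L, R or AL."""
    classes = bidi_classes(label)
    return bool(classes) and classes[0] in ("L", "R", "AL")


def passes_b6(label):
    """[B6], for LTR labels: the label must end in L or EN,
    optionally followed by any number of NSM characters."""
    classes = bidi_classes(label)
    while classes and classes[-1] == "NSM":
        classes.pop()
    return bool(classes) and classes[-1] in ("L", "EN")


# The two names from IdnaTest.txt quoted above (lines 74 and 93).
for name in ("0\u00E0.\u05D0", "\u00E0\u02C7.\u05D0"):
    labels = name.split(".")
    # A "Bidi domain name" has at least one RTL label.
    print(name, "is a Bidi domain name:", any(is_rtl_label(l) for l in labels))
    for label in labels:
        print("  ", label, bidi_classes(label),
              "B1:", passes_b1(label), "B6:", passes_b6(label))
```

[B6] only applies to LTR labels, so the B6 column printed for “א” should be ignored.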
Kind regards,
Alastair.
-- http://alastairs-place.net
Received on Wed Jan 04 2017 - 04:29:41 CST