Questions on the Unicode BiDirectional (BIDI) Algorithm
verdy_p at wanadoo.fr
Sun Jul 6 13:10:12 CDT 2014
Neither do I think that the Verisign registry has to perform such mappings.
Even if they have implemented an equivalence, they should still return a
domain name using the Arabic digits if it was registered like this, and
should allow resolving such names directly (even if it also resolves the
name with the Arabo-European digits from ASCII).
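As a rough illustration of why the two spellings are distinct names at the DNS level, here is a minimal Python sketch using the standard library's punycode codec. The helper name a_label is made up for this example, and it applies raw Punycode only; a real registry or resolver would run full IDNA processing (mapping and validation) first.

```python
def a_label(u_label: str) -> str:
    """Hypothetical helper: raw Punycode plus the ACE prefix, no IDNA validation."""
    return "xn--" + u_label.encode("punycode").decode("ascii")

# Labels written with escapes so the source itself is not reordered by Bidi.
MISR = "\u0645\u0635\u0631"                        # the label from the question
WITH_ARABIC_ONE = "\u0645\u0635\u0661\u0631"       # with ARABIC-INDIC DIGIT ONE inserted
WITH_ASCII_ONE = "\u0645\u0635" + "1" + "\u0631"   # with ASCII DIGIT ONE inserted

for label in (MISR, WITH_ARABIC_ONE, WITH_ASCII_ONE):
    # Each label gets its own ASCII form; nothing maps one digit to the other.
    print(ascii(label), "->", a_label(label))
```

The two digit variants come out as different ASCII labels, so whether they resolve to the same host is a registry policy decision, not something the encoding provides.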
The Bidi algorithm is completely independent of IDNA, which addresses other
issues and is in fact more concerned with canonical and compatibility
equivalences or with confusables (notably with Indic digits, such as an
Indic digit 4 that looks very much like a European digit 8).
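To make the distinction concrete, the following Python sketch queries the character properties involved, using only the standard unicodedata module. It is an illustration, not a statement about what any registry does: the Bidi class only affects display order, and normalization does not fold the Arabic-Indic digit into the ASCII one.

```python
import unicodedata

ARABIC_INDIC_ONE = "\u0661"   # ARABIC-INDIC DIGIT ONE
EUROPEAN_ONE = "1"            # DIGIT ONE (ASCII)

for ch in (ARABIC_INDIC_ONE, EUROPEAN_ONE):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "bidi class:", unicodedata.bidirectional(ch),   # AN vs. EN
        "digit value:", unicodedata.digit(ch),          # same numeric value
    )

# Compatibility normalization keeps the Arabic-Indic digit as it is.
nfkc_unchanged = unicodedata.normalize("NFKC", ARABIC_INDIC_ONE) == ARABIC_INDIC_ONE
print("NFKC leaves U+0661 unchanged:", nfkc_unchanged)
```

Both characters have the digit value 1, but they stay separate code points with separate Bidi classes (AN and EN).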
For IDNA there is a superset of equivalences and prohibited characters that
goes far beyond basic canonical and compatibility equivalences, but Bidi is
not an issue for the registration of IDNA labels in domain names. The
possible issue is in the rendering of an FQDN alternating LTR and RTL
labels, because the dot (.) separator has a weak direction in Bidi. But
this is not directly an issue of the domain name system; it is about how to
render a URL (or more generally a URI), which should be parsed and have
some characters (notably the dot and slash) changed to adopt a strong
direction, different from the generic Bidi applied directly to the full URI
as if it were written in a human language.
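One possible way to sketch this idea in Python: look up the Bidi class of the URL separators, then build a display-only string in which every dot sits between two LEFT-TO-RIGHT MARK characters. The function names are hypothetical, and using LRM is just one assumption about how a renderer could give the dot a strong direction; the result is for display in an LTR context and must not be sent to a resolver.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK: invisible, strong left-to-right character

def bidi_classes(text: str) -> list:
    """Return the Bidi class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

def anchor_dots_for_display(domain: str) -> str:
    """Hypothetical display-only helper: surround each dot with LRM so the
    dot is always between two strong LTR characters."""
    return (LRM + "." + LRM).join(domain.split("."))

# Dot and slash are common separators (CS); hyphen-minus is ES. None is strong.
print(bidi_classes("./-"))

# An RTL label followed by an LTR label, written with escapes.
shown = anchor_dots_for_display("\u0645\u0635\u0631.com")
print(ascii(shown))
```

Stripping the LRM characters from the output gives back the original domain, so the transformation only affects how the string is laid out.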
There may be issues, however, with some domain name labels (separated by
dots) that could mix characters allowed in a large set but with different
strong directions. Such mixing may break things and create lots of confusable
labels and, as Bidi controls are prohibited in domain labels, this could
create havoc. But today's browsers perform some validation of domain labels
to make sure their resolved Bidi direction cannot change more than once.
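The message does not say which check browsers apply. One published check of this kind is the "Bidi rule" of RFC 5893, and the Python sketch below is a simplified rendering of its six conditions for a single label, under the assumption that the label belongs to a domain name containing at least one RTL label. The function name is made up for this example, and the results depend on the Unicode data shipped with the Python version in use.

```python
import unicodedata

RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}

def passes_bidi_rule(label: str) -> bool:
    """Simplified sketch of the RFC 5893 Bidi rule for one label."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    # Condition 1: the first character must be strongly directional.
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False
    rtl = classes[0] in {"R", "AL"}
    # Trailing nonspacing marks are ignored when checking the last character.
    trimmed = list(classes)
    while trimmed[-1] == "NSM":
        trimmed.pop()
    last = trimmed[-1]
    if rtl:
        # Conditions 2-4: allowed classes, allowed ending, no EN/AN mixture.
        if not set(classes) <= RTL_ALLOWED:
            return False
        if last not in {"R", "AL", "EN", "AN"}:
            return False
        return not ("EN" in classes and "AN" in classes)
    # Conditions 5-6: allowed classes and allowed ending for an LTR label.
    return set(classes) <= LTR_ALLOWED and last in {"L", "EN"}

print(passes_bidi_rule("\u0645\u0635\u0661\u0631"))   # Arabic letters + Arabic-Indic digit
print(passes_bidi_rule("abc\u0645"))                  # Latin letters followed by an Arabic letter
```

Under this rule a label has a single direction, chosen by its first character, which is a stricter form of the "cannot change more than once" validation described above.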
There are also issues with the hyphen-minus within labels: it is allowed only in
the middle without repetition, and it also has a weak direction inherited from
the letters/digits encoded before it; but with normal text rendering it
could have its visual position changed and could create confusable domain
Each registry applies its own filters to allow or disallow some characters.
They cannot open the full repertoire, and before extending their allowed
character set they have to make sure that this will not create havoc with
their own existing names (and they need to investigate how major web
browsers will handle these new types of domain names, including in URLs).
The problem is harder to solve in some formats (notably when URLs are just
embedded without any standard syntax identifying them in plain text, e.g. in
plain-text emails, or in short text fields in a database, where rich text
encoding is not allowed, or in records of email addresses, or outside IDNA
with user-selected user account names, including Facebook pages): they
could be used to trick someone to connect to the wrong account or download
2014-07-06 19:31 GMT+02:00 Doug Ewell <doug at ewellic.org>:
> William Blackwood <wblackwo at tampabay dot rr dot com> wrote:
>> Can anyone provide me an actively resolving example of a .com domain
>> name that demonstrates employment of the Unicode BIDI algorithm?
>> Specifically, I am looking for realized/resolving examples of an
>> Arabic number (AN) and character-containing domain name, (such as
>> مصر.com <http://xn--wgbh1c.com>), but that which employs the BIDI
>> algorithm to change an
>> Arabic 1, to a European number (EN) 1? (E.g. مص١ر.com
>> <http://xn--wgbh1cxg.com>), or (مص1ر.com <http://xn--wgb.com>).
>> The BIDI algorithm should be changing either the AN or EN, or vice
>> versa; or has Verisign not yet incorporated the BIDI algorithm into
>> its registry?
> I would never expect application of the Unicode Bidirectional Algorithm to
> change an Arabic digit like ١ into a European digit like 1.
> Doug Ewell | Thornton, CO, USA
> http://ewellic.org | @DougEwell