Re: BIDI IRI Display (was spoofing and IRIs)

From: Martin J. Dürst (duerst@it.aoyama.ac.jp)
Date: Thu Mar 04 2010 - 03:43:44 CST


    Hello Jonny,

    On 2010/03/04 17:13, Jonathan Rosenne wrote:
    > There is no average BIDI user to observe, since there are no BIDI TLDs and
    > no BIDI equivalents to http, ftp etc.
    >
    > In my way of thinking, an average BIDI user does not normally mix LTR and
    > RTL, programmers excepted.

    Can you expand on this a bit more? E.g. how much do LTR
    words/phrases/sentences/whatever appear in average RTL (e.g. Hebrew or
    Arabic) text? How much in newspapers? How much in books? How much in Web
    pages? How much in informative text vs. advertisements,...?

    Regards, Martin.

    > Jony
    >
    >> -----Original Message-----
    >> From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On
    >> Behalf Of Shawn Steele
    >> Sent: Thursday, March 04, 2010 7:56 AM
    >> To: Larry Masinter; 'Slim Amamou'
    >> Cc: public-iri@w3.org; Peter Constable; unicode@unicode.org
    >> Subject: RE: BIDI IRI Display (was spoofing and IRIs)
    >>
    >> The problem isn't an IRI in different contexts (a list of IRIs or not),
    >> the problem is that an IRI *IS* a list.
    >>
    >> http://www.microsoft.com/en/us/default.aspx is a lot like { www,
    >> microsoft, com, en, us, default.aspx }, so IRIs shouldn't mix up the
    >> parts (e.g., reversing en & us in the display would be misleading). In
    >> a BIDI context, this probably means that the elements of the list are
    >> ordered from right to left. The problem with the Unicode bidi
    >> algorithm is that if 2 LTR script elements are adjacent, they lose the
    >> ordering of the list.
    >>
    >> Users seem to expect that elements of an IRI are drawn as a list like I
    >> described. It has also been proposed that they just be rendered from
    >> LTR regardless of whether any labels are RTL or not, and another
    >> suggestion has been that users don't really understand the ordering of
    >> the IRI, so it's okay to reorder as long as it's consistent.
    >>
    >> I would like to see a usability study to figure out what the average
    >> BIDI user expects, since we engineers may have biases that most people
    >> don't have. My informal observations and feedback from the BIDI
    >> community seem to support the "elements of a list" hypothesis; however,
    >> I'd like that to be confirmed (or disproved) by a "real" usability
    >> study :)
    >>
    >> -Shawn
    >>
    >> ________________________________________
    >> From: Larry Masinter [masinter@gmail.com] on behalf of Larry Masinter
    >> [LMM@acm.org]
    >> Sent: Wednesday, March 03, 2010 6:00 PM
    >> To: Shawn Steele; 'Slim Amamou'
    >> Cc: public-iri@w3.org; Peter Constable; unicode@unicode.org
    >> Subject: RE: BIDI IRI Display (was spoofing and IRIs)
    >>
    >> If the same Unicode string is used for an IRI in running text and for
    >> an IRI in a context where it is used as an "ordered list", then it would
    >> seem like
    >>
    >> * the presentation of the IRI in different contexts is the same
    >>
    >> is more important than
    >>
    >> * the presentation of the IRI in known IRI contexts is optimal
    >>
    >> Do you agree? I don't see how you can have both.
    >>
    >> Larry
    >> --
    >> http://larry.masinter.net
    >>
    >>
    >> -----Original Message-----
    >> From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
    >> Sent: Wednesday, March 03, 2010 9:13 AM
    >> To: Slim Amamou; Larry Masinter
    >> Cc: public-iri@w3.org; Peter Constable; (unicode@unicode.org)
    >> Subject: RE: BIDI IRI Display (was spoofing and IRIs)
    >>
    >>> An IRI is a sequence of Unicode characters. Is there not
    >>> already a well-defined way of converting a sequence of
    >>> Unicode characters to a visual display?
    >>
    >> The problem (from my perspective at least) is that the Unicode BIDI
    >> rules are somewhat "generic". Unicode expects things like / and . to
    >> be used in a context of same-script stuff, like a date, time or
    >> number. IRIs use them as delimiters for a list of elements (labels in
    >> the domain name or folders in the path), in a hierarchical form. The
    >> Unicode BIDI algorithm doesn't recognize that there's an underlying
    >> hierarchy, so it can end up "swapping" pieces in that hierarchy in
    >> some cases.
    >>
    >> I'm not sure UTR#36 is the proper place to clarify display of such
    >> ordered lists. Proper BIDI rendering of IRIs isn't just a security,
    >> but also a usability, problem. It does seem like perhaps this concept
    >> should be mentioned in Unicode somewhere. (IRIs aren't the only place
    >> that similar ordered lists happen).
    >>
    >> -Shawn
    >
    >
    >
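
The "an IRI *IS* a list" view quoted above can be sketched in a few lines of Python. This is only an illustration of the list model under discussion, not the Unicode bidi algorithm and not anyone's proposed rendering rule. The function names (iri_elements, element_direction, display_as_list) are hypothetical, and the direction check is a simplification that looks only at the first strong character of each element.

```python
import unicodedata
from urllib.parse import urlsplit


def iri_elements(iri):
    """Return the ordered elements of an IRI: domain labels, then path segments."""
    parts = urlsplit(iri)
    labels = parts.hostname.split(".") if parts.hostname else []
    segments = [s for s in parts.path.split("/") if s]
    return labels + segments


def element_direction(element):
    """Classify an element by its first strong character (a simplification)."""
    for ch in element:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew / Arabic letters
            return "RTL"
        if bidi_class == "L":           # Latin and other left-to-right letters
            return "LTR"
    return "no strong"                  # e.g. digits or punctuation only


def display_as_list(iri, rtl_context=False):
    """Order whole elements for display; in an RTL context the list runs right to left.

    Elements are never split or interleaved, so adjacent LTR elements such as
    'en' and 'us' keep their relative position in the hierarchy.
    """
    elements = iri_elements(iri)
    return list(reversed(elements)) if rtl_context else elements


iri = "http://www.microsoft.com/en/us/default.aspx"
print(iri_elements(iri))
print(display_as_list(iri, rtl_context=True))
print([element_direction(e) for e in iri_elements(iri)])
```

The point of the sketch is that ordering is decided per element rather than per character run, which is the property the "elements of a list" hypothesis asks a renderer to preserve.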

    -- 
    #-# Martin J. Dürst, Professor, Aoyama Gakuin University
    #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
    

