Re: Directionality Standard

From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Dec 17 2007 - 19:58:31 CST

    There may be some misunderstanding. Unicode does define the default
    direction of a paragraph for use with the bidi algorithm (which determines
    the display ordering of characters in text containing right-to-left
    scripts like Arabic or Hebrew).

    See http://unicode.org/reports/tr9/
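
    As a rough illustration of that default, here is a minimal Python sketch
    of the "first strong character" rule (rules P2 and P3 of the report
    above). The function name default_paragraph_level is my own, not
    anything from the standard, and the sketch is deliberately simplified:
    it ignores directional isolates and any direction supplied by a
    higher-level protocol.

```python
import unicodedata


def default_paragraph_level(text: str) -> int:
    """Return 0 (left-to-right) or 1 (right-to-left) for a paragraph.

    Simplified first-strong heuristic: scan for the first character whose
    bidi class is L, R, or AL. Hypothetical helper, not a full bidi
    implementation.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "B":
            # A paragraph separator ends the paragraph being examined.
            break
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    # No strong character found: fall back to left-to-right.
    return 0
```

    With this rule, a paragraph that opens with a Persian word resolves to
    right-to-left, while one that opens with a Latin word resolves to
    left-to-right even if most of it is Persian. Prefixing U+200F
    RIGHT-TO-LEFT MARK, itself a strong right-to-left character, is one
    way to tip the result in plain text.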

    Mark

    On Dec 17, 2007 4:23 PM, Behnam <behnam.rassi@gmail.com> wrote:

    > Thank you. So the answer is no. Unicode does not define the directionality
    > of a paragraph. Then I guess my next question should be: why?
    > I think I have some explaining to do.
    > Unicode defines a very complex bidi behaviour of characters, and it
    > defines the beginning and ending of a paragraph (I assume). Yet, it doesn't
    > define what directionality this paragraph should take to arrange these
    > characters within the paragraph.
    > Defining the directionality of a paragraph is more important than defining
    > the language of a text. Yes, a language tag can help language-aware devices
    > and applications behave accordingly. But a directionality definition is not
    > about 'user-friendly' behaviour of a text, it is about reproducing the raw
    > text, as intended by its Unicode encoding.
    > Understanding this issue, I suppose, may be very easy or very difficult,
    > depending on the extent to which you have been exposed to the rtl
    > experience. In the next paragraph, I write a Persian line, throwing in a
    > couple of English words, in left-to-right directionality, to give you an
    > idea of what right-to-left users experience on an everyday basis.
    > پرسش من از Unicode این است که چرا برای پاراگراف directionality تبیین نکرده
    > است.
    > In order to read the above phrase correctly in Persian, the order of words
    > should be as I numbered below (from right to left):
    > پرسش1 من2 از3 Unicode4 این5 است6 که7 چرا8 برای9 پاراگراف10
    > directionality11 تبیین12 نکرده13 است14.
    >
    > Of course I can set this paragraph in my application to "rtl" and, thanks
    > to the wonders of the bidi behaviour of characters, everything will be put
    > in place:
    >
    > پرسش من از Unicode این است که چرا برای پاراگراف directionality تبیین نکرده
    > است.
    >
    > But I have absolutely no guarantee that my rtl text in an email, in a text
    > message, in an online forum posting... will be received in an rtl setting.
    > This perfectly Unicode-encoded text is at the mercy of applications, devices,
    > media and platforms. And more likely than not, my rtl paragraph will be
    > received in ltr and in the order that I numbered above! Even in more
    > controlled situations such as word processors, as a friend of mine has
    > experienced, this Persian phrase written in the rtl setting of Nisus on a
    > Mac, exported in .doc format, and opened on a Windows platform will produce
    > an rtl but 'Arabic' document! Not only an Arabic-script document, which it
    > is, but an Arabic-language document!
    >
    > You can experience this dilemma yourself. Set your application to rtl
    > (which can be done in many applications) and write something in English or
    > any other language in Roman script. As long as the whole phrase is Roman,
    > you only get a misplaced final period at the far left. But if you throw a
    > couple of Hebrew words into the phrase, then you'll see what a wrong
    > directionality setting can do to your English. Of course you are not
    > exposed to this dilemma, because the default directionality of all
    > computerized devices and applications is left to right. But it gives you an
    > idea of what rtl users are going through on an everyday basis.
    >
    > Again, this is not about requesting a convenience. It is about requesting
    > Unicode to do what it set out to do. Unicode encodes the bidi behaviour of
    > characters, the beginning of a paragraph, the end of a paragraph. It must
    > encode its directionality too.
    >
    > Behnam
    >
    >
    > On 17-Dec-07, at 4:20 AM, Stephane Bortzmeyer wrote:
    >
    > On Sat, Dec 15, 2007 at 11:08:40AM -0500,
    > Behnam <behnam.rassi@gmail.com> wrote
    > a message of 78 lines which said:
    >
    > Is there any Unicode standard to identify a text? i.e. primary
    > script>directionality>language?
    >
    >
    > Not a Unicode standard but, yes, there is a standard to tag texts to
    > indicate language, script, etc. It's RFC 4646. See
    > http://www.langtag.net/ for a start.
    >
    >
    >

    

