From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Dec 17 2007 - 19:58:31 CST
There may be some misunderstanding. Unicode does define the default
direction of a paragraph for use with the bidi algorithm (which determines
the ordering of characters in text containing right-to-left scripts like
Arabic or Hebrew).
See http://unicode.org/reports/tr9/
Mark
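
A minimal sketch of that default rule, assuming the "first strong
character" heuristic of rules P2 and P3 in the report linked above and
Python's standard unicodedata module. The function name and the fallback
parameter are illustrative, not taken from the report, and the sketch
ignores explicit directional controls and any higher-level protocol that
overrides the default.

```python
import unicodedata

# Bidi classes that count as "strong" when resolving paragraph direction.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def default_paragraph_direction(text, fallback="ltr"):
    """Return "ltr" or "rtl" based on the first strong character in text.

    Simplified: explicit embeddings, overrides and isolates are not
    treated specially, and a paragraph with no strong character falls
    back to the given default.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return fallback


if __name__ == "__main__":
    print(default_paragraph_direction("پرسش من از Unicode این است"))
    print(default_paragraph_direction("Unicode پرسش"))
```

Under this rule a paragraph that opens with a Persian letter resolves to
rtl, while one that opens with a Latin word resolves to ltr even if most
of it is Persian. An application that ignores the rule and always lays
out paragraphs ltr will not show this behaviour.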
On Dec 17, 2007 4:23 PM, Behnam <behnam.rassi@gmail.com> wrote:
> Thank you. So the answer is no: Unicode does not define the directionality
> of a paragraph. Then I guess my next question should be: why?
> I think I have some explaining to do.
> Unicode defines a very complex bidi behaviour of characters, and it
> defines the beginning and ending of a paragraph (I assume). Yet, it doesn't
> define what directionality this paragraph should take to arrange these
> characters within the paragraph.
> Defining the directionality of a paragraph is more important than defining
> the language of a text. Yes, a language tag can help language-aware devices
> and applications behave accordingly. But defining directionality is not
> about 'user-friendly' behaviour of a text; it is about reproducing the raw
> text as intended by its Unicode encoding.
> Understanding this issue, I suppose, may be very easy or very difficult,
> depending on the extent to which you have been exposed to the rtl
> experience. In the next paragraph, I write a Persian line with a couple of
> English words thrown in, set in left-to-right directionality, to give you
> an idea of what right-to-left users experience on an everyday basis.
> پرسش من از Unicode این است که چرا برای پاراگراف directionality تبیین نکرده
> است.
> In order to read the above phrase correctly in Persian, the order of words
> should be as I numbered below (from right to left):
> پرسش1 من2 از3 Unicode4 این5 است6 که7 چرا8 برای9 پاراگراف10
> directionality11 تبیین12 نکرده13 است14.
>
> Of course I can set this paragraph in my application to "rtl" and, thanks
> to the wonders of the bidi behaviour of characters, everything will be put
> in place:
>
> پرسش من از Unicode این است که چرا برای پاراگراف directionality تبیین نکرده
> است.
>
> But I have absolutely no guarantee that my rtl text in an email, in a text
> message, in an online forum posting... will be received in an rtl setting.
> This perfectly Unicode-encoded text is at the mercy of applications,
> devices, media and platforms. More likely than not, my rtl paragraph will
> be received in ltr and in the order that I numbered above! Even in more
> controlled situations such as word processors, as a friend of mine has
> experienced, this Persian phrase written in the rtl setting of Nisus on a
> Mac, exported in .doc format, and opened on a Windows platform will produce
> an rtl, but 'Arabic', document! Not only an Arabic-script document, which
> it is, but an Arabic-language document!
>
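
One common workaround for the situation described above, where plain text
may be laid out in an ltr paragraph by the receiving application, is to
wrap the text in the explicit embedding controls defined by the bidi
algorithm: U+202B RIGHT-TO-LEFT EMBEDDING and U+202C POP DIRECTIONAL
FORMATTING. The sketch below is illustrative only. The function name is
hypothetical, and it assumes the receiving application implements the
bidi algorithm and does not strip these invisible characters. It fixes
the ordering of the wrapped run, not the alignment of the paragraph.

```python
import unicodedata

RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_rtl(text):
    """Wrap text in an explicit right-to-left embedding (hypothetical helper)."""
    return RLE + text + PDF


if __name__ == "__main__":
    wrapped = wrap_rtl("پرسش من از Unicode این است")
    # Show the bidi classes of the two control characters that were added.
    print(unicodedata.bidirectional(wrapped[0]))
    print(unicodedata.bidirectional(wrapped[-1]))
```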
> You can experience this dilemma yourself. Set your application to rtl
> (which can be done in many applications) and write something in English or
> any Roman language. As long as the whole phrase is Roman, you only get a
> misplaced final period at the far left. But if you throw a couple of Hebrew
> words into the phrase, then you'll see what a wrong directionality setting
> can do to your English. Of course you are not exposed to this dilemma,
> because the default directionality of all computerized devices and
> applications is left to right. But it gives you an idea of what rtl users
> go through on an everyday basis.
>
> Again, this is not about requesting a convenience. It is about asking
> Unicode to do what it set out to do. Unicode encodes the bidi behaviour of
> characters, the beginning of a paragraph, and the end of a paragraph. It
> must encode its directionality too.
>
> Behnam
>
>
> On 17-Dec-07, at 4:20 AM, Stephane Bortzmeyer wrote:
>
> On Sat, Dec 15, 2007 at 11:08:40AM -0500,
> Behnam <behnam.rassi@gmail.com> wrote
> a message of 78 lines which said:
>
> Is there any Unicode standard to identify a text? i.e. primary
> script>directionality>language?
>
>
> Not a Unicode standard but, yes, there is a standard for tagging texts to
> indicate language, script, etc. It's RFC 4646. See
> http://www.langtag.net/ for a start.
>
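
To illustrate what such a tag carries, here is a rough sketch that splits
a simple tag of the form language[-Script][-Region], such as "fa-Arab-IR",
into its parts. It is not a full RFC 4646 parser: extended language
subtags, variants, extensions and private-use subtags are not handled.
The function names and the small set of right-to-left scripts are
assumptions made for the example. A language tag does not state
directionality itself, so the direction here is only a hint inferred from
the script subtag.

```python
# Illustrative, non-exhaustive set of right-to-left script subtags.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa"}


def split_language_tag(tag):
    """Split a simple language[-Script][-Region] tag into its subtags."""
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha() and result["script"] is None:
            # Four letters: treated as a script subtag, e.g. "Arab".
            result["script"] = part.title()
        elif result["region"] is None and (
            (len(part) == 2 and part.isalpha())
            or (len(part) == 3 and part.isdigit())
        ):
            # Two letters or three digits: treated as a region subtag.
            result["region"] = part.upper()
    return result


def direction_hint(tag):
    """Return "rtl", "ltr", or None when the tag names no script."""
    script = split_language_tag(tag)["script"]
    if script is None:
        return None
    return "rtl" if script in RTL_SCRIPTS else "ltr"


if __name__ == "__main__":
    print(split_language_tag("fa-Arab-IR"))
    print(direction_hint("fa-Arab-IR"))
```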
>
>
-- Mark