Re: Bidi: inserting Japanese paragraphs in Arabic/Farsi document from Philippe Verdy on 2016-11-20 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sun, 20 Nov 2016 21:19:40 +0100

Note that if you get:

OWT-CIBARA "Japanese2 【Japanese1】" ENO-CIBARA

this means that the first quotation mark is "transparent" and preserves the
RTL direction.
And then I don't see how you can pair the final quotation mark, unless you
consider it as "leading" the ARABIC-TWO part (meaning that you don't pair
these quotation marks at all: only the brackets are paired, and the fragment
【Japanese1】 is correct because you are using the new Bidi algorithm).

There are still ambiguities in handling pairs of quotation marks. This is
not evident at all, and it is language-dependent: some languages do not
distinguish the glyphs for the leading and trailing marks, or swap them (for
example »Deutsch« as opposed to «Italiano» or « français »). It is a
difficult problem in multilingual documents, not only when they mix RTL and
LTR scripts and need the Bidi algorithm, but also when different LTR
languages occur.
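Part of this ambiguity is visible in the Unicode Character Database itself. A quick check with Python's standard `unicodedata` module (a sketch; the sample list is just the characters mentioned in this thread) shows that the ASCII quotation mark is a plain neutral (ON) with no mirroring, the guillemets are mirrored initial/final punctuation (Pi/Pf) rather than open/close brackets, and the CJK marks 「」【】 are mirrored open/close punctuation (Ps/Pe) whose bidi class is also the neutral ON. `unicodedata` does not expose Bidi_Paired_Bracket_Type, so the pairing itself is not shown here.

```python
import unicodedata

# Characters discussed in this thread: ASCII quote, guillemets,
# CJK corner and lenticular brackets, an Arabic letter, a hiragana letter.
SAMPLES = ['"', "\u00ab", "\u00bb", "\u300c", "\u300d",
           "\u3010", "\u3011", "\u0627", "\u3042"]


def describe(ch: str) -> tuple[str, str, str, bool]:
    """Return (name, general category, bidi class, Bidi_Mirrored) for ch."""
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            bool(unicodedata.mirrored(ch)))


for ch in SAMPLES:
    name, gc, bidi, mirrored = describe(ch)
    print(f"U+{ord(ch):04X} {ch}  gc={gc:2}  bidi={bidi:2}  mirrored={mirrored}  {name}")
```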

For quoting Japanese in Arabic text, I would suggest using Asian
quotation marks, encoding:

ARABIC-ONE 「【Japanese1】 Japanese2」 ARABIC-TWO

so that Asian quotation marks will unambiguously pair together and you'll
get:

OWT-CIBARA 「Japanese2 【Japanese1】」 ENO-CIBARA

Or, since 「」, like 【】, are unambiguously used in LTR text, if you give
them a strong LTR direction you'd get the best result:

OWT-CIBARA 「【Japanese1】 Japanese2」 ENO-CIBARA
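The pairing that makes 「」 and 【】 unambiguous is the bracket-pair identification of UAX #9 (definition BD16), used by rule N0 since Unicode 6.3. Below is a simplified Python sketch, assuming a small hand-picked bracket table (the `BRACKETS` dict is illustrative, not the full BidiBrackets.txt data), treating the whole string as one isolating run sequence, and ignoring canonical equivalents and the check that a bracket still has class ON. Because ASCII `"` has no Bidi_Paired_Bracket_Type, it never takes part in a pair.

```python
# Hypothetical subset of Bidi_Paired_Bracket data: opening -> closing.
# The real list is in BidiBrackets.txt; '"' and the guillemets are absent.
BRACKETS = {
    "(": ")", "[": "]", "{": "}",
    "\u300c": "\u300d",  # 「 」 corner brackets
    "\u3010": "\u3011",  # 【 】 black lenticular brackets
}
CLOSERS = set(BRACKETS.values())
MAX_DEPTH = 63  # BD16 stops processing when the stack has no more room


def pair_brackets(text: str) -> list[tuple[int, int]]:
    """Return (open_index, close_index) pairs sorted by opening position."""
    stack: list[tuple[str, int]] = []  # (expected closer, opener index)
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(text):
        if ch in BRACKETS:
            if len(stack) == MAX_DEPTH:
                break  # no room: stop pairing for the rest of the run
            stack.append((BRACKETS[ch], i))
        elif ch in CLOSERS:
            # Search from the top of the stack down for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == ch:
                    pairs.append((stack[depth][1], i))
                    del stack[depth:]  # pop the match and everything above it
                    break
            # If nothing matched, the closer stays unpaired and the stack is kept.
    return sorted(pairs)


print(pair_brackets("ARABIC-ONE \u300c\u3010Japanese1\u3011 Japanese2\u300d ARABIC-TWO"))
```

Both CJK pairs are found in the suggested encoding, whereas in a string like `"【x】"` only the lenticular brackets pair and the ASCII quotes are left to the ordinary neutral rules.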

But if there are line wraps in the middle of the Japanese section, you get:

「【Japanese1】 ENO-CIBARA
OWT-CIBARA Japanese2」

(notably if you can't mirror the CJK quotation marks).

Otherwise, if you can mirror these marks:

【Japanese1】┐ ENO-CIBARA
OWT-CIBARA └Japanese2

or, without any line break in the middle of the Japanese quotation:

OWT-CIBARA └Japanese2【Japanese1】┐ ENO-CIBARA

(Here I use └ and ┐ only as stand-ins for the mirrored forms of 「」, which
are not encoded as separate characters.)
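The line-wrapped rendering follows from the fact that UAX #9 reorders each line separately: levels are resolved for the whole paragraph, but rule L2 (reversal) is applied per line after line breaking. The Python sketch below assumes, as the text above does, that the Japanese text and its CJK marks end up at an LTR level (2) inside an RTL paragraph (level 1). Tokens such as "ARABIC-ONE" are atomic placeholders, so the letter reversal written as "ENO-CIBARA" is not modelled, trailing whitespace at the break is omitted for simplicity, and glyph mirroring (rule L4) is not applied.

```python
def reorder_line(tokens: list[str], levels: list[int]) -> list[str]:
    """Simplified UAX #9 rule L2 for one line.

    tokens are in logical order; levels[i] is the resolved embedding level
    of tokens[i]. Returns the tokens in visual (left-to-right) order.
    """
    if not tokens:
        return []
    order = list(range(len(tokens)))  # order[v] = logical index shown at visual slot v
    highest = max(levels)
    lowest_odd = min(levels) | 1  # smallest odd level >= the line's minimum
    for level in range(highest, lowest_odd - 1, -1):
        v = 0
        while v < len(order):
            if levels[order[v]] < level:
                v += 1
                continue
            end = v
            while end < len(order) and levels[order[end]] >= level:
                end += 1
            order[v:end] = order[v:end][::-1]  # reverse the run at >= level
            v = end
    return [tokens[i] for i in order]


# One line: paragraph level 1 (RTL), Japanese text and CJK marks at level 2.
whole = ["ARABIC-ONE", " ", "「【Japanese1】", " ", "Japanese2」", " ", "ARABIC-TWO"]
print(reorder_line(whole, [1, 1, 2, 2, 2, 1, 1]))

# The same text wrapped after 【Japanese1】: each line is reordered on its own.
print(reorder_line(["ARABIC-ONE", " ", "「【Japanese1】"], [1, 1, 2]))
print(reorder_line(["Japanese2」", " ", "ARABIC-TWO"], [2, 1, 1]))
```

Because the reversal never crosses a line boundary, the opening 「 stays on the first line and the closing 」 on the second, which is why the quotation appears split across the Arabic words.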

2016-11-20 20:58 GMT+01:00 Philippe Verdy <verdy_p_at_wanadoo.fr>:

>
>
> 2016-11-20 19:19 GMT+01:00 Eli Zaretskii <eliz_at_gnu.org>:
>
>> > From: Philippe Verdy <verdy_p_at_wanadoo.fr>
>> > Date: Sun, 20 Nov 2016 18:51:01 +0100
>> > Cc: Simon Cozens <simon_at_simon-cozens.org>,
>> > unicode Unicode Discussion <unicode_at_unicode.org>
>> >
>> > Correction: I expect to see:
>> >
>> > OWT-CIBARA Japanese2" 【Japanese1】" ENO-CIBARA
>>
>> I don't understand why.
>>
>> What do you expect with the brackets removed? I expect this:
>>
>> OWT-CIBARA "Japanese1 Japanese2" ENO-CIBARA
>>
>> because N0 and N1 are no-ops, and N2 clearly says that a neutral
>> character that is surrounded by text of different directionalities
>> takes the embedding direction.
>>
>
> With ASCII quotes, which are hard to match unambiguously in pairs, the
> marks would normally inherit the direction of their preceding context if
> they cannot be paired.
> So the first quotation mark would take the RTL direction of ARABIC-ONE.
> The second quotation mark would likewise inherit the LTR direction of
> "Japanese2", which precedes it, and would appear to its right.
>
> The final effect would be that the quotes appear glued side by side. But
> note that the two Japanese brackets match each other, so no quotation
> mark can end up between them: the whole bracketed section, including the
> brackets, should behave like its own isolate. A quotation mark between the
> brackets occurs only with the old Bidi algorithm, which did not take
> bracket pairs into account.
>
> So the 【Japanese1】 bracketed section, just after ARABIC-ONE and the
> leading quotation mark of the Japanese section, should render correctly
> with new renderers (this is not the case with Chrome, which still uses the
> old algorithm).
>
> But the correct rendering should probably be:
>
> OWT-CIBARA 【Japanese1】 Japanese2"" ENO-CIBARA
>
> unless ASCII quotation marks are paired, in which case you'll get:
>
> OWT-CIBARA "【Japanese1】 Japanese2" ENO-CIBARA
>
> which is most probably what is expected.
>
> All this is about deciding whether a quotation mark is "leading" or
> "trailing". This is not clear at all for ASCII quotation marks, and it
> affects the final rendering produced by the Bidi algorithm.
>
>
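Eli's reading of rules N1 and N2 can be checked with a minimal Python sketch over strong and neutral classes. It is simplified under stated assumptions: only "L", "R" and a generic neutral "N" are modelled, sos/eos are taken to be the embedding direction, and numbers (EN/AN), bracket pairs (rule N0) and isolates are ignored.

```python
def resolve_neutrals(classes: list[str], embedding: str) -> list[str]:
    """Simplified UAX #9 rules N1/N2 over one isolating run sequence.

    classes: "L", "R" (strong) or "N" (neutral); embedding: "L" or "R".
    sos/eos are assumed to equal the embedding direction.
    """
    result = list(classes)
    i = 0
    while i < len(classes):
        if classes[i] != "N":
            i += 1
            continue
        j = i
        while j < len(classes) and classes[j] == "N":
            j += 1  # find the end of this run of neutrals
        before = classes[i - 1] if i > 0 else embedding
        after = classes[j] if j < len(classes) else embedding
        # N1: same strong direction on both sides; N2: otherwise the embedding direction.
        direction = before if before == after else embedding
        result[i:j] = [direction] * (j - i)
        i = j
    return result


# Eli's example with the brackets removed, in an RTL paragraph:
# ARABIC-ONE SP " Japanese1 SP Japanese2 " SP ARABIC-TWO
classes = ["R", "N", "N", "L", "N", "L", "N", "N", "R"]
print(resolve_neutrals(classes, "R"))
```

Both ASCII quotation marks come out as R, the paragraph's embedding direction, because each sits between an R run and an L run; at levels 1 and 2 this reorders to Eli's expected OWT-CIBARA "Japanese1 Japanese2" ENO-CIBARA.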
Received on Sun Nov 20 2016 - 14:20:28 CST
