RE: New Corrigendum to The Unicode Standard

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Aug 16 2007 - 23:33:38 CDT

Next message: Asmus Freytag: "Re: New Corrigendum to The Unicode Standard"

Previous message: Rick McGowan: "New Corrigendum to The Unicode Standard"
In reply to: Rick McGowan: "New Corrigendum to The Unicode Standard"
Next in thread: Asmus Freytag: "Re: New Corrigendum to The Unicode Standard"
Reply: Asmus Freytag: "Re: New Corrigendum to The Unicode Standard"

This corrigendum is quite troubling; in a BiDi context, this means that
initial quotation marks will not be mirrored.

Anyway, the classification of quotation marks as initial or final is
problematic because it is not consistent with actual usage: various
languages use reversed conventions, even in the same LTR directional
context and within the Latin script alone.

So the distinction between "Pi" and "Pe" general categories should remain
informative only, reflecting the "most common" usage. These punctuation
marks should not be mirrored, simply because they can't be accurately
distinguished as initial or final. So the exact form (orientation and
baseline/superscript position) of these quotation marks should not be
altered even in a BiDi context, and it's up to the writer to choose the
proper one for each context.
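
As an illustration of the properties discussed above, here is a minimal
Python sketch that uses the standard unicodedata module to list the general
category, bidirectional class, and Bidi_Mirrored flag of a few quotation
marks. Note that the Unicode Character Database labels final quotation marks
"Pf" (Final_Punctuation); that is the category this message calls "Pe". The
helper names describe and quote_table are hypothetical, and the mirrored
values depend on the Unicode version bundled with the Python build, so no
expected output is given for them.

```python
import unicodedata

# A few quotation marks: guillemets, single and double curly quotes.
QUOTES = "\u00ab\u00bb\u2018\u2019\u201c\u201d"


def describe(ch):
    """Return (code point, general category, bidi class, mirrored flag)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),       # "Pi" or "Pf" for these characters
        unicodedata.bidirectional(ch),  # bidirectional class, e.g. "ON"
        unicodedata.mirrored(ch),       # 1 if Bidi_Mirrored, else 0
    )


def quote_table(chars=QUOTES):
    """Describe every character in chars, in order."""
    return [describe(ch) for ch in chars]


# The result depends on which UCD version this Python was built with.
print("UCD version:", unicodedata.unidata_version)
for row in quote_table():
    print(*row)
```

Running this on interpreters built against different Unicode versions is one
way to see whether a given environment treats these marks as mirrored.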

But how can you manage the correct reordering of these characters if you use
them to surround, for example, a Latin quotation within an Arabic text? The
initial quotation mark will need to inherit the directional property of the
preceding Arabic text, and the final quotation mark will need to inherit the
directional property of the preceding Latin text, and there's no way to
determine automatically that it should instead attach to the Arabic text
after it, simply because there's no way to determine whether the quotation
marks are initial or final.

This is a difficult problem, and there's no clear indication of what exactly
can be done in this case, where quotation marks are inserted exactly at the
positions where a change of script direction occurs. So how can this be
handled "smartly"?

=> A good solution would be to consider once again their default "Pi"/"Pe"
distinction in the general category. In that case, it gives good hints
about what the quotation marks are marking. So if you know that a quotation
mark is initial or final, then you know that an initial quotation mark after
an Arabic text should not be mirrored, given that it will be reordered
according to the direction of the text after it, and that the final
quotation mark will not need to be mirrored, as it will be reordered
according to the Latin text before it.
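
The heuristic proposed above can be sketched in Python as follows. This is
only an illustration of the idea in this message, not the behavior of the
Unicode Bidirectional Algorithm: an initial mark (Pi) takes the direction of
the first strong character after it, and a final mark (Pf in the UCD, called
"Pe" in this message) takes the direction of the nearest strong character
before it. The function names strong_direction and attach_direction are
hypothetical.

```python
import unicodedata


def strong_direction(ch):
    """Return "RTL" or "LTR" for strong characters, None otherwise."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "RTL"
    if bidi == "L":
        return "LTR"
    return None


def attach_direction(text, i):
    """Guess which run the quotation mark at text[i] should attach to.

    Pi marks look forward for the next strong character, Pf marks look
    backward for the previous one. Returns None for non-quote characters
    or when no strong character is found.
    """
    cat = unicodedata.category(text[i])
    if cat == "Pi":
        scan = range(i + 1, len(text))
    elif cat == "Pf":
        scan = range(i - 1, -1, -1)
    else:
        return None
    for j in scan:
        direction = strong_direction(text[j])
        if direction is not None:
            return direction
    return None


# Two Arabic letters, a quoted Latin word, two Arabic letters.
sample = "\u0627\u0628 \u201cLatin\u201d \u0627\u0628"
print(attach_direction(sample, sample.index("\u201c")))
print(attach_direction(sample, sample.index("\u201d")))
```

With the reversed convention ”Latin“ in the same sample, both marks would
resolve to the surrounding Arabic text instead of the quoted Latin word,
which is the limitation described in the next paragraph.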

The caveat is that an Arabic text will not be able to quote a Latin-script
citation »like this« or even ”like that“, even if the quoted language uses
this convention (reversed from the default Pi/Pe distinction), but only
«like this» or even “like that”.

Another difficulty: the quotation marks may be followed by (non-breaking)
spaces (this is even mandatory for double angle quotation marks if you use
French typography, and depending on subtle typographic differences this may
be a NBSP or NNBSP). This is not a major difficulty for the final quotation
mark, but it adds some difficulty for the initial (Pi) quotation mark in a
BiDi context where the embedded quotation needs to be reordered.
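
To make the spacing issue more concrete, the following small Python sketch
looks up the bidirectional class of the ordinary space and of the two
non-breaking spaces mentioned above (NBSP is U+00A0, NNBSP is U+202F). It
only reads properties from the standard unicodedata module; the helper name
bidi_classes is hypothetical, and the sketch does not attempt any
reordering.

```python
import unicodedata

# The ordinary space and the two non-breaking spaces used in French
# typography next to « and ».
SPACES = ["\u0020", "\u00a0", "\u202f"]


def bidi_classes(chars):
    """Map each character's Unicode name to its bidirectional class."""
    return {unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in chars}


for name, bidi in bidi_classes(SPACES).items():
    print(f"{name}: {bidi}")
```

The non-breaking spaces are classified differently from the ordinary space
(CS rather than WS), so a renderer resolves a space placed between a
quotation mark and the quoted text by different rules depending on which
space the author typed.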

As a consequence, an Arabic text will not be able to use any (non-breaking)
space reliably with the quotation marks to embed, for example, a French
quotation, and so cannot accurately cite it in the usual « French »
quotation style, unless the author drops the non-breaking spaces and writes
«French», or uses the English quotation marks to embed the “French”
citation.

Before the corrigendum in Unicode 5, the Arabic text would have needed to
embed an Arabic quotation like “Arabic”, but due to the mirrored property,
it would have been read with mirrored quotation marks. So an author could
have decided to swap the quotation marks into ”Arabic“ (so the initial
quotation mark would have the default Pe=ending property, and the final
quotation mark would have the default Pi=initial property). If the author
also used them to cite Latin quotations ”like this“, then the BiDi
reordering would still give the expected result, because the quotation
marks would be attached to the surrounding Arabic text, where they are
mirrored and not reordered, and not to the inner reordered Latin text, which
is not mirrored. After reordering, everybody would see the quoted text as if
it were “Latin”, with the quotation marks reordered together with the quoted
Latin text.

After the change, given that the quotation marks are no longer mirrored, the
Latin quotation will now appear swapped (incorrect orientation) in all cases
if the text was created for Unicode 5 without the corrigendum. In an Arabic
text, they will look like:

.snoitatouq ”cibarA“ dna ”Latin“ erofeb txet emoS

This will be the reading of the text rendered by a post-corrigendum renderer
from the text encoded in this order:

Some text before ”Latin“ and ”Arabic“ quotations.

I suppose then that the intent of the corrigendum is to make sure that the
quotation marks are not mirrored, given that they were not mirrored in
Unicode 4 and before. So the texts are expected to be encoded in this
logical order (BiDi reordering and mirroring disabled):

Some text before “Latin” and “Arabic” quotations.

so that it will be rendered like this in renderers based on Unicode 4 or
post-corrigendum Unicode 5:

.snoitatouq “cibarA” dna “Latin” erofeb txet emoS

but like this if a renderer was built using the pre-corrigendum Unicode 5
properties:

.snoitatouq ”cibarA“ dna ”Latin“ erofeb txet emoS

There may be other difficulties in the special case of quotation marks
repeated at the beginning of each paragraph that continues a long quotation
(one not closed in the previous paragraph). This will not affect Arabic
documents making long Latin quotations, but it may affect Latin texts that
include long Arabic quotations. I think that no authors will try to use this
Latin-specific style for long Arabic quotations.

(Final note: in everything I wrote above, Arabic can be replaced by any
other RTL script, and Latin by any other LTR script.)

> -----Original Message-----
> From: cldr-users-bounce@unicode.org [mailto:cldr-users-bounce@unicode.org]
> On behalf of Rick McGowan
> Sent: Friday, August 17, 2007 04:42
> To: unicode@unicode.org
> Subject: New Corrigendum to The Unicode Standard
>
> The Unicode Consortium has issued a new Corrigendum to The Unicode
> Standard Version 5.0.0. For details on this corrigendum, see:
>
> http://www.unicode.org/versions/corrigendum6.html
>
> For general information on corrigenda to The Unicode Standard, see:
>
> http://www.unicode.org/versions/corrigenda.html
>
> In brief, this corrigendum corrects the Bidi_Mirrored property for several
> characters.
>
>
> Regards,
> Rick McGowan
> Unicode, Inc.
>
>
>

This archive was generated by hypermail 2.1.5 : Thu Aug 16 2007 - 23:36:23 CDT