Re: RTL PUA? from Doug Ewell on 2011-08-21 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Sun, 21 Aug 2011 11:00:26 -0600

I think as soon as we start talking about this many scenarios, we are no
longer talking about what the *default* bidi class of the PUA (or some
part of it) should be. Instead, we are talking about being able to
specify private customizations, so that one can have 'AL' runs and 'ON'
runs and so forth.

There really isn't any way the UTC is going to approve changing one part
of the PUA to be default 'AL', another part 'R', another part 'ON', etc.
Asmus just said that merely assigning one plane to be different from the
others "should be a non-starter."

For this discussion, I really don't find it very interesting that
existing technologies A, B, and C don't currently provide a way to
override the default PUA properties. Through most of the 1990s, most
existing applications and technologies didn't support Unicode at all, or
supported only very small parts of it, and the solution generally was to
update them so that they would. The same should be true here. Installing
a modified copy of UnicodeData.txt seems to me a rather clumsy
solution; if text files are involved, I'd suggest leaving
UnicodeData.txt alone and creating some sort of "overrides" file.
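
As a rough illustration of the "overrides" idea (a sketch only: the
file format, the function names and the sample ranges below are all
invented for this example and are not an existing standard or tool), a
lookup could consult the private overrides first and fall back to the
stock data otherwise. Python's unicodedata module stands in here for
the unmodified UnicodeData.txt:

```python
import unicodedata


def parse_overrides(text):
    """Parse lines such as 'E000..E0FF ; AL  # note' into
    (first, last, bidi_class) tuples.

    The syntax is hypothetical; it merely imitates the range notation
    used by some UCD data files. Malformed lines raise ValueError.
    """
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        span, value = (part.strip() for part in line.split(";", 1))
        first, _, last = span.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo  # a single code point is allowed
        ranges.append((lo, hi, value))
    return ranges


def bidi_class(ch, overrides):
    """Return the overridden Bidi_Class of ch, else the standard one."""
    cp = ord(ch)
    for lo, hi, value in reversed(overrides):  # later lines take precedence
        if lo <= cp <= hi:
            return value
    return unicodedata.bidirectional(ch)


# Sample private agreement (assumed values, for illustration only).
OVERRIDES = parse_overrides("""
# part of the BMP PUA treated as Arabic letters
E000..E0FF ; AL
# Plane 16 treated as right-to-left
100000..10FFFD ; R
""")
```

With these sample overrides, bidi_class("\uE000", OVERRIDES) gives 'AL'
and bidi_class("\U00100000", OVERRIDES) gives 'R', while U+05D0 HEBREW
LETTER ALEF is not covered by any override and keeps its standard
class 'R'. The stock data is never modified.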

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

-----Original Message-----
From: Richard Wordingham
Sent: Sunday, August 21, 2011 9:48
To: unicode_at_unicode.org
Subject: Re: RTL PUA?

On Sun, 21 Aug 2011 01:44:02 +0000
"Doug Ewell" <doug_at_ewellic.org> wrote:
>> The more I think of it, the more I like the idea of reassigning the
>> default BC of Plane 16 to 'R'. What would the arguments against this
>> be?
>> BC of 'AL'?
> Would that really be a better default? I thought the main RTL needs
> for the PUA would be for unencoded scripts, not for even more Arabic
> letters. (How many more are there anyway?)

Not necessarily better; I'm just suggesting that both need to be
supported.  However, we need to look at use cases.

(1) Unencoded Arabic script letters with joining behaviour, for use with
any application.

(a) We need the character to have AL, R or ON for it to be included in
BiDi runs.  If we use ON we may need RLM when the character is at the
edge of a run, and even then, its behaviour may be no better than that
of a character with a BC of R.

(b) It may get left out of script runs.  There were problems on
Windows with the Tamil ligature k.SS not rendering, despite font
support, when the character U+0BB7 TAMIL LETTER SSA was new.  And
that's in a left-to-right script with a character in the appropriate
block!

(2) Complete right-to-left script.  I'm presuming the difference
between AL and R is then a matter of which other right-to-left script
the potential users chiefly use.

(a) As a practical implementation, the distinction between AL and R
would matter if the script has modern use.  Otherwise, any of ON, AL
and R would do, though one might face the annoyance of having to start
chunks of text with RLM.  If a script with modern use should be encoded
using a BC of R, then I believe ON would also do as a stop-gap until
the script is encoded.

How fiendish is BiDi-sensitive transliteration?

(b) For experimentation, I believe the difference between AL, R and ON
would matter little, even though it would be irritating to have to
use RLM.

(c) Complex script support is patchy - one might be restricted to
applications that allow the font to provide full complex script support.
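
The RLM remarks in (1)(a) and (2)(a) can be illustrated with a heavily
simplified sketch of rules W3, N1 and N2 of the Unicode Bidirectional
Algorithm. The assumptions are considerable: only the classes L, R, AL
and ON are handled, there is a single embedding level, and numbers,
explicit embeddings and isolates are ignored. The function name is
made up for this example; it is not a conformant implementation.

```python
def resolve_neutrals(classes, paragraph_dir="L"):
    """Resolve ON characters to L or R in a list of bidi classes.

    Simplified rules:
      W3: AL is treated as R.
      N1: a run of neutrals takes the direction of the strong text on
          both sides when the two sides agree; the start and end of the
          text count as the paragraph direction.
      N2: any remaining neutrals take the paragraph direction.
    """
    strong = ["R" if c == "AL" else c for c in classes]  # W3
    out = list(strong)
    n = len(strong)
    i = 0
    while i < n:
        if strong[i] != "ON":
            i += 1
            continue
        j = i
        while j < n and strong[j] == "ON":  # find the end of the neutral run
            j += 1
        before = strong[i - 1] if i > 0 else paragraph_dir
        after = strong[j] if j < n else paragraph_dir
        resolved = before if before == after else paragraph_dir  # N1, else N2
        out[i:j] = [resolved] * (j - i)
        i = j
    return out
```

In a left-to-right paragraph, PUA characters treated as ON at the end
of a right-to-left run, resolve_neutrals(["R", "ON", "ON"]), come out
as L and so fall outside the run. Appending an RLM (class R),
resolve_neutrals(["R", "ON", "ON", "R"]), makes all of them resolve to
R. That is the case where ON needs help from RLM.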

The big issue in all this, though, is (i) how to update the rendering
system with a new set of values for Unicode properties, including
script, and (ii) the scope of such an update.  (The distinction between
the PUA and the rest is that it makes sense for PUA properties to
change as freely as fonts.) This, incidentally, is analogous to locales
reflecting code page selections.  There is also, though less pressing,
the issue of tailoring collations.  (The worst issue is that there are
distinct, canonically inequivalent characters of type Lo comparing equal
- I've seen it for Canadian Aboriginal Syllabics on Windows XP and for
Thai in Ubuntu 10.04 - surely that's not the normal British collation
of such characters.)

One minor problem with (i) *was* that it wasn't clear how one should
annotate a copy of UnicodeData.txt to show that it has been modified.
The standard XML alternative allows for comments, thereby solving that
problem.

If Issue (i) can be readily solved at the machine or user level or
lower, then the default properties of the PUA become irrelevant.

Richard.

Received on Sun Aug 21 2011 - 12:02:43 CDT
