I think as soon as we start talking about this many scenarios, we are no
longer talking about what the *default* bidi class of the PUA (or some
part of it) should be. Instead, we are talking about being able to
specify private customizations, so that one can have 'AL' runs and 'ON'
runs and so forth.

There really isn't any way the UTC is going to approve changing one part
of the PUA to be default 'AL', another part 'R', another part 'ON', etc.
Asmus just said that merely assigning one plane to be different from the
others "should be a non-starter."

For this discussion, I really don't find it very interesting that
existing technologies A, B, and C don't currently provide a way to
override the default PUA properties. Through most of the 1990s, most
existing applications and technologies either didn't support Unicode at
all or supported only very small parts of it, and the solution generally
was to update them so
that they would. The same should be true here. I would suggest that
installing a modified copy of UnicodeData.txt seems like a rather clumsy
solution; if text files are involved, I'd suggest leaving
UnicodeData.txt alone and creating some sort of "overrides" file.
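
A minimal sketch of what such an "overrides" file could look like, and how a lookup might layer it over the stock data. Everything here is an assumption for illustration: the file syntax (loosely modelled on the `range ; value # comment` layout of the UCD property files), the names `SAMPLE_OVERRIDES`, `parse_overrides` and `bidi_class`, and the choice to reject entries outside the private-use ranges. Python's standard `unicodedata` module stands in for the unmodified UnicodeData.txt.

```python
import unicodedata

# Hypothetical overrides text; in practice this would be read from a file.
SAMPLE_OVERRIDES = """\
# code point or range ; bidi class
E000..E0FF ; AL   # private Arabic-script additions
F0000..F00FF ; R  # private right-to-left script
E100 ; ON
"""

# The three private-use ranges: BMP PUA, Plane 15, Plane 16.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def parse_overrides(text):
    """Parse overrides text into a list of (first, last, bidi_class) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, cls = (field.strip() for field in line.split(";"))
        first, _, last = span.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        # Assumption: only private-use code points may be overridden.
        if not any(a <= lo and hi <= b for a, b in PUA_RANGES):
            raise ValueError(f"{span} is not entirely private-use")
        entries.append((lo, hi, cls))
    return entries


def bidi_class(ch, overrides):
    """Return the overridden bidi class if one applies, else the default."""
    cp = ord(ch)
    for lo, hi, cls in overrides:
        if lo <= cp <= hi:
            return cls
    return unicodedata.bidirectional(ch)
```

With the sample above, `bidi_class("\ue010", parse_overrides(SAMPLE_OVERRIDES))` gives `'AL'`, while any character not listed falls through to the stock property data, which is left untouched.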

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

-----Original Message-----
From: Richard Wordingham
Sent: Sunday, August 21, 2011 9:48
To: unicode_at_unicode.org
Subject: Re: RTL PUA?

On Sun, 21 Aug 2011 01:44:02 +0000
"Doug Ewell" <doug_at_ewellic.org> wrote:

>> The more I think of it, the more I like the idea of reassigning the
>> default BC of Plane 16 to 'R'. What would the arguments against this
>> be?

>> BC of 'AL'?

> Would that really be a better default? I thought the main RTL needs
> for the PUA would be for unencoded scripts, not for even more Arabic
> letters. (How many more are there anyway?)

Not necessarily better, I'm just suggesting that both need to be
supported. However, we need to look at use cases.

(1) Unencoded Arabic script letters with joining behaviour, for use
with any application.

(a) We need the character to have AL, R or ON for it to be included in
BiDi runs. If we use ON we may need RLM when the character is at the
edge of a run, and even then, its behaviour may be no better than a
character with a BC of R.

(b) It may get left out of script runs. There were problems on Windows
with the Tamil ligature k.SS not rendering, despite font support, when
the character U+0BB7 TAMIL LETTER SSA was new. And that's in a
left-to-right script with a character in the appropriate block!

(2) Complete right-to-left script. I'm presuming the difference between
AL and R is then a matter of what right-to-left script the potential
users chiefly also use.

(a) As a practical implementation, the distinction between AL and R
would matter if the script has modern use. Otherwise, any of ON, AL and
R would do, though one might face the annoyance of having to start
chunks of text with RLM. If a script with modern use should be encoded
using a BC of R, then I believe ON would also do as a stop-gap until the
script is encoded. How fiendish is BiDi-sensitive transliteration?

(b) For experimentation, I believe the difference between AL, R and ON
would matter little, even though it would be irritating to have to use
RLM.

(c) Complex script support is patchy - one might be restricted to
applications that allow the font to provide full complex script support.

The big issue in all this, though, is (i) how to update the rendering
system with a new set of values for Unicode properties, including
script, and (ii) the scope of such an update. (The distinction between
the PUA and the rest is that it makes sense for PUA properties to change
as freely as fonts.) This, incidentally, is analogous to locales
reflecting code page selections. There is also, though less pressing,
the issue of tailoring collations. (The worst issue is that there are
distinct canonically inequivalent characters of type Lo comparing
equal - I've seen it for Canadian Aboriginal Syllabics for Windows XP
and for Thai in Ubuntu 10.04 - surely that's not the normal British
collation of such characters.)

One minor problem with (i) *was* that it wasn't clear how one should
annotate a copy of UnicodeData.txt to show that it has been modified.
The standard XML alternative allows for comments, thereby solving that
problem.

If Issue (i) can be readily solved at the machine or user level or
lower, then the default properties of the PUA become irrelevant.

Richard.

Received on Sun Aug 21 2011 - 12:02:43 CDT