From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Mar 31 2004 - 16:42:32 EST
On 31/03/2004 12:28, Ernest Cline wrote:
>> ...
>>
>>This is the kind of stuff the UTC refuses to start up by trying
>>to provide some subdivision of semantics in the PUA. *That* is
>>the principle, by the way, which guides the UTC position on
>>the PUA: Use at your own risk, by private agreement.
>>
>>
>
>Which is why, if any private use characters with default characteristics
>other than those of the existing Private Use blocks are ever to be part of
>Unicode, they will need to be added as additional Private Use blocks,
>not by redefining the existing PUAs.
>
>There are currently some 10 totally unused planes, with not even any
>tentative plans for them. Allocating one or two of those as additional
>Private Use Areas with a variety of default characteristics instead of
>the monotonous default characteristics of the existing Private Use
>Areas should not prove too difficult. For example, 26 blocks of 128
>Private Use Combining Marks each, each block corresponding to
>one of the existing canonical combining classes (with perhaps a
>larger block for class 0) would amply satisfy the needs of most
>private use scripts for combining marks. Similarly, blocks for
>additional characters that would have other properties should
>be simple to define, and for most combinations of property values
>128 characters should also prove to be exceedingly ample.
>
>I'd have to take the time to list them, but a quick glance convinces
>me that there are at most several hundred combinations that would
>need to be supported if we limit things to just those combinations
>already in use. (It might take more if, for example, all 256 potential
>combining classes were supported instead of the 26 listed in
>UCD.html.) At 128 characters per combination, plus more for a
>few that might need them, it should prove possible to handle this
>in 1 or 2 planes.
>
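As a rough check on the arithmetic in the quoted proposal, here is a minimal Python sketch. The figures of 128 characters per block and 26 combining-class blocks come from the quoted message; the names `PLANE_SIZE`, `BLOCK_SIZE` and `blocks_in_planes` are made up for illustration. It counts all 65,536 code points of a plane, although the last two code points of every plane are noncharacters, so the usable total is very slightly lower.

```python
PLANE_SIZE = 0x10000         # code points in one Unicode plane (65,536)
BLOCK_SIZE = 128             # characters per block, as in the quoted proposal
COMBINING_CLASS_BLOCKS = 26  # one block per combining class, as in the quoted proposal


def blocks_in_planes(planes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks of block_size that fit in the given number of planes."""
    return planes * PLANE_SIZE // block_size


# Code points needed for the combining-mark blocks alone.
combining_mark_total = COMBINING_CLASS_BLOCKS * BLOCK_SIZE

print(combining_mark_total)   # code points for 26 blocks of 128
print(blocks_in_planes(1))    # blocks of 128 in one plane
print(blocks_in_planes(2))    # blocks of 128 in two planes
```

On these assumptions one plane holds 512 blocks of 128 and two planes hold 1024, so "several hundred combinations" does fit in 1 or 2 planes.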
Ernest, I support your general ideas here. But I am concerned about the
implications of defining PUA characters with combining classes other
than zero. I can see this causing some confusion with normalisation etc.
And it does hugely multiply the number of PUA characters required.

Let's think about when one might need PUA characters with cc>0. The
relevant cases are all like <B, M1, M2>, where B is a base character and
M1 and M2 are combining characters, one or both of them in your proposed
extended PUA. And cc>0 is required only if you want this sequence to be
canonically equivalent to <B, M2, M1>, and so want one of these to be
converted to the other during normalisation - a reordering which can
only happen if M1 and M2 both have cc>0 and their combining classes differ.
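
The reordering condition can be seen with existing, non-PUA marks. This is only an illustrative sketch using Python's standard `unicodedata` module; the choice of sample marks and the helper name `canonically_equivalent` are mine, not part of any proposal.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, cc=230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, cc=220
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, cc=230
GRAVE = "\u0300"      # COMBINING GRAVE ACCENT, cc=230
CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, cc=0


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Both marks have cc>0 and the classes differ: the two orders are equivalent.
print(canonically_equivalent("q" + DOT_ABOVE + DOT_BELOW,
                             "q" + DOT_BELOW + DOT_ABOVE))

# Same combining class: normalisation never swaps them, so the orders differ.
print(canonically_equivalent("q" + ACUTE + GRAVE, "q" + GRAVE + ACUTE))

# A cc=0 character between the marks blocks the reordering.
print(canonically_equivalent("q" + DOT_ABOVE + CGJ + DOT_BELOW,
                             "q" + DOT_BELOW + CGJ + DOT_ABOVE))
```

Only the first pair is reported as equivalent. A PUA mark with cc=0 would behave like the CGJ in the last case: it is never moved, and nothing is moved across it.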

Is it really necessary to support the concept of canonical equivalence
of PUA sequences to this level of detail? Would it not be enough for those
specifying the PUA characters to specify one of the orderings as correct
and the other as a spelling error? I really can't see this requirement
being widespread enough to justify defining the thousands of PUA
characters with different combining classes which you propose.
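
To make the "spelling error" alternative concrete, here is a minimal sketch of a checker that the parties to a private agreement might run. Everything in it is hypothetical: the two PUA code points, the agreed ranking in `AGREED_RANK`, and the function name `find_order_errors`.

```python
# Hypothetical private agreement: each PUA mark gets a rank, and within a
# run of such marks the ranks must not decrease.
AGREED_RANK = {
    "\uE000": 0,  # hypothetical PUA mark M1, written first
    "\uE001": 1,  # hypothetical PUA mark M2, written second
}


def find_order_errors(text: str) -> list[int]:
    """Return the index of every agreed mark that follows a higher-ranked one."""
    errors = []
    prev_rank = None
    for i, ch in enumerate(text):
        rank = AGREED_RANK.get(ch)
        if rank is None:
            prev_rank = None  # any other character ends the current run of marks
            continue
        if prev_rank is not None and rank < prev_rank:
            errors.append(i)
        prev_rank = rank
    return errors


print(find_order_errors("b\uE000\uE001"))  # agreed order
print(find_order_errors("b\uE001\uE000"))  # reversed order
```

The first call reports no errors and the second flags index 2. The check lives entirely in the private agreement, so nothing in the standard would need to define combining classes for these marks.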

My proposal would rather be for a single group of PUA combining marks
which all have cc=0, and are all "default ignorable", with the result
that they are not displayed when a regular font is selected. These could
be used for non-standardised diacritics and for mark-up (I mean this in
the old-fashioned sense of marks added to the text rather than as a way
of specifying formatting), and also in effect as variation selectors if
the private font specifies pseudo-digraphs. I don't know exactly how
many might be required, but I am thinking tens or hundreds rather than
thousands.
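
For comparison, the existing PUA code points already have cc=0, which a short Python sketch can confirm. The sample code points and helper names are my own choice. As far as I know the standard `unicodedata` module does not expose the "default ignorable" property, so that half of the proposal is not shown.

```python
import unicodedata

# Sample code points from the BMP Private Use Area and from planes 15 and 16.
PUA_SAMPLES = ["\uE000", "\uF8FF", "\U000F0000", "\U0010FFFD"]


def pua_properties(ch: str) -> tuple[str, int]:
    """General category and canonical combining class as reported by unicodedata."""
    return unicodedata.category(ch), unicodedata.combining(ch)


def unchanged_by_normalisation(text: str) -> bool:
    """True if none of the four normalisation forms alters the text."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)


for ch in PUA_SAMPLES:
    print(f"U+{ord(ch):04X}", pua_properties(ch))

# With cc=0 the order of PUA characters after a base survives normalisation.
print(unchanged_by_normalisation("b\uE001\uE000"))
```

Each sample reports category Co with combining class 0, and the sequence passes through all four normalisation forms unchanged.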

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/