Re: Proposed new characters updated in Pipeline Table

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 15 Aug 2011 22:00:08 +0200

You seem to think that I was speaking about a "new" Arabic Alef-Wasla.
I was absolutely not speaking about it, but about the proposed (still
unencoded) separate Wasla. If it's not encoded, it *cannot* be found
in DerivedAge.txt.
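
To make the DerivedAge.txt point concrete, here is a minimal sketch of
such a lookup. The sample lines only imitate the file's layout (they are
not copied from the real file, which groups code points into larger
ranges), and the names parse_ages and age_of are hypothetical helpers:

```python
from typing import List, Optional, Tuple

# Illustrative lines in the DerivedAge.txt layout; NOT copied from the real file.
SAMPLE = """\
# comment lines start with '#'
0041..005A    ; 1.1 # illustrative range entry
0671          ; 1.1 # illustrative single-code-point entry
"""

def parse_ages(text: str) -> List[Tuple[int, int, str]]:
    """Parse 'XXXX[..YYYY] ; age # comment' lines into (first, last, age)."""
    ranges = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cps, age = (field.strip() for field in line.split(";"))
        first, _, last = cps.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        ranges.append((lo, hi, age))
    return ranges

def age_of(cp: int, ranges: List[Tuple[int, int, str]]) -> Optional[str]:
    """Return the age of a code point, or None if the data does not list it."""
    for lo, hi, age in ranges:
        if lo <= cp <= hi:
            return age
    return None

ranges = parse_ages(SAMPLE)
```

With this sample, age_of(0x0671, ranges) finds an entry, while a code
point that is absent from the data (as any unencoded character would be)
yields None.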

You seem to have read what I wrote as the exact opposite of what I
meant, although I really thought it was explicit.

And I spoke about the default properties of unassigned code points in
assigned blocks because some of these properties are part of the
stability policy, even if the policy does not concern those unassigned
code points themselves.

This is a strong enough constraint that the UTC and WG2 have so far
hesitated to violate it by assigning, in such a block, characters whose
effective properties would need to be different.

The stability policy has certainly been a strong factor in avoiding
this. Now that the BMP is almost full, though, some probably want to
break this implicit rule in the few remaining holes and abandon the
definition of these "default" properties of character blocks in favor
of the effective properties of the characters actually assigned in
those blocks. To me, that is just a way for them to avoid using the
SMP, which will soon be unavoidable anyway, even for modern scripts,
notably many Indic scripts.

The default properties in fact derive from the roadmap allocation
guidelines. Yes, they are guidelines, not strict rules. But these
guidelines have still been useful to avoid complete chaos and to
simplify the implementation of Unicode applications and libraries (for
example, a compact binary representation of property lookup tables).
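
As an illustration of the compact-table argument, here is a minimal
sketch of a two-stage lookup table in which identical blocks are stored
only once. The property function is a toy stand-in (one block-wide value
for U+0600..U+06FF and another value everywhere else), and the names
toy_property, build_two_stage and lookup are hypothetical:

```python
BLOCK = 256  # code points per second-stage block

def toy_property(cp):
    # Toy stand-in for a property with a block-wide default value.
    if 0x0600 <= cp <= 0x06FF:
        return "AL"
    return "L"

def build_two_stage(prop, limit=0x10000):
    """Build (index, blocks); identical blocks are shared via the index."""
    index = []   # first stage: one entry per block of BLOCK code points
    blocks = []  # second stage: the distinct blocks only
    seen = {}
    for start in range(0, limit, BLOCK):
        block = tuple(prop(cp) for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks

def lookup(cp, index, blocks):
    """Return the property value of cp using the two-stage table."""
    return blocks[index[cp // BLOCK]][cp % BLOCK]

index, blocks = build_two_stage(toy_property)
```

When every code point of a block shares the block's default, the whole
block collapses into one shared entry. A single code point with a
different value inside such a block forces a separate second-stage block
to be stored.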

Thanks.

2011/8/15 Ken Whistler <kenw_at_sybase.com>:
> On 8/15/2011 10:38 AM, Philippe Verdy wrote:
>>>>
>>>> Unicode cannot encode a combining Wasla (because of various stability
>>>> policies), so if Syriac needs a Wasla to be shown only over a letter
>>>> or two, one needs to propose precomposed characters for them. Just
>>>> like the existing Arabic Alef-Wasla.
>>
>> Why not? If the character is new,
>
> Occasionally, it would help if you actually did some research before heading
> off on these tangents. It is easy to determine (with the use of
> DerivedAge.txt) that the character Roozbeh is talking about is *not* new,
> but dates all the way back to Unicode 1.1.
>
>> it can perfectly be encoded with
>> whatever character property is needed, including with a non-zero
>> combining class, if it fits. The stability policy about combining
>> sequences is only for sequences of characters that are already encoded
>> and for which decomposition mappings (and the related standard
>> normalizations), as well as basic case mappings in the UCD cannot be
>> modified.
>
> Which applies in this case. If a combining Arabic wasla were to be encoded,
> it would create an alternate representation for the existing (and old)
> U+0671 ARABIC LETTER ALEF WASLA. That would break normalization stability,
> unless an explicit claim were made that <alef + combining wasla> is not the
> same as U+0671, which in turn would defeat the whole point of having the
> combining wasla encoded.
>
>> The stability policy does not concern currently unassigned code points
>
> It does, as for the case just mentioned.
>
>> (except possibly a few ones: the directionality of all code points
>> within some designated RTL blocks, should they be currently assigned
>> to characters or not;
>
> That constrains the allocation of blocks of new right-to-left scripts, but
> it does not absolutely prohibit the encoding of a non-RTL character within
> such a block, if the case is made. For example, a combining mark may occur
> in such a script, and its Bidi_Class will end up NSM, not one of the strong
> right-to-left values.
>
>>  and the reservation of all assigned and
>> unassigned code points in the few blocks allocated only for combining
>> characters).
>
> This is also not a stability policy claim. It is likely to remain the case,
> because the character encoding committees don't do random things. But there
> is no stability policy which would prevent a non-combining mark from ending
> up in such a block.
>
> Please do your homework before making claims like this which misinform
> the list.
>
> --Ken
>
>>
>
>
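
The normalization point in the quoted reply can be checked with Python's
standard unicodedata module. This is only a sketch: the results reflect
the Unicode data bundled with the Python build in use, and U+0622 is
brought in here purely as a contrasting example of an Arabic letter that
does have a canonical decomposition:

```python
import unicodedata

ALEF_WASLA = "\u0671"   # ARABIC LETTER ALEF WASLA
ALEF_MADDA = "\u0622"   # ARABIC LETTER ALEF WITH MADDA ABOVE
ALEF = "\u0627"         # ARABIC LETTER ALEF
MADDA_ABOVE = "\u0653"  # ARABIC MADDAH ABOVE (combining mark)

# U+0671 has no decomposition mapping, so NFD leaves it untouched.
wasla_mapping = unicodedata.decomposition(ALEF_WASLA)
wasla_nfd = unicodedata.normalize("NFD", ALEF_WASLA)

# U+0622 decomposes canonically, and NFC recomposes the sequence.
madda_mapping = unicodedata.decomposition(ALEF_MADDA)
madda_nfd = unicodedata.normalize("NFD", ALEF_MADDA)
madda_nfc = unicodedata.normalize("NFC", ALEF + MADDA_ABOVE)

print(repr(wasla_mapping), wasla_nfd == ALEF_WASLA)
print(repr(madda_mapping), madda_nfd == ALEF + MADDA_ABOVE)
print(madda_nfc == ALEF_MADDA)
```

Because U+0671 has no decomposition and such mappings cannot be changed
for already encoded characters, a sequence of alef plus a newly encoded
combining wasla could never normalize to U+0671. The two spellings would
remain distinct strings.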
Received on Mon Aug 15 2011 - 15:01:48 CDT
