I wouldn't be averse to adding [:cn:][:cs:][:co:] to [:gcb:control:]. It
would make it align more with the current definition of Grapheme_Base.
As to how to handle private use characters, UAX #29 already allows
overriding:
"This specification defines *default* mechanisms; more sophisticated
implementations can *and should* tailor them for particular locales or
environments."
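
As a rough illustration of the proposal plus that kind of tailoring, here
is a minimal Python sketch. It covers only the three general categories
under discussion (Cn, Cs, Co), not the full Grapheme_Cluster_Break=Control
set, and it reflects the proposed change rather than what UAX #29
currently specifies. The names PROPOSED_CONTROL_CATEGORIES,
breaks_like_control and private_use_is_base are made up for this example.

```python
import unicodedata

# Hypothetical constant: the general categories proposed in this thread
# for addition to GCB=Control. This is NOT the full GCB=Control set.
PROPOSED_CONTROL_CATEGORIES = {"Cn", "Cs", "Co"}


def breaks_like_control(ch, private_use_is_base=False):
    """Return True if ch would gain GCB=Control under the proposal.

    private_use_is_base is an assumed tailoring switch: when True,
    private use code points (Co) keep behaving like base characters,
    as an application is free to decide.
    """
    gc = unicodedata.category(ch)
    if gc == "Co" and private_use_is_base:
        return False  # tailored: the application treats private use as base
    return gc in PROPOSED_CONTROL_CATEGORIES
```

For example, breaks_like_control(chr(0xD800)) is True for an isolated
surrogate, while breaks_like_control(chr(0xE000), private_use_is_base=True)
is False because the tailoring keeps the private use code point as a base.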
I'll file an agenda item for the August UTC meeting to consider this; you
can also add your feedback to the UTC using the reporting form.
Mark
*— Il meglio è l’inimico del bene —*
On Tue, Jul 5, 2011 at 16:31, Karl Williamson <public_at_khwilliamson.com> wrote:
> On 07/05/2011 09:29 AM, Mark Davis ☕ wrote:
>
>> Ah, you're right; I wasn't looking carefully enough at what you wrote.
>>
>> Yes, an unassigned code point (Cn) is treated as a base character.
>>
>> Unassigned code points are peculiar beasts, since we don't know really
>> how they should behave until (and if) they are assigned. Their treatment
>> by the Unicode algorithms varies based on some factors:
>>
>> * safety - don't have them behave in a way that causes problems
>> * foresight - have them behave like the most likely candidate for
>> future assignment
>> * simplicity - since they shouldn't occur normally in text, don't
>> spend too much time worrying about them.
>>
>> These are not formalized principles, just my observations on how we've
>> operated over the years.
>>
>> Mark
>> /— Il meglio è l’inimico del bene —/
>>
>
> Thanks for the answer. It does seem weird to me to treat them as base
> characters.
>
> But, I'm wondering then about Cs, isolated surrogates. They also are
> treated as base characters. That seems wrong to me. Since UTS #18 is
> starting to mention the possibility of them in regexes, perhaps this should
> be addressed?
>
> Also, my understanding of UAX #44 is that private use code points may or
> may not be treated as base characters at the application's discretion. But
> this isn't mentioned in UAX #29.
>
Received on Wed Jul 06 2011 - 15:30:35 CDT