From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Feb 23 2009 - 13:41:49 CST
> theoretically your
> implementation wouldn't be conformant to UCA for the
                                           ^^^
recte: Unicode Normalization Forms
I spend so much time thinking about the UCA that my fingers
seem disconnected from my brain sometimes when typing
TLAs. ;-)
> million combining character sequence, but realistically,
> who would care?
Also, one should note that the Stream-Safe Text Format
was added to UAX #15 precisely because of worries about
unbounded sequences of non-starters and their impact
on the ability to normalize correctly in protocols that
may use streaming text.
An implementation concerned about warding off worst-case
normalization behavior for potentially malicious sequences
of non-starters in data could simply declare that it
is using the Stream-Safe Text Process (see D8 in Section 21
of UAX #15, and Conformance clause UAX15-C4). That automatically
sets what I was calling the governor count to 30. Any
non-starter sequence longer than 30 characters results in
insertion of a CGJ, thereby automatically bounding the searchback
for canonical reordering.
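
As a rough illustration of the process just described, here is a minimal
Python sketch. It follows one reading of the Stream-Safe Text Process
steps in UAX #15 (count non-starters in each character's NFKD
decomposition, and insert a CGJ before the count would exceed 30); it is
not text from the standard. The names stream_safe, MAX_NON_STARTERS and
is_non_starter are hypothetical, chosen only for this example, and the
combining classes come from whatever Unicode version the local Python
unicodedata module ships with.

```python
import unicodedata

CGJ = "\u034f"          # U+034F COMBINING GRAPHEME JOINER (combining class 0)
MAX_NON_STARTERS = 30   # the bound mentioned in the message above


def is_non_starter(ch):
    """A non-starter is a character with a non-zero canonical combining class."""
    return unicodedata.combining(ch) != 0


def stream_safe(text):
    """Return text with a CGJ inserted wherever a run of non-starters
    (measured on NFKD decompositions) would otherwise exceed 30."""
    out = []
    count = 0  # non-starters seen since the last starter or inserted CGJ
    for ch in text:
        decomp = unicodedata.normalize("NFKD", ch)
        flags = [is_non_starter(d) for d in decomp]

        # Number of non-starters at the front of this decomposition.
        leading = 0
        for flag in flags:
            if not flag:
                break
            leading += 1

        if count + leading > MAX_NON_STARTERS:
            out.append(CGJ)
            count = 0
        out.append(ch)

        if all(flags):
            # No starter in the decomposition: the run keeps growing.
            count += len(decomp)
        else:
            # The run restarts after the last starter in the decomposition.
            trailing = 0
            for flag in reversed(flags):
                if not flag:
                    break
                trailing += 1
            count = trailing
    return "".join(out)
```

With this sketch, a base letter followed by 30 combining acute accents
passes through unchanged, while a 31st accent causes a CGJ to be inserted
just before it, so no later searchback has to cross more than 30
non-starters.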
--Ken