From: Mark Davis (mark@macchiato.com)
Date: Mon Feb 23 2009 - 13:57:01 CST
What I've been trying to say is that a word with a single non-NFC sequence
*would be* a typical, non-contrived, "worst" case in terms of
performance. Words with multiple non-NFC sequences are a vanishingly small
proportion of the web.
- If you want the very worst case (completely unlikely in practice,
except perhaps maliciously), something like the base character followed
by 999,999 combining characters.
- If you want a typical, uncontrived, worst case, something like
"*No\u0308rmalization*" works well.
- If you want something between those, figure out what you mean, because
I don't know of any better example.
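
For anyone who wants to try the two ends of that range, here is a minimal sketch in Python using the standard unicodedata module. The function names, the base letter "a", the choice of marks from U+0300..U+036F, and the reduced length of 1,000 (rather than 1M) are all assumptions made for illustration; nothing here is taken from the draft text under discussion.

```python
import unicodedata


def combining_marks():
    """Marks in U+0300..U+036F with CCC != 0 and no canonical decomposition.

    Excluding marks that decompose keeps the string length unchanged
    under NFD, which makes the result easier to inspect.
    """
    return [
        chr(cp)
        for cp in range(0x0300, 0x0370)
        if unicodedata.combining(chr(cp)) != 0
        and unicodedata.decomposition(chr(cp)) == ""
    ]


def contrived_worst_case(length):
    """A base character followed by (length - 1) marks in reverse CCC order."""
    marks = combining_marks()
    tail = [marks[i % len(marks)] for i in range(length - 1)]
    # Highest CCC first, so canonical reordering has to move every run.
    tail.sort(key=unicodedata.combining, reverse=True)
    return "a" + "".join(tail)


def typical_worst_case():
    """A word containing a single non-NFC sequence (o + U+0308)."""
    return "No\u0308rmalization"


typical = typical_worst_case()
contrived = contrived_worst_case(1000)  # scale up to 1_000_000 to match the thread

typical_nfc = unicodedata.normalize("NFC", typical)
contrived_nfd = unicodedata.normalize("NFD", contrived)
```

Timing unicodedata.normalize on each string (for example with timeit) gives a rough comparison, but the numbers depend on the implementation and should not be read as representative of any other normalizer.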
Mark
On Mon, Feb 23, 2009 at 10:42, Asmus Freytag <asmusf@ix.netcom.com> wrote:
> On 2/23/2009 10:01 AM, Mark Davis wrote:
>
>> The worst performance would be (in the 1M character example I've been
>> using), something like a base character followed by a list of 999,999
>> characters with CCC != 0, sorted by CCC in reverse order. I added a note to
>> this effect.
>>
> No, the worst case would be the 2M example...
>
> Actually, the problem with examples of this kind is that they don't speak
> to what you can realistically expect in non-contrived situations.
>
> A./
This archive was generated by hypermail 2.1.5 : Mon Feb 23 2009 - 13:58:33 CST