From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jul 29 2010 - 17:52:59 CDT
A couple of weeks ago, in this thread Philippe Verdy said:
> Breaking on words, even if it requires a very modest buffering,
> will significantly improve the processing time,
> because each word in the long texts will be scanned only
> once, and all the rest will occur within the small and
> constantly reused buffer.
...
> I don't forget that in most practical cases, sorts will operate
> on texts whose collation keys have been only partly
> generated and truncated, because they really speed up and
> reduce the number of compares to perform ...
and so on.
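
For readers who have not met the technique in the second quoted passage, here is a minimal Python sketch of sorting on truncated keys with a fallback to the full key on ties. It is an illustration only: toy_sort_key is a hypothetical stand-in (case-folded UTF-8 bytes), not the Unicode Collation Algorithm, and PREFIX_LEN is an arbitrary assumed value. The sketch also builds the whole key before truncating it, so it shows only the comparison logic, not the saving from partial key generation.

```python
from functools import cmp_to_key

PREFIX_LEN = 4  # assumed truncation length, chosen only for illustration


def toy_sort_key(text: str) -> bytes:
    # Hypothetical stand-in for a real collation key (NOT the UCA):
    # case-folded UTF-8 bytes, compared bytewise.
    return text.casefold().encode("utf-8")


def sort_with_truncated_keys(items, prefix_len=PREFIX_LEN):
    # Pair each string with a truncated key, computed once per item.
    decorated = [(toy_sort_key(s)[:prefix_len], s) for s in items]

    def compare(a, b):
        # Most comparisons are decided by the short prefix alone.
        if a[0] != b[0]:
            return -1 if a[0] < b[0] else 1
        # Prefixes tie: fall back to comparing the full keys.
        full_a, full_b = toy_sort_key(a[1]), toy_sort_key(b[1])
        return (full_a > full_b) - (full_a < full_b)

    decorated.sort(key=cmp_to_key(compare))
    return [s for _, s in decorated]


words = ["cote", "Cotes", "coat", "cot", "Coteau"]
result = sort_with_truncated_keys(words)
```

Because bytewise comparison of a prefix agrees with bytewise comparison of the full key whenever the prefixes differ, the result matches a plain sort on the full toy key; only the number of full-key comparisons changes.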
Rather than continuing the discussion with a back and forth in
email, I decided to write a Unicode Technical Note
on the general topic, including a case study of alternative
orderings for a French topic list.
Those who are interested in collation and in the particular issues
that were discussed in this thread may wish to take a look:
http://www.unicode.org/notes/tn34/
--Ken