Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jul 29 2010 - 17:52:59 CDT

Next message: Martin J. Dürst: "Re: Digit/letter variants in the "same" unified script"

Previous message: Asmus Freytag: "Re: Digit/letter variants in the "same" unified script (was: stability policy on numeric type = decimal)"
Maybe in reply to: Philippe Verdy: "UTS#10 (collation) : French backwards level 2, and word-breakers."
Next in thread: Werner LEMBERG: "Re: UTS#10 (collation) : French backwards level 2, and word-breakers."
Reply: Werner LEMBERG: "Re: UTS#10 (collation) : French backwards level 2, and word-breakers."
Reply: Frédéric Grosshans: "Re: UTS#10 (collation) : French backwards level 2, and word-breakers."

A couple of weeks ago, in this thread, Philippe Verdy said:

> Breaking on words, even if it requires a very modest buffering,
> will significantly improve the processing time,
> because each word in the long texts will be scanned only
> once, and all the rest will occur within the small and
> constantly reused buffer.
...
> I don't forget that in most practical cases, sorts will operate
> on texts whose collation keys have been only partly
> generated and truncated, because they really speed up and
> reduce the number of compares to perform ...

and so on.

Rather than continuing the discussion with a back and forth in
email, I decided to write a Unicode Technical Note
on the general topic, including a case study of alternative
orderings for a French topic list.
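
For readers who have not followed the thread, here is a minimal sketch of
what "backwards level 2" in the subject line refers to. It is not taken
from the Technical Note and is not the UTS #10 algorithm: the function
name sort_key, the WORDS list, and the use of raw code points as
secondary weights are all assumptions made for illustration. It only
shows how reading the accent (secondary) weights from the end of the
string, rather than from the start, changes the order of words that
differ only in their accents.

```python
import unicodedata

# Hypothetical sample data: four words with identical base letters.
WORDS = ["côté", "coté", "côte", "cote"]

def sort_key(word, backwards_secondary=False):
    """Toy two-level key: base letters first, then accents.

    Assumption: the code point of a combining mark stands in for its
    secondary weight. Real collation weights (DUCET, CLDR) differ.
    """
    primary = []
    secondary = []
    # NFD splits each accented letter into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if secondary:
                # Attach the accent to the preceding base letter.
                secondary[-1] = ord(ch)
        else:
            primary.append(ch.lower())
            secondary.append(0)  # 0 means "no accent"
    if backwards_secondary:
        # French-style: compare accents starting from the end of the word.
        secondary.reverse()
    return (primary, secondary)

forward = sorted(WORDS, key=sort_key)
backward = sorted(WORDS, key=lambda w: sort_key(w, backwards_secondary=True))

print("forward: ", forward)
print("backward:", backward)
```

With forward secondary comparison the first accent difference decides,
so "coté" sorts before "côte"; with backward comparison the last accent
difference decides, so "côte" sorts before "coté".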

Those who are interested in collation and in the particular issues
that were discussed in this thread may wish to take a look:

http://www.unicode.org/notes/tn34/

--Ken

