From: Daniel Ehrenberg (microdan@gmail.com)
Date: Wed Jan 07 2009 - 13:03:06 CST
I'm sorry, this was an error on my end. Ignore that message.
On Wed, Jan 7, 2009 at 12:38 PM, Daniel Ehrenberg <microdan@gmail.com> wrote:
> I'm implementing UAX #29 word breaking (without tailoring). Right now,
> I've implemented the algorithm except that I treat rules like
>
> Numeric (MidNum | MidNumLet) × Numeric
>
> as
>
> (MidNum | MidNumLet) × Numeric
>
> The funny thing is, though, that all unit tests in WordBreakTest.txt
> pass. But a string like "foo: bar" segments as /foo:/ /bar/. By my
> reading of the UAX, this is incorrect, and the correct word
> segmentation would be /foo/:/ /bar/. For my own project, I'll add some
> extra unit tests, unless I've misread the standard. It seems to
> me that these tests should be added to the WordBreakTest.txt file, and
> I'd be glad to supply them. Is this possible?
>
> Dan
>
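The sketch below illustrates the kind of mistake described in the quoted message. It drops the outer context character from the four-character rules, for example treating "Numeric (MidNum | MidNumLet) × Numeric" as "(MidNum | MidNumLet) × Numeric". It is written in Python and is not a full UAX #29 implementation. It assumes a toy classifier: ASCII letters are ALetter, ASCII digits are Numeric, ":" is MidLetter, "," and ";" are MidNum, and "." and "'" are MidNumLet. Extend, Format, Katakana, ExtendNumLet and CR/LF handling are omitted. The names `wb_class`, `no_break`, `segment` and the `ignore_outer_context` flag are hypothetical. Real code would read the classes from the Unicode Character Database.

```python
from typing import List, Optional

# Toy word-break classes (an assumption for illustration only).
MID_LETTER = {":"}
MID_NUM = {",", ";"}
MID_NUM_LET = {".", "'"}


def wb_class(ch: str) -> str:
    """Map a character to a (greatly reduced) word-break class."""
    if ch.isascii() and ch.isalpha():
        return "ALetter"
    if ch.isascii() and ch.isdigit():
        return "Numeric"
    if ch in MID_LETTER:
        return "MidLetter"
    if ch in MID_NUM:
        return "MidNum"
    if ch in MID_NUM_LET:
        return "MidNumLet"
    return "Other"


def no_break(cls: List[str], i: int, ignore_outer_context: bool) -> bool:
    """True if there is no boundary between cls[i-1] and cls[i]."""
    before, after = cls[i - 1], cls[i]
    prev: Optional[str] = cls[i - 2] if i >= 2 else None       # one further left
    nxt: Optional[str] = cls[i + 1] if i + 1 < len(cls) else None  # one further right
    mid_l = ("MidLetter", "MidNumLet")
    mid_n = ("MidNum", "MidNumLet")

    # Letters and digits stick together in any combination.
    if before in ("ALetter", "Numeric") and after in ("ALetter", "Numeric"):
        return True
    # ALetter × (MidLetter | MidNumLet) ALetter  (needs lookahead)
    if before == "ALetter" and after in mid_l and (ignore_outer_context or nxt == "ALetter"):
        return True
    # ALetter (MidLetter | MidNumLet) × ALetter  (needs lookbehind)
    if (ignore_outer_context or prev == "ALetter") and before in mid_l and after == "ALetter":
        return True
    # Numeric (MidNum | MidNumLet) × Numeric  (needs lookbehind)
    if (ignore_outer_context or prev == "Numeric") and before in mid_n and after == "Numeric":
        return True
    # Numeric × (MidNum | MidNumLet) Numeric  (needs lookahead)
    if before == "Numeric" and after in mid_n and (ignore_outer_context or nxt == "Numeric"):
        return True
    # Otherwise: break everywhere.
    return False


def segment(text: str, ignore_outer_context: bool = False) -> List[str]:
    """Split text at every boundary the rules above allow."""
    if not text:
        return []
    cls = [wb_class(c) for c in text]
    pieces, start = [], 0
    for i in range(1, len(text)):
        if not no_break(cls, i, ignore_outer_context):
            pieces.append(text[start:i])
            start = i
    pieces.append(text[start:])
    return pieces


if __name__ == "__main__":
    print(segment("foo: bar"))                             # ['foo', ':', ' ', 'bar']
    print(segment("foo: bar", ignore_outer_context=True))  # ['foo:', ' ', 'bar']
    print(segment("3.14"))                                 # ['3.14']
```

With the context checks in place, "foo: bar" splits as /foo/:/ /bar/. When the context is ignored, the colon is glued onto "foo", which reproduces the /foo:/ /bar/ result described above. A test string in which a Mid* character is followed or preceded by something other than a letter or digit is what exposes the difference.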
This archive was generated by hypermail 2.1.5 : Wed Jan 07 2009 - 13:06:20 CST