From: Mark Davis (mark.edward.davis@gmail.com)
Date: Wed Jan 07 2009 - 13:12:02 CST
We are glad to add more samples to any of the break tests. Please submit
your suggested list via the Unicode reporting form:
http://www.unicode.org/reporting.html
Mark
On Wed, Jan 7, 2009 at 10:38, Daniel Ehrenberg <microdan@gmail.com> wrote:
> I'm implementing UAX #29 word breaking (without tailoring). Right now,
> I've implemented the algorithm except that I treat rules like
>
> Numeric (MidNum | MidNumLet) × Numeric
>
> as
>
> (MidNum | MidNumLet) × Numeric
>
> The funny thing is, though, that all unit tests in WordBreakTest.txt
> pass. But a string like "foo: bar" segments as /foo:/ /bar/. By my
> reading of the UAX, this is incorrect, and the correct word
> segmentation would be /foo/:/ /bar/. For my own project, I'll add some
> additional unit tests, unless I've misread the standard. It seems to
> me like these tests should be added to the WordBreakTest.txt file, and
> I'd be glad to supply them. Is this possible?
>
> Dan
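The difference Dan describes can be reproduced with a small sketch. This is not a UAX #29 implementation: it covers only the letter and number rules (WB5 through WB12) plus the default "break everywhere else" rule, and the character classes come from a hypothetical toy table (word_break_class) rather than from WordBreakProperty.txt. In particular, treating ':' as MidLetter is an assumption taken from the "foo: bar" example. The full_context flag switches between the rules as written and the shortened variants that drop the outer ALetter or Numeric context.

```python
# Toy word-break sketch: only WB5-WB12 and the default break rule are modeled.
# The class table is an illustrative assumption, not real Unicode property data.

def word_break_class(ch):
    """Hypothetical stand-in for a Word_Break property lookup."""
    if ch.isalpha():
        return "ALetter"
    if ch.isdigit():
        return "Numeric"
    if ch == ":":
        return "MidLetter"   # assumed, as implied by the "foo: bar" example
    if ch == ",":
        return "MidNum"
    if ch in ".'":
        return "MidNumLet"
    return "Other"

MID_LETTER = {"MidLetter", "MidNumLet"}
MID_NUM = {"MidNum", "MidNumLet"}
ALNUM = {"ALetter", "Numeric"}

def no_break(before2, before, after, after2, full_context):
    """Return True if no break is allowed between `before` and `after`."""
    # WB5, WB8, WB9, WB10: letters and digits stick together.
    if before in ALNUM and after in ALNUM:
        return True
    # WB6: ALetter x (MidLetter | MidNumLet) ALetter
    if before == "ALetter" and after in MID_LETTER:
        if after2 == "ALetter" or not full_context:
            return True
    # WB7: ALetter (MidLetter | MidNumLet) x ALetter
    if before in MID_LETTER and after == "ALetter":
        if before2 == "ALetter" or not full_context:
            return True
    # WB12: Numeric x (MidNum | MidNumLet) Numeric
    if before == "Numeric" and after in MID_NUM:
        if after2 == "Numeric" or not full_context:
            return True
    # WB11: Numeric (MidNum | MidNumLet) x Numeric
    if before in MID_NUM and after == "Numeric":
        if before2 == "Numeric" or not full_context:
            return True
    return False  # default rule: break everywhere else

def segment(text, full_context=True):
    """Split text at every position where the rules above allow a break."""
    classes = [word_break_class(c) for c in text]

    def cls(i):
        return classes[i] if 0 <= i < len(classes) else None

    pieces, start = [], 0
    for i in range(1, len(text)):
        if not no_break(cls(i - 2), cls(i - 1), cls(i), cls(i + 1), full_context):
            pieces.append(text[start:i])
            start = i
    if text:
        pieces.append(text[start:])
    return pieces

print(segment("foo: bar", full_context=False))  # ['foo:', ' ', 'bar']
print(segment("foo: bar", full_context=True))   # ['foo', ':', ' ', 'bar']
print(segment("can't", full_context=False))     # ['can\'t'] in both modes
```

Inputs such as "can't" or "3,14", where the middle character has a letter or digit on both sides, segment the same way in both modes, which would explain how a test file made up mostly of such cases can pass with the shortened rules. A case like "foo: bar", where the middle character is followed by something other than a letter, is what tells the two apart.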