From: Daniel Ehrenberg (microdan@gmail.com)
Date: Wed Jan 07 2009 - 12:38:14 CST
I'm implementing UAX #29 word breaking (without tailoring). Right now,
I've implemented the algorithm, except that I treat context-sensitive
rules like
Numeric (MidNum | MidNumLet) × Numeric
as if they were
(MidNum | MidNumLet) × Numeric
that is, without checking the surrounding context.
The funny thing is, though, that all unit tests in WordBreakTest.txt
pass. But a string like "foo: bar" segments as /foo:/ /bar/. By my
reading of the UAX, this is incorrect, and the correct word
segmentation would be /foo/:/ /bar/. For my own project, I'll add some
additional unit tests, unless I've misread the standard. It seems to
me like these tests should be added to the WordBreakTest.txt file, and
I'd be glad to supply them. Is this possible?
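
To make the difference concrete, here is a minimal Python sketch of the two readings. It is only a toy: the property assignment in wb_class is hand-written for a few ASCII characters (an assumption, not read from WordBreakProperty.txt), only the letter/number/Mid* rules are modelled, and the names wb_class, no_break and segment are made up for this illustration. The "loose" mode reflects my shortcut as I understand it, i.e. the Mid* rules applied without their surrounding context.

```python
def wb_class(ch):
    """Toy Word_Break property lookup (hypothetical, ASCII-only)."""
    if ch.isalpha():
        return "ALetter"
    if ch.isdigit():
        return "Numeric"
    if ch == ":":
        return "MidLetter"
    if ch == ",":
        return "MidNum"
    if ch == ".":
        return "MidNumLet"
    return "Other"


def no_break(classes, i, strict):
    """Return True if no break is allowed between positions i-1 and i."""
    prev, cur = classes[i - 1], classes[i]
    before = classes[i - 2] if i >= 2 else None
    after = classes[i + 1] if i + 1 < len(classes) else None
    letter_mid = ("MidLetter", "MidNumLet")
    number_mid = ("MidNum", "MidNumLet")

    # Letters and digits never break from each other.
    if prev in ("ALetter", "Numeric") and cur in ("ALetter", "Numeric"):
        return True
    # ALetter × (MidLetter | MidNumLet) ALetter
    if prev == "ALetter" and cur in letter_mid:
        return (not strict) or after == "ALetter"
    # ALetter (MidLetter | MidNumLet) × ALetter
    if prev in letter_mid and cur == "ALetter":
        return (not strict) or before == "ALetter"
    # Numeric × (MidNum | MidNumLet) Numeric
    if prev == "Numeric" and cur in number_mid:
        return (not strict) or after == "Numeric"
    # Numeric (MidNum | MidNumLet) × Numeric
    if prev in number_mid and cur == "Numeric":
        return (not strict) or before == "Numeric"
    # Everything else breaks in this toy model.
    return False


def segment(text, strict=True):
    """Split text into segments; strict=False drops the rule context."""
    if not text:
        return []
    classes = [wb_class(c) for c in text]
    out = [text[0]]
    for i in range(1, len(text)):
        if no_break(classes, i, strict):
            out[-1] += text[i]
        else:
            out.append(text[i])
    return out


print(segment("foo: bar", strict=False))  # ['foo:', ' ', 'bar']
print(segment("foo: bar", strict=True))   # ['foo', ':', ' ', 'bar']
print(segment("1,5", strict=False))       # ['1,5']
print(segment("1,5", strict=True))        # ['1,5']
```

The two modes agree whenever the Mid* character really does sit between two letters or two digits, which would explain why a test file containing only such cases cannot tell them apart. They differ only when one side of the context is missing, as in "foo: bar".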
Dan