Potential contradiction between the WordBreak test data and UAX #29
tom at osg.samsung.com
Wed Nov 23 06:04:30 CST 2016
On 23/11/16 11:45, Daniel Bünzli wrote:
> On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote:
>> I took a look at the ICU sources, and they explicitly mention this case,
>> so it seems I was mistaken with interpreting the intention of the UAX. I
>> still find it confusing, but based on this thread, it seems to just be me.
> It's not only you, I also sometimes get confused by it (see for example  and subsequent messages). Maybe the operational model could be clarified a bit.
The comment I quoted from the ICU sources clarifies the intention. Maybe
a comment similar to that one would be helpful?
Also, thinking about it a bit more, the operational order makes sense
when you consider the CR LF case and Extend characters; however, it is
still not obvious from the wording.
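To make the ordering argument concrete, here is a minimal sketch of the operational model as I read it: the rules are tried in order and the first one that applies decides whether there is a boundary. It is only an illustration under loud assumptions. It implements a handful of rules (WB1, WB2, WB3, WB3a, WB3b, WB4, WB5, WB999), the helper names `wb_prop` and `word_breaks` are hypothetical, and the Word_Break classes are approximated from General_Category rather than taken from the real WordBreakProperty.txt data.

```python
import unicodedata
from typing import List

# Hypothetical, simplified Word_Break classes; the real property data
# (WordBreakProperty.txt) has many more values than this approximation.
NEWLINES = {"CR", "LF", "Newline"}
IGNORABLE = {"Extend", "Format"}


def wb_prop(ch: str) -> str:
    """Rough approximation of Word_Break derived from General_Category."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch in "\u000b\u000c\u0085\u2028\u2029":
        return "Newline"
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Mc", "Me"):
        return "Extend"
    if cat == "Cf":
        return "Format"
    if ch.isalpha():
        return "ALetter"
    return "Other"


def word_breaks(text: str) -> List[int]:
    """Return boundary offsets; rules are tried in order, first match wins."""
    if not text:
        return []
    props = [wb_prop(c) for c in text]
    breaks = [0]  # WB1: break at start of text
    for i in range(1, len(text)):
        before, after = props[i - 1], props[i]
        if before == "CR" and after == "LF":
            continue  # WB3: CR x LF, tested before WB3a/WB3b
        if before in NEWLINES:
            breaks.append(i)  # WB3a: break after CR, LF, Newline
            continue
        if after in NEWLINES:
            breaks.append(i)  # WB3b: break before CR, LF, Newline
            continue
        if after in IGNORABLE:
            continue  # WB4: Extend/Format attach to the preceding character
        # WB4 (continued): look back past Extend/Format to the base character.
        j = i - 1
        while j > 0 and props[j] in IGNORABLE:
            j -= 1
        if props[j] == "ALetter" and after == "ALetter":
            continue  # WB5: ALetter x ALetter
        breaks.append(i)  # WB999: otherwise break everywhere
    breaks.append(len(text))  # WB2: break at end of text
    return breaks


print(word_breaks("a\u0301b"))  # [0, 3]: the accent is absorbed (WB4), then WB5
print(word_breaks("\r\n"))      # [0, 2]: WB3 wins because it comes before WB3a
print(word_breaks("\r\u0301"))  # [0, 1, 2]: WB3a fires before WB4 can absorb
```

The last two calls show why the order matters: if WB4 were applied first, the combining accent would be glued onto the CR, and if WB3a came before WB3, CR LF would be split.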