Dear all,
I recently updated libunibreak[1] to Unicode 9.0.0. I thought I had
implemented it correctly; however, it fails two of the tests in the
reference test data:
   ÷ 200D × 0308 ÷ 2764 ÷ # ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0]
     COMBINING DIAERESIS (Extend_FE) ÷ [999.0] HEAVY BLACK HEART
     (Glue_After_Zwj) ÷ [0.3]
and
   ÷ 200D × 0308 ÷ 1F466 ÷ # ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0]
     COMBINING DIAERESIS (Extend_FE) ÷ [999.0] BOY (EBG) ÷ [0.3]
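For anyone less familiar with the format: each test line lists code
points in hex, separated by ÷ (break opportunity) or × (no break), and
everything after # is an annotation naming the rule that decided each
position. A minimal Python sketch of reading such a line follows; the
function name is hypothetical and not part of libunibreak or of any
Unicode tool.

```python
BREAK = "\u00f7"     # ÷ : break opportunity
NO_BREAK = "\u00d7"  # × : no break opportunity


def parse_break_test_line(line):
    """Split one test line into (code_points, flags).

    flags[i] is True if the data marks a break before code_points[i];
    the last entry is the position at the end of the text, so
    len(flags) == len(code_points) + 1.
    """
    data = line.split("#", 1)[0]  # text after '#' is a human-readable annotation
    code_points, flags = [], []
    for token in data.split():
        if token == BREAK:
            flags.append(True)
        elif token == NO_BREAK:
            flags.append(False)
        else:
            code_points.append(int(token, 16))  # hex code point
    if len(flags) != len(code_points) + 1:
        raise ValueError("malformed test line: " + line)
    return code_points, flags
```

Applied to the first line above, it returns ([0x200D, 0x0308, 0x2764],
[True, False, True, True]); the True at index 2 is the break the data
expects after COMBINING DIAERESIS.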
More specifically, both fail at the position after the COMBINING
DIAERESIS: my implementation finds no break opportunity there, whereas
the test data marks a break (÷ [999.0]). As expected, the reference
implementation agrees with the test data.
However, looking at the test cases and at UAX #29[2], this does not look
correct to me. Rule WB4 reduces

   ZWJ Extend GAZ -> ZWJ GAZ

and then, according to rule WB3c, there should be no break opportunity
between ZWJ and GAZ (and likewise between ZWJ and EBG in the second
test). The reference implementation, however, applies rule WB999 here,
which I believe is incorrect.
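To make the two readings concrete, here is a minimal Python sketch that
models only WB1, WB2, WB3c, WB4 and WB999. The property table is
hand-written for the code points in these two tests, and every name in
it (WB_PROP, reduced_prev, is_break, break_flags, wb3c_on_reduced) is
hypothetical. With wb3c_on_reduced=False, WB3c looks only at the
character immediately before the position (WB3c is listed before WB4);
with wb3c_on_reduced=True, WB3c looks at the sequence after WB4 has
collapsed X (Extend | Format | ZWJ)* to X, which is the reading argued
for above. The sketch is not meant to settle which reading is intended.

```python
# Hypothetical, hand-written Word_Break table covering only the code points
# in the two failing tests; a real implementation derives it from the UCD.
WB_PROP = {
    0x200D: "ZWJ",
    0x0308: "Extend",
    0x2764: "Glue_After_Zwj",
    0x1F466: "E_Base_GAZ",  # shown as EBG in the test annotation
}

# Characters that WB4 absorbs into the preceding character.
ABSORBED = {"Extend", "Format", "ZWJ"}


def reduced_prev(props, i):
    """Property of the character before position i once WB4 has collapsed
    X (Extend | Format | ZWJ)* to X. Simplification: the sot/CR/LF/Newline
    exception of WB4 is not modelled beyond stopping at the first char."""
    j = i - 1
    while j > 0 and props[j] in ABSORBED:
        j -= 1
    return props[j]


def is_break(props, i, wb3c_on_reduced):
    """True if there is a word break before props[i], for 0 < i < len(props).

    None of the remaining rules applies to the sequences in these tests."""
    prev = reduced_prev(props, i) if wb3c_on_reduced else props[i - 1]
    # WB3c: ZWJ × (Glue_After_Zwj | EBG)
    if prev == "ZWJ" and props[i] in ("Glue_After_Zwj", "E_Base_GAZ"):
        return False
    # Effect of WB4: Any × (Format | Extend | ZWJ)
    if props[i] in ABSORBED:
        return False
    # WB999: Any ÷ Any
    return True


def break_flags(code_points, wb3c_on_reduced):
    """Break flag before each code point plus one for end of text,
    laid out like the test data (True means ÷)."""
    props = [WB_PROP[cp] for cp in code_points]
    inner = [is_break(props, i, wb3c_on_reduced) for i in range(1, len(props))]
    return [True] + inner + [True]  # WB1/WB2: break at sot and eot
```

break_flags([0x200D, 0x0308, 0x2764], False) gives
[True, False, True, True], which matches the test data, while passing
True gives [True, False, False, True]; the same holds with U+1F466 in
place of U+2764.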
Am I missing anything, or is this an issue with the reference test data
and reference implementation?
Thanks,
Tom.
[1]: https://github.com/adah1972/libunibreak
[2]: http://www.unicode.org/reports/tr29/#WB1
Received on Tue Nov 22 2016 - 09:24:16 CST