Trying to understand Line_Break property apparent discrepancy
public at khwilliamson.com
Mon Jan 11 18:16:56 CST 2016
On 01/11/2016 03:42 PM, Karl Williamson wrote:
> It appears that
> http://www.unicode.org/Public/8.0.0/ucd/auxiliary/LineBreakTest.txt is
> testing a tailoring rather than the default line break algorithm,
> contrary to its heading "# Default Line Break Test". And
> http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/LineBreakTest.html follows
> suit.
> For example, the default algorithm as shown in
> http://www.unicode.org/reports/tr14/#Table2 follows LB25, which is an
> approximation of the desired behavior. But the test file and the HTML
> don't follow this. I suspect they are expecting the tailoring described in
> http://www.unicode.org/reports/tr14/#Examples example 7.
> Specifically, the test file tests for, and the HTML states, that a class CL
> code point followed by a class PO one is an unconditional line break
> opportunity, based on rule 999 (which is the same as LB31 in TR14).
> By contrast, http://www.unicode.org/reports/tr14/#Table2 says that a class
> CL code point followed by a class PO one is an
> "indirect break opportunity B % A is equivalent to B × A and B
> SP+ ÷ A; in other words, do not break before A, unless one or more
> spaces follow B." This follows from LB25 and LB18.
> There is a discrepancy here, which could be resolved either by changing
> the tests and HTML to follow LB25, or by documenting that these are for
> something above and beyond the default algorithm. (There may also be
> other discrepancies that I haven't stumbled across.)
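To make the pair-table semantics quoted above concrete, here is a minimal sketch of how an indirect entry (B % A) resolves, compared with a prohibited entry (B ^ A) and a direct break. The symbols mirror those used in TR14 Table 2, but the two-entry PAIR_TABLE, the resolve_break() helper, and the choice to treat every missing pair as a direct break (standing in for LB31) are hypothetical simplifications, not the full table.

```python
# Hypothetical sketch of UAX #14 pair-table resolution for a few entries.
# The Line_Break class names are real; the table excerpt and helper are
# illustrative only and do not cover the full algorithm.

DIRECT = "_"      # B ÷ A: break allowed directly
INDIRECT = "%"    # B % A: B × A, but B SP+ ÷ A
PROHIBITED = "^"  # B ^ A: B SP* × A (no break, even across spaces)

# (class before, class after) -> action; a tiny excerpt, not the real table.
PAIR_TABLE = {
    ("CL", "PO"): INDIRECT,    # default LB25 gives CL × PO; LB18 breaks after SP
    ("OP", "PO"): PROHIBITED,  # LB14: OP SP* ×
}


def resolve_break(before: str, spaces: int, after: str) -> str:
    """Return '×' (no break) or '÷' (break before `after`), given the
    number of SP characters between the two classes."""
    # Assumption: pairs missing from this excerpt fall through to a direct
    # break, approximating LB31 ("break everywhere else").
    action = PAIR_TABLE.get((before, after), DIRECT)
    if action == PROHIBITED:
        return "×"
    if action == INDIRECT:
        # Adjacent: no break. With intervening spaces: break after the spaces.
        return "÷" if spaces > 0 else "×"
    return "÷"


print(resolve_break("CL", 0, "PO"))  # ×
print(resolve_break("CL", 1, "PO"))  # ÷
```

Under the default algorithm, then, an adjacent CL PO pair should not break, which is exactly what the test file disagrees with.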
Oops. I had missed this statement in the HTML file:
"The Line Break tests use tailoring of numbers described in Example 7 of
Section 8.2 Examples of Customization. They also differ from the results
produced by a pair table implementation in sequences like: ZW SP CL."
This explains everything. Please disregard the earlier email from me.
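For readers checking the same thing, the number tailoring from TR14 Example 7 can be sketched as regex checks over a space-separated sequence of Line_Break classes. This assumes the tailored rules replace LB25 as we read Example 7: (PR | PO) × (OP | HY)? NU; (OP | HY) × NU; NU × (NU | SY | IS); NU (NU | SY | IS)* × (NU | SY | IS | CL | CP); NU (NU | SY | IS)* (CL | CP)? × (PO | PR). The TAILORED_LB25 list and tailored_no_break() helper are hypothetical, and the sketch ignores spaces, combining marks, and all other rules. It shows why CL followed by PO breaks by rule 999/LB31 unless a number precedes it.

```python
import re

# Hypothetical encoding of the Example 7 number rules as
# (regex on classes before the position, regex on classes after it).
TAILORED_LB25 = [
    # (PR | PO) × (OP | HY)? NU
    (r"(?:^| )(?:PR|PO)$", r"^(?:(?:OP|HY) )?NU(?: |$)"),
    # (OP | HY) × NU
    (r"(?:^| )(?:OP|HY)$", r"^NU(?: |$)"),
    # NU × (NU | SY | IS)
    (r"(?:^| )NU$", r"^(?:NU|SY|IS)(?: |$)"),
    # NU (NU | SY | IS)* × (NU | SY | IS | CL | CP)
    (r"(?:^| )NU(?: (?:NU|SY|IS))*$", r"^(?:NU|SY|IS|CL|CP)(?: |$)"),
    # NU (NU | SY | IS)* (CL | CP)? × (PO | PR)
    (r"(?:^| )NU(?: (?:NU|SY|IS))*(?: (?:CL|CP))?$", r"^(?:PO|PR)(?: |$)"),
]


def tailored_no_break(classes: list[str], i: int) -> bool:
    """True if the tailored number rules forbid a break between
    classes[i - 1] and classes[i]. Only these rules are checked."""
    before = " ".join(classes[:i])
    after = " ".join(classes[i:])
    return any(
        re.search(b, before) and re.match(a, after) for b, a in TAILORED_LB25
    )


# A lone CL PO pair gets no help from the tailoring, so LB31 breaks it;
# inside a number such as NU CL PO, the last rule keeps it together.
print(tailored_no_break(["CL", "PO"], 1))        # False
print(tailored_no_break(["NU", "CL", "PO"], 2))  # True
```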