From: Satoshi Nakagawa (snakagawa@infoteria.co.jp)
Date: Sat Feb 23 2008 - 13:48:18 CST
Hi,
I found a problem in the Unicode line breaking algorithm.
In Japanese writing, it should be possible to break [こたえは、answer]
into lines like this:
こたえは、
answer
This is because [、](U+3001) and [。](U+3002) are used in Japanese just
like the comma and period in English, and in English a line can be
broken after a comma or period.
However, the current Unicode line breaking algorithm does not allow a
break after (U+3001) or (U+3002) when a letter or digit follows.
I think this is a problem in the algorithm.
See http://www.unicode.org/reports/tr14/ .
> CL: Closing Punctuation (XB)
>
> 3001..3002 IDEOGRAPHIC COMMA..IDEOGRAPHIC FULL STOP
(U+3001) and (U+3002) are specified as CL.
> LB30
> Do not break between letters, numbers, or ordinary symbols and
> opening or closing punctuation.
>
> CL × (AL | NU)
It says that no break is allowed between a CL character and a
following alphabetic or numeric character. As a result, no line break
is possible at any position in [は、answer].
IMHO, (U+3001) and (U+3002) should not be treated as CL, because LB30
should not apply to them. They should be split off into a different
class.
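
To make the issue concrete, here is a minimal Python sketch of the rules
quoted above. It is a toy model, not an implementation of UAX #14: it
only knows the few characters in my example, it assumes all hiragana
are class ID, and the class name "CL_IDEO" and the split_class flag are
hypothetical names I made up to illustrate the proposal.

```python
# Toy model of the rules discussed in this message. NOT a full UAX #14
# implementation: only the classes needed for the example are handled.

IDEOGRAPHIC_PUNCT = {"\u3001", "\u3002"}  # IDEOGRAPHIC COMMA, FULL STOP


def line_break_class(ch, split_class=False):
    """Return a rough line breaking class for a single character.

    split_class=True models the proposal: U+3001 and U+3002 get their
    own (hypothetical) class "CL_IDEO" instead of CL.
    """
    if ch in IDEOGRAPHIC_PUNCT:
        return "CL_IDEO" if split_class else "CL"
    if ch.isascii() and ch.isalpha():
        return "AL"
    if ch.isascii() and ch.isdigit():
        return "NU"
    if "\u3041" <= ch <= "\u309f":
        return "ID"  # assumption: treat every hiragana character as ID
    return "XX"


def break_allowed(before, after, split_class=False):
    """Is a line break allowed between two adjacent characters?"""
    a = line_break_class(before, split_class)
    b = line_break_class(after, split_class)
    if b in ("CL", "CL_IDEO"):
        return False  # LB13-style: never break before closing punctuation
    if a == "CL" and b in ("AL", "NU"):
        return False  # LB30: CL x (AL | NU)
    if a in ("AL", "NU") and b in ("AL", "NU"):
        return False  # keep runs of letters and digits together
    return True  # otherwise break (LB31-style default)


def break_opportunities(text, split_class=False):
    """Indices i such that a break is allowed between text[i-1] and text[i]."""
    return [
        i
        for i in range(1, len(text))
        if break_allowed(text[i - 1], text[i], split_class)
    ]


if __name__ == "__main__":
    sample = "こたえは、answer"
    print(break_opportunities(sample))                    # current behavior
    print(break_opportunities(sample, split_class=True))  # proposed behavior
```

With the current classification, index 5 (between [、] and [a]) is not
a break opportunity in this model. With the hypothetical separate
class it becomes one, while a break before [、] stays forbidden.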
What do you think?
-- Satoshi Nakagawa