UAX#14 request to reconsider Line Break property for U+3000 IDEOGRAPHIC SPACE

From: Koji Ishii <kojiishi_at_gluesoft.co.jp>
Date: Mon, 11 Jun 2012 04:52:46 -0400

Hello Unicoders,

I suppose this is the correct list to discuss UAX#14; please correct me if I'm wrong.

In short, as the subject says, I would like to ask for the Line Break property of U+3000 IDEOGRAPHIC SPACE to be reconsidered, so that it does not allow a break before it.

I wasn't part of the original discussions for UAX#14, so I may be repeating discussions that have already taken place. I apologize if that's the case, but I hope I can provide some new information here.

How to handle U+3000 IDEOGRAPHIC SPACE in line breaking is a little controversial in East Asia, and not all applications handle it the same way today. The primary reason, I believe, is that the best method varies by requirements.

After several investigations and discussions, I think most of the people I talked to reached a consensus that prohibiting a break before it is the best general answer. Exactly which class to use is a bit unclear to me, and I'd appreciate anyone's advice, but I guess that is a discussion for after we agree to make the change, so I'm leaving it for now.

Here's the background of the proposal and what I discussed with people.

Many people here might already know this, but almost every traditional East Asian word processor treated U+3000 as ID, and I'm guessing that is the reason why UAX#14 defines it so. Many, including me, agreed that it gives the best editing experience for East Asian scripts.

East Asian versions of MS Word took a different approach, though, primarily due to Word's re-flow architecture. It might not be well known, and it is already a past story, but up until Word 95, Word changed line breaks slightly when the printer was changed, for reasons that were good at that point, so its documents needed to look good even if the line breaks changed after the author had sent them to someone else.

And a problem arose, because people did not want U+3000 appearing at the beginning of a line as a result of such re-flow. ID gives the best editing experience, but it does not fit well for such re-flowable documents, and keeping U+3000 from appearing at the beginning of a line matters more than a slightly better editing experience. The same issue affects other re-flowable documents today, such as HTML or EPUB.

Ambrose told me that the same issue exists in Chinese, where it is known as honorific spaces[1]. We tried to find examples of line breaking behavior for honorific spaces without luck, and then Kenny pointed out that authors adjust the text so that the space does not appear at the beginning of a line, and therefore we will not be able to find such examples[2].

I also had discussions with the W3C I18N WG JLTF (who authored JLREQ), professional printers, and people working on EPUB in Japan about the ideal behavior of U+3000 around line breaks. As I wrote above, the best method depends on context and there is more than one, so the discussion was a little long, but we tried to find the best algorithm that works for all cases. Two options were left: one is to mimic Word's behavior, and the other is to prohibit break before. The two methods give almost the same level of results; in some cases one is slightly superior to the other and in other cases the opposite, and all agreed that either option is acceptable for all cases we investigated. Word's behavior, however, requires slightly more logic, and does not support the honorific space scenario well.

Given this result, and given the honorific space situation (thanks to Ambrose and Kenny), my conclusion is that prohibiting break before is the best option for everyone. It may be appropriate to allow tailoring to ID where the editing experience is more important and the document is known never to re-flow, but the behavior I propose here is more generic.
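
To illustrate the difference between ID and the behavior proposed here, below is a minimal Python sketch. It is a toy model, not the UAX#14 algorithm: it assumes every character otherwise behaves like ID (a break is allowed between any two characters), counts every character as one unit of width, and uses hypothetical function names (break_opportunities, wrap) that come from no existing library. Word's behavior is not modeled.

```python
IDEOGRAPHIC_SPACE = "\u3000"


def break_opportunities(text, prohibit_break_before_u3000=True):
    """Return the indices i where a break is allowed between text[i-1] and text[i].

    Toy model: all characters are treated like ID, so a break is allowed
    between any two characters. With the flag set, the break immediately
    before U+3000 is removed; with the flag cleared, U+3000 acts as ID too.
    """
    breaks = []
    for i in range(1, len(text)):
        if prohibit_break_before_u3000 and text[i] == IDEOGRAPHIC_SPACE:
            continue  # no break before U+3000
        breaks.append(i)
    return breaks


def wrap(text, width, prohibit_break_before_u3000=True):
    """Greedy re-flow: each line takes as many characters as fit in `width`."""
    allowed = set(break_opportunities(text, prohibit_break_before_u3000))
    lines = []
    start = 0
    while len(text) - start > width:
        # Latest allowed break that keeps the line within `width` characters.
        candidates = [i for i in range(start + 1, start + width + 1) if i in allowed]
        # Assumption: fall back to an emergency break if nothing is allowed.
        cut = candidates[-1] if candidates else start + width
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines


sample = "漢字" + IDEOGRAPHIC_SPACE + "漢字"
as_id = wrap(sample, 2, prohibit_break_before_u3000=False)
proposed = wrap(sample, 2, prohibit_break_before_u3000=True)
```

In this sketch, as_id contains a line that begins with U+3000 when the text is re-flowed to a width of 2, while proposed keeps U+3000 at the end of the preceding line. A real implementation would also have to decide how the space contributes to the line width, which this sketch ignores.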

Allow me to end my long e-mail with a couple of notes about the situation of browsers and the CSS WG. Actual browser implementations vary today. IE implements behavior similar to Word's. Firefox does as I propose here; i.e., it prohibits break before. WebKit and Opera handle it as ID. So browsers are not interoperable today, and I'm hoping to resolve this interoperability issue with CSS Text Level 3[3]. CSS Text Level 3 is going to define line breaking behavior for CSS, and my current thinking is to define the behavior I'm proposing here.

I appreciate UAX#14 very much and I hope UAX#14 and CSS Text Level 3 stay in sync; therefore I'm asking here to consider a change.

Any opinions, thoughts, or discussions are welcome, and I thank you in advance for your support of this proposal.

[1] http://lists.w3.org/Archives/Public/www-style/2012Apr/0013.html
[2] http://lists.w3.org/Archives/Public/www-style/2012May/0106.html
[3] http://dev.w3.org/csswg/css3-text/

Regards,
Koji
Received on Mon Jun 11 2012 - 03:54:38 CDT
