Comments on UAX#14

Eric Muller, Adobe Systems

August 2, 2010

 

 

UAX#14, version 6.0.0 draft2, section 3:

When compression or expansion is allowed, a locally optimal line break seeks to balance the relative merits of the resulting amounts of compression and expansion for different line break candidates. When expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and U+3000 IDEOGRAPHIC SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width. When expanding or compressing intercharacter space, the presence of U+200B ZERO WIDTH SPACE or U+2060 WORD JOINER is always ignored.

In practice, U+3000 is not treated as described. Instead, it is treated like any other CJK ideograph and, by default, remains in the measured part of a line when it falls at a line extremity (i.e., it does not disappear or hang in the margin as U+0020 does in Western text). We propose to remove “and U+3000 IDEOGRAPHIC SPACE” from the text above.

Similarly, under the description of the ID class:

U+3000 IDEOGRAPHIC SPACE may be subject to expansion or compression during line justification.

We propose to remove that text altogether.
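
As an illustration only, the effect of the proposal on the character sets named in the quoted section 3 text can be sketched as follows. The constant and function names are hypothetical and chosen for this example; the sets are taken directly from the quoted draft wording, and the sketch is not part of UAX#14.

```python
# Space characters named in the quoted UAX#14 section 3 text.
SPACE = "\u0020"
NO_BREAK_SPACE = "\u00A0"
THIN_SPACE = "\u2009"
IDEOGRAPHIC_SPACE = "\u3000"

# Compressible spaces according to the draft text as quoted.
COMPRESSIBLE_DRAFT = frozenset({SPACE, NO_BREAK_SPACE, IDEOGRAPHIC_SPACE})

# Compressible spaces once U+3000 is removed, as proposed in this comment.
COMPRESSIBLE_PROPOSED = COMPRESSIBLE_DRAFT - {IDEOGRAPHIC_SPACE}

# Expandable spaces; the draft says U+2009 is only "occasionally" expanded.
EXPANDABLE = frozenset({SPACE, NO_BREAK_SPACE, THIN_SPACE})


def is_compressible(ch: str, proposed: bool = True) -> bool:
    """Return True if ch may be compressed during justification.

    Hypothetical helper: proposed=True applies the change suggested here,
    proposed=False reflects the draft wording.
    """
    table = COMPRESSIBLE_PROPOSED if proposed else COMPRESSIBLE_DRAFT
    return ch in table
```

Under the proposed wording, U+3000 falls under "all other space characters normally have fixed width", which matches its treatment as an ordinary CJK ideograph.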

Because the issue here concerns justification rather than line breaking, this could also be clarified in the introduction, by amending:

In line breaking it is necessary to distinguish between two related tasks. The first is the determination of all legal line break opportunities, given a string of text. This is the scope of the Unicode Line Break Algorithm. The second task is the selection of the actual location for breaking a given line of text. This selection not only takes into account the width of the line compared to the width of the text, but may also apply an additional prioritization of line breaks based on aesthetic and other criteria. What defines an optimal choice for a given line break is outside the scope of this annex, as are methods for its selection.

to:

In line breaking it is necessary to distinguish between three related tasks. The first is the determination of all legal line break opportunities, given a string of text. This is the scope of the Unicode Line Break Algorithm. The second task is the selection of the actual location for breaking a given line of text. This selection not only takes into account the width of the line compared to the width of the text, but may also apply an additional prioritization of line breaks based on aesthetic and other criteria. What defines an optimal choice for a given line break is outside the scope of this annex, as are methods for its selection. The third is the possible justification of lines, once actual locations for line breaking have been determined, and is also out of scope for the Unicode Line Break Algorithm.
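
To make the separation of the three tasks concrete, here is a minimal sketch under strong simplifying assumptions. All function names are hypothetical. The first function is a toy stand-in that only allows breaks after U+0020 and is not the Unicode Line Break Algorithm. The second uses a simple greedy first-fit selection. Every character is assumed to be one unit wide.

```python
def break_opportunities(text):
    """Task 1 (toy stand-in, NOT UAX#14): a break is allowed after each run of U+0020."""
    return [i + 1 for i, ch in enumerate(text[:-1])
            if ch == " " and text[i + 1] != " "]


def choose_breaks(text, opportunities, width):
    """Task 2: greedy first-fit selection among the legal opportunities."""
    lines, start, last = [], 0, None
    for pos in opportunities + [len(text)]:
        # Trailing U+0020 is not measured at the end of a line.
        too_wide = len(text[start:pos].rstrip(" ")) > width
        if too_wide and last is not None:
            lines.append(text[start:last].rstrip(" "))
            start, last = last, None
        last = pos
    lines.append(text[start:].rstrip(" "))
    return lines


def justify(line, width):
    """Task 3: expand only U+0020 gaps; every other character keeps its width."""
    words = line.split(" ")
    gaps = len(words) - 1
    extra = width - len(line)
    if gaps == 0 or extra <= 0:
        return line
    share, remainder = divmod(extra, gaps)
    out = []
    for i, word in enumerate(words[:-1]):
        out.append(word + " " * (1 + share + (1 if i < remainder else 0)))
    out.append(words[-1])
    return "".join(out)
```

For example, `choose_breaks("aa bb cc dd", break_opportunities("aa bb cc dd"), 5)` splits the text into two lines, and `justify("aa\u3000bb cc", 10)` widens only the U+0020 gap while leaving U+3000 untouched. Each stage consumes only the output of the previous one, which is why justification behavior can be specified outside the line breaking algorithm.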