L2/11-423

November 1, 2011

Summary of PRI 207 feedback

Eric Muller

This document is an attempt to organize the feedback received on PRI 207, "UTR #50, Unicode Properties for Vertical Text Layout", both via the forum and via the contact form. Sorry if I misrepresented your comments.

1. Normative status

The normative status of UTR#50 is not clearly defined. This could be addressed by adding at the end of section 1:

The properties and algorithms presented in this report are informative. The intent is to provide a reasonable determination of the spacing and orientation of characters in Japanese texts, which can be used in the absence of other information, but can be overridden by the context, such as markup in a document or preferences in a layout application. This determination is based on the most common use of a character, but in no way implies that that character is used only in that way.

For more information on the conformance implications, see TUS, section 3.5, Properties, in particular the definition (D35) of an informative property (http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf#G48960).

2. Grapheme clusters

In section 2, replace the second sentence of the second paragraph by:

A possible approach is to start with the legacy grapheme clusters or extended grapheme clusters as defined in [UAX29]. The spacing class and orientation for a grapheme cluster as a whole is then determined by taking the spacing class and orientation of the first character in the cluster, with the following exceptions:

if the cluster contains an enclosing combining mark (general category Me), then the whole cluster has spacing class cl-19.3 symbol, and orientation U.

if the cluster is made of U+0020 SPACE and some combining mark(s), then the whole cluster has spacing class cl-27, western and orientation S.

if the cluster is made of U+3000 IDEOGRAPHIC SPACE and some combining mark(s), then the whole cluster has spacing class cl-19.3 symbol, and orientation U.
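
The cluster rule above can be sketched as follows. This is only an illustration, not part of the proposed text: the function name, the `char_properties` lookup and the order in which the exceptions are tested are assumptions, and segmentation into grapheme clusters per [UAX29] is assumed to have been done by the caller.

```python
import unicodedata

SYMBOL_U = ("cl-19.3", "U")
WESTERN_S = ("cl-27", "S")


def cluster_properties(cluster, char_properties):
    """Return (spacing class, orientation) for one grapheme cluster.

    `cluster` is a non-empty string holding exactly one grapheme cluster.
    `char_properties` is a hypothetical callable mapping a single character
    to its (spacing class, orientation) pair from the data file.
    """
    # Exception 1: any enclosing combining mark (general category Me).
    if any(unicodedata.category(ch) == "Me" for ch in cluster):
        return SYMBOL_U

    base, rest = cluster[0], cluster[1:]
    only_marks = bool(rest) and all(
        unicodedata.category(ch).startswith("M") for ch in rest
    )
    if only_marks:
        # Exception 2: U+0020 SPACE followed by combining mark(s).
        if base == "\u0020":
            return WESTERN_S
        # Exception 3: U+3000 IDEOGRAPHIC SPACE followed by combining mark(s).
        if base == "\u3000":
            return SYMBOL_U

    # Default: the properties of the first character in the cluster.
    return char_properties(base)
```

The enclosing-mark test is done first here, so a SPACE carrying an enclosing mark comes out as cl-19.3/U; the proposed text does not say which exception wins in that case.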

3. Scope of UTR#50 (CJK vs. worldwide)

From the CSS Writing Modes and CSS 3 Text editors:

UTR #50 scopes itself to Japanese layout. However, CSS needs to address all vertical writing systems. If the scope is not broadened to include other writing systems, we cannot rely on UTR#50.

(We define vertical writing systems as those in which entire compositions, not just small snippets such as image captions or table headers, are commonly written vertically.)

From Eric Muller:

If you consider the issue of spacing characters on a line, you will notice that the method commonly used for Western texts (adjust the width of space characters, and possibly trigger letterspacing) and the method commonly used for Japanese texts (adjust the space between characters, with a system of priorities) are incompatible; applying one method to the other context leads to incorrect results for that other context. The implication is that a universal layout engine needs a way to select one or the other method (a locale may be a good start).

Similarly, there is nothing that prevents a universal layout engine from using UTR#50 for a Japanese/East Asian context, and another method for a Western context. Indeed, that seems necessary, as a Western vertical line of English will typically have all its Basic Latin letters upright, while an East Asian line will have them sideways.

Furthermore, I don't think there is a lot of value in creating corresponding properties for the Western world: Western Class would split, more or less, U+0020 in one class and all the other characters in another; Western Orientation would be upright for all characters.

4. Japanese vs. Chinese

From Ken Lunde:

There are some regional differences. In China, for example, U+FF01 and U+FF1F (cl-04), and U+FF1A and U+FF1B (cl-05) should be shifted up and to the right, similar to small kana in Japanese, and should thus be changed from U to T, at least for Chinese.

Should there be a way to indicate region-specific differences, such as #5 above? Add a column for region in the datafiles?

From Chinglan Ho:

If TR50 also applies to Traditional Chinese, comma (i.e. U+3001, U+FE10, U+FE11), full stop (i.e. U+3002, U+FE12), colon (U+FE13), semicolon (U+FE14), exclamation mark (U+FE15) and question mark (U+FE16) have to be in the center of the cell, not the upper right corner.

5. Comment column in table 3

In table 3, the comment "mirroring, not just rotation" for 30FC is misleading. The change of shape is rooted in the calligraphy of this character and "mirroring" does not do justice to that. There is a more appropriate description in JLREQ.

With that in mind, the consensus is to simply remove the "comment" column from table 3.

6. Tailoring

From the CSS Writing Modes and CSS 3 Text editors:

UTR #50 makes no mention of tailoring the orientations. We think the orientation classes should be tailorable; probably Unicode agrees, but this should be more clearly explained.

So that we don't have to manage codepoint-by-codepoint character classes, we'd eventually like UTR#50 to include classes that are commonly tailored / not tailored, that we can reference. Some possibilities:

class for characters that are generally not tailored, i.e. vertical-native scripts such as Han, Hangul, Phags-Pa etc.

class for characters that belong to Western writing systems (typically set sideways) but are often set upright as symbols, i.e. Latin, Greek, and Cyrillic

brackets, which are pretty much never tailored to upright

We will try to draw up more concrete suggestions; if anyone else has suggestions for such classes, we'd be interested to hear those suggestions.

From Eric Muller:

I think there are two aspects:

the principle of tailoring - I think that's a given as soon as we say the property is informative. In the proposed addition for the normative status, I added a link to the definition of informative properties (D35), which gives a lot of details.

making the tailoring easy, by defining property values that are "small enough" that the tailoring is simply a remapping of those property values, and does not have to work on code points. I note that InDesign provides two tailorings: the first is "Roman Upright in Vertical", which basically treats Latin, Greek and Cyrillic as cl-19.3 symbol/U; the second is "Treat Smart Quotes as Full Width", which deals with 2018, 2019, 201C, 201D. I would have no problem introducing new classes/orientations along those lines, when there is a concrete proposal.
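
As a rough illustration of tailoring by remapping property values rather than code points, consider the sketch below. The class name `cl-X.lgc` is purely hypothetical (the draft defines no separate class for Latin, Greek and Cyrillic), as are the names `ROMAN_UPRIGHT_IN_VERTICAL` and `tailor`.

```python
# Hypothetical tailoring table: keys and values are
# (spacing class, orientation) pairs, never code points.
# "cl-X.lgc" stands in for a not-yet-defined class covering
# Latin, Greek and Cyrillic letters.
ROMAN_UPRIGHT_IN_VERTICAL = {
    ("cl-X.lgc", "S"): ("cl-19.3", "U"),
}


def tailor(default_value, tailoring):
    """Apply a tailoring to one default (class, orientation) pair.

    Pairs that the tailoring does not mention are returned unchanged.
    """
    return tailoring.get(default_value, default_value)
```

With sufficiently fine-grained classes, a tailoring such as "Roman Upright in Vertical" would then reduce to a table with a handful of entries.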

7. Change of property values

7.1 Hangul characters

All the Hangul characters should be changed to be treated as ideographs. More precisely:

create a new class cl-19.4, hangul

switch these characters to cl-19.4, hangul:

1100..11FF Hangul Jamo
302E 〮 HANGUL SINGLE DOT TONE MARK
302F 〯 HANGUL DOUBLE DOT TONE MARK
3130..318F Hangul Compatibility Jamo
A960..A97F Hangul Jamo Extended-A
AC00..D7AF Hangul Syllables
D7B0..D7FF Hangul Jamo Extended-B
FFA0 .. FFDF ﾠ .. ￟ HALFWIDTH HANGUL FILLER ..

switch these characters to cl-19.3, symbols:

3200 .. 321E ㈀ .. ㈞ PARENTHESIZED HANGUL KIYEOK .. PARENTHESIZED KOREAN CHARACTER O HU

switch all the characters above to orientation U

7.2 Yijing Hexagram Symbols

The Yijing Hexagram Symbols (4DC0..4DFF) should be treated as symbols. The data file should be changed to:

4DC0 .. 4DFF ; cl-19.3, symbol ; U

7.3 Small Form Variants

The Small Form Variants (FE50..FE6F) should be treated like their fullwidth counterparts. The data file should be changed to:

    
    FE50 ; cl-07, comma ; U # SMALL COMMA
    FE51 ; cl-07, comma ; U # SMALL IDEOGRAPHIC COMMA
    FE52 ; cl-06, fullwidthStop ; U # SMALL FULL STOP
    FE53 ; cl-19.3, symbol ; U # reserved
    FE54 ; cl-05, middleDot ; U # SMALL SEMICOLON
    FE55 ; cl-05, middleDot ; U # SMALL COLON
    FE56 ; cl-04, dividing ; U # SMALL QUESTION MARK
    FE57 ; cl-04, dividing ; U # SMALL EXCLAMATION MARK
    FE58 ; cl-03, hyphen ; U # SMALL EM DASH
    FE59 ; cl-01.2, openingBracket.round ; SB # SMALL LEFT PARENTHESIS
    FE5A ; cl-02.2, closingBracket.round ; SB # SMALL RIGHT PARENTHESIS
    FE5B ; cl-01.3, openingBracket.other ; SB # SMALL LEFT CURLY BRACKET
    FE5C ; cl-02.3, closingBracket.other ; SB # SMALL RIGHT CURLY BRACKET
    FE5D ; cl-01.3, openingBracket.other ; SB # SMALL LEFT TORTOISE SHELL BRACKET
    FE5E ; cl-02.3, closingBracket.other ; SB # SMALL RIGHT TORTOISE SHELL BRACKET
    FE5F ; cl-12, prefixAbbrev ; U # SMALL NUMBER SIGN
    FE60 ; cl-19.3, symbol ; U # SMALL AMPERSAND
    FE61 ; cl-19.3, symbol ; U # SMALL ASTERISK
    FE62 ; cl-19.3, symbol ; U # SMALL PLUS SIGN
    FE63 ; cl-19.3, symbol ; U # SMALL HYPHEN-MINUS
    FE64 ; cl-19.3, symbol ; U # SMALL LESS-THAN SIGN
    FE65 ; cl-19.3, symbol ; U # SMALL GREATER-THAN SIGN
    FE66 ; cl-19.3, symbol ; U # SMALL EQUALS SIGN
    FE67 ; cl-19.3, symbol ; U # reserved
    FE68 ; cl-19.3, symbol ; U # SMALL REVERSE SOLIDUS
    FE69 ; cl-12, prefixAbbrev ; U # SMALL DOLLAR SIGN
    FE6A ; cl-13, postfixAbbrev ; U # SMALL PERCENT SIGN
    FE6B ; cl-19.3, symbol ; U # SMALL COMMERCIAL AT
    FE6C..FE6F ; cl-19.3, symbol ; U # reserved
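
The data-file lines quoted in this document all follow the shape "code point or range ; class, name ; orientation # comment". A minimal parser for that shape might look like the sketch below; the function name and the returned tuple layout are assumptions made for illustration, and the actual data file may contain other line forms not handled here.

```python
def parse_line(line):
    """Parse one data-file line.

    Returns (first, last, spacing_class, class_name, orientation), with
    `first` and `last` as integers, or None for blank or comment-only lines.
    Accepts both "FE6C..FE6F" and "2070 .. 209F" range spellings.
    """
    # Drop the trailing comment, if any.
    body = line.split("#", 1)[0].strip()
    if not body:
        return None

    code_field, class_field, orientation = (f.strip() for f in body.split(";"))

    # A single code point has no ".."; treat it as a one-element range.
    first, _, last = code_field.partition("..")
    first = int(first.strip(), 16)
    last = int(last.strip(), 16) if last.strip() else first

    spacing_class, _, class_name = class_field.partition(",")
    return first, last, spacing_class.strip(), class_name.strip(), orientation
```

For example, `parse_line("FE59 ; cl-01.2, openingBracket.round ; SB # SMALL LEFT PARENTHESIS")` yields the range FE59..FE59 with class `cl-01.2` and orientation `SB`.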

7.4 Superscript and subscript characters

The superscript and subscript characters should be treated as cl-27 Western and orientation S. The data file should be changed to:

    00B2 ; cl-27, western ; S
    00B3 ; cl-27, western ; S
    00B9 ; cl-27, western ; S
    2070 .. 209F ; cl-27, western ; S

7.5 Small kana characters

The small kana characters should have orientation T (rather than TK) because the transformation is necessary: they are aligned on their bottom side in horizontal lines, and on their right side in vertical lines. The property value TK can be removed.

Currently, the small hiragana and katakana characters are in a single class, cl-11, smallKana. The proposal is to create two subclasses, one for hiragana (cl-11.1, smallHiragana) and one for katakana (cl-11.2, smallKatakana), and to change:

3041, 3043, 3045, 3047, 3049, 3063, 3083, 3085, 3087, 308E, 3095, 3096 to cl-11.1, smallHiragana
the other characters currently in cl-11, smallKana, to cl-11.2, smallKatakana.

7.6 U+3030 〰 WAVY DASH

U+3030 〰 WAVY DASH is currently U and should be T instead. Just like U+301C 〜 WAVE DASH, it rotates/mirrors.

7.7 U+3000 IDEOGRAPHIC SPACE

Should U+3000 (cl-14) be S instead of U? I suggest this, because for fonts that include proportional or non-full-width glyphs by default, such as for kana, hangul, or ideographs, the glyph for U+3000 is likely not to be full-width, and will have a width that works well with the glyphs from those scripts. This suggests that, in order to capture the same glyph width in vertical, the glyph should be rotated, and not set upright.

7.8 Quotes

Characters: U+2018 and U+2019
The current vertical orientation in the draft UTR #50: T
The vertical orientation that I recommend: SB

In my view, from the viewpoint of Japanese typography, the vertical orientation of the single quotation characters U+2018 and U+2019 should be SB, for the reason given below:

JIS X 4051 assigns vertical-specific glyph shapes only to the single quotation marks (U+2018 and U+2019), while no vertical-specific glyphs are given to the double quotation marks (U+201C and U+201D). This inconsistency should not be brought into the vertical orientations that UTR #50 is to define. It is thought that one of the causes for this inconsistency is that JIS X 0213:2000 listed a glyph shape pair that seemed to be usable as the single quotes' vertical glyphs. JLREQ seems to have simply inherited it. However, there are different conventions as to which glyphs, in which posture, should be used for the single quotes composed in the vertical writing mode. For instance, serious editors, book typographers or printing historians often argue that the horizontal Western quotes should correspond to the standard vertical Japanese quotes if used in vertical lines. But if so, such conversion is beyond the scope of the issues to be handled by UTR #50, and writers and editors should input the Japanese quotation characters directly, instead of using the Western single quotes.

7.9 Half-width Katakana

From Ken Lunde:

I propose that half-width katakana (U+FF61 through U+FF9F) be given one of two possible treatments:

1) Rotated (S), but that punctuation and symbols be rotated or transformed. This means that U+FF65 (cl-05), U+FF70 (cl-10), and U+FF66 plus U+FF71 through U+FF9F (cl-16) should be changed from U to S, and that U+FF67 through U+FF6F (cl-11) be changed from TK to S. U+FF62 (cl-01.1), U+FF63 (cl-02.1), U+FF61 (cl-06), U+FF64 (cl-07) are okay as-is.

2) Apply NFKD or NFKC. This is a much more radical approach, and while I would not necessarily recommend it, it should be put on the table. In other words, this would prohibit half-width katakana from being used in vertical writing, and would convert them into their full-width counterparts.

More details and background.

A few short years ago, the proverbial "final nail for half-width katakana's coffin" was about to be pounded in, and then mobile came along, which effectively revived its use. The reasoning is that for small screens, it was possible to fit more information in the same real estate. Setting these characters in vertical writing is very much an edge case, but nonetheless needs to be covered. If the glyphs are set upright (U), it detracts from one of the reasons why these characters were entered by the user to begin with, meaning that they should simply be rotated 90 degrees clockwise (S), like typical Latin glyphs. Using forms that are upright and compressed to half-width is not a viable solution. We should support fonts' existing glyphs as-is, and not propagate new glyphs for this purpose.
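
To make the second treatment (applying NFKC) concrete, here is a small sketch using Python's standard `unicodedata` module. The function name and the decision to normalize only runs in the range U+FF61..U+FF9F, leaving other compatibility characters untouched, are assumptions for illustration, not part of the proposal.

```python
import re
import unicodedata

# Runs of halfwidth katakana and halfwidth CJK punctuation (U+FF61..U+FF9F).
_HALFWIDTH_RUN = re.compile("[\uFF61-\uFF9F]+")


def halfwidth_katakana_to_fullwidth(text):
    """Replace halfwidth katakana runs by their NFKC form.

    Whole runs are normalized together so that a halfwidth voiced sound
    mark (U+FF9E) can compose with the preceding kana.
    """
    return _HALFWIDTH_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


# U+FF76 U+FF9E U+FF72 U+FF84 U+FF9E becomes U+30AC U+30A4 U+30C9.
assert halfwidth_katakana_to_fullwidth("\uFF76\uFF9E\uFF72\uFF84\uFF9E") == "\u30AC\u30A4\u30C9"
```

Using NFKD instead would leave the voiced sound marks as separate combining characters (U+3099) after the base kana.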

7.10 Fullwidth characters

From Ken Lunde:

In cl-05, I think that U+FF1A should be changed from U to S, at least for Japanese. JIS X 0213 defines that the vertical version of this character is rotated.

In cl-06, I think that U+FF0E should be changed from U to T. Likewise, in cl-07, I think that U+FF0C should be changed from U to T.

From Eric Muller:

JIS does not account for the fullwidth characters. The approach taken in the current draft, for those characters which exist as a pair "regular"/fullwidth, is to use U for the fullwidth and S for the other. AFAICT, this is consistent with the implementations which recognize both kinds of characters, such as InDesign.

7.11 Yi characters

The co-editors of CSS3 Writing Modes and CSS3 Text suggest investigating the orientation of Yi characters.

7.12 Egyptian hieroglyphs

The co-editors of CSS3 Writing Modes and CSS3 Text suggest changing the orientation of Egyptian Hieroglyphs from S to U.

8. Symbols

In the version of the TR under review, all the "symbol" characters, in a broad sense and by opposition to the "letter" characters, have been classified as cl-19.3, symbol and orientation U. The rationale for this initial assignment is that:

it is pretty clear that we want some of the symbol characters to be U, e.g. the emojis.
there is no symbol character which should obviously be S
drawing a line among the "symbol" characters is tricky, because it involves deciding on more specific uses beyond the broad category of "symbol"; experience tells us that deciding whether a "symbol" is used as math or not is difficult.

8.1 From the co-editors of CSS3 Writing Modes and CSS3 Text

The co-editors of CSS3 Writing Modes and CSS3 Text suggest splitting this broad "symbol" category in two, one part retaining the U orientation, and the other being given the S orientation. More specifically:

8.1.1 Bracket pieces

The bracket pieces and similar characters should be treated as cl-27 Western and orientation S. The data file should be changed from:

2300 .. 23FF ; cl-19.3, symbol ; U

to:

2300 .. 231F ; cl-19.3, symbol ; U
2320 .. 2321 ; cl-27, western ; S
2322 .. 239A ; cl-19.3, symbol ; U
239B .. 23B3 ; cl-27, western ; S
23B4 .. 23B6 ; cl-19.3, symbol ; U
23B7 .. 23B9 ; cl-27, western ; S
23BA .. 23CF ; cl-19.3, symbol ; U
23D0 ; cl-27, western ; S
23D1 .. 23FF ; cl-19.3, symbol ; U

(This particular change is seconded by Asmus Freytag.)
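
As a sanity check on the proposed split, the sketch below encodes the nine replacement ranges and verifies that they tile 2300..23FF with no gap or overlap. The names `PROPOSED`, `tiles_exactly` and `orientation_of` are introduced here for illustration only.

```python
# The proposed replacement lines, as (first, last, orientation).
PROPOSED = [
    (0x2300, 0x231F, "U"),
    (0x2320, 0x2321, "S"),
    (0x2322, 0x239A, "U"),
    (0x239B, 0x23B3, "S"),
    (0x23B4, 0x23B6, "U"),
    (0x23B7, 0x23B9, "S"),
    (0x23BA, 0x23CF, "U"),
    (0x23D0, 0x23D0, "S"),
    (0x23D1, 0x23FF, "U"),
]


def tiles_exactly(ranges, first, last):
    """True if `ranges` cover first..last contiguously, in order."""
    expected = first
    for start, end, _ in ranges:
        if start != expected or end < start:
            return False
        expected = end + 1
    return expected == last + 1


def orientation_of(code_point, ranges):
    """Look up the orientation of one code point, or None if not covered."""
    for start, end, orientation in ranges:
        if start <= code_point <= end:
            return orientation
    return None


assert tiles_exactly(PROPOSED, 0x2300, 0x23FF)
```

Under this split, 31 of the 256 code points in the block move to orientation S.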

8.1.2 Box drawing characters

The box drawing characters (2500..259F) should be changed to S.

8.1.3 Arrows

The arrows should be changed to S: So characters in the 2190..21FF, 261A..261F, 2794..27BE, 2B00..2B11, and 2B45..2B46 ranges; and Sm characters in the 27F0..297F and 2B30..2B4C ranges.

8.1.4 Math

For the following reasons:

digits are typeset sideways by default
commonly used variable names (Latin, Greek) are typeset sideways by default
we expect superscripts and subscripts to typeset sideways by default
arrows, which function as relations in math, would also be typeset sideways by default (see separate comment)
ASCII math symbols are expected to typeset sideways
mathematical formulae are usually typeset sideways even in vertical text
the most commonly-used symbols that are intermixed with prose (× and +) are symmetric wrt rotation, and the equals sign (=) seems to be typeset sideways even when everything else is upright (http://fantasai.inkedblade.net/style/scans/ChinatownSFPL028.png)

we suggest math symbols should be typeset sideways by default.

When intermixed in prose, variable names are often typeset upright, and in such styles math symbols might also be typeset upright. However in these situations some tailoring is necessary for the variable names whatever the mathematical default, so using this style to determine the default rules in plaintext does not make sense.

The default orientation of fullwidth math symbols is less clear, since fullwidth characters typically provide an orientation contrast with their ASCII counterparts; perhaps they should be U (or T for equals).

8.2 From Nozomu Katoo

1) Arrows

When arrow symbols are used to indicate "order of process" or "from A to B", they are usually orientation sensitive and have to be displayed sideways in vertical text. But when they are used to indicate direction itself, they are of course displayed upright. In actual text, I feel that the former use ("order" use) is rather dominant.

2) Triangles

They also become orientation sensitive when they are used like arrows.

3) Half black circles

According to p. 510 in the JIS 0213:2000 standard book, circle marks with half black were picked for the standard originally as the symbols that denote the Japanese language accent. But half black circles used for such purpose are *always* orientation sensitive; they are printed as LEFT or RIGHT HALF BLACK in horizontal text whereas they are printed as UPPER or LOWER HALF BLACK in vertical text. Apparently, the committee which compiled the standard overlooked this point and included these in the standard as generic circle symbols, which now have mappings to Unicode 25D0..25D3.

As case 3 particularly relates to my field, I once struggled with this "orientation" problem but could not find a good solution short of introducing a new control character which indicates that the preceding or following character is orientation sensitive. Although it may be difficult to decide the orientation of these characters automatically, I hope that a good solution is found someday.

---

The attached images are examples of orientation-sensitive half-black circles and triangles. These are of the same text but one (Kindaichi1995.jpg) is from a book published in 1995 in horizontal text layout and the other (Kindaichi2005.jpg) is from a book published in 2005 in vertical text layout.

If the proposal of the co-editors is adopted, I would like triangles (25B2..25C5) and half-black circles (25D0..25D3) to be treated the same way. Particularly:

  Hor.    Ver.
  25B6 -> 25BC
  25B7 -> 25BD
  25D0 -> 25D3
  25D1 -> 25D2

If these mappings are adopted (i.e., placed into Category S), it would help people like me who discuss the Japanese accent.
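
Each pair in the table above corresponds to a 90 degree clockwise rotation of the horizontal glyph, which is what orientation S provides. For a plain-text preview with no rotation available, the same table can be written as an explicit substitution; the sketch below does only that, and the names `HORIZONTAL_TO_VERTICAL` and `to_vertical_preview` are hypothetical.

```python
# Horizontal form -> character whose upright glyph matches the
# horizontal glyph rotated 90 degrees clockwise.
HORIZONTAL_TO_VERTICAL = {
    0x25B6: 0x25BC,  # black right-pointing triangle -> black down-pointing triangle
    0x25B7: 0x25BD,  # white right-pointing triangle -> white down-pointing triangle
    0x25D0: 0x25D3,  # circle with left half black -> circle with upper half black
    0x25D1: 0x25D2,  # circle with right half black -> circle with lower half black
}


def to_vertical_preview(text):
    """Substitute the four orientation-sensitive symbols; leave the rest alone."""
    return text.translate(HORIZONTAL_TO_VERTICAL)
```

A layout engine that honors orientation S would not need this substitution; it would rotate the original glyphs instead.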

9. Other comments

From Ken Lunde:

When declaring that a glyph is to be rotated 90 degrees clockwise for vertical writing (Categories S and SB), the operation may not be so simple. Depending on which coordinate is used as the pivot point, and the relative baselines of the scripts covered by the font, there may be X-axis shifting that is necessary after rotation. In other words, applications may need to dig into the font to figure out the parameters to use for any shifting that is necessary. This is one reason why vertical variants of what appear to be glyphs that were mechanically rotated are included in fonts. It is also one of the reasons why the 'vrt2' GSUB feature was defined in the first place.
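
The dependence on the pivot point can be seen in a few lines. The sketch below assumes font-style coordinates with the y axis pointing up; the function name and the example numbers are illustrative only and are not taken from any particular font.

```python
def rotate_cw(point, pivot):
    """Rotate `point` 90 degrees clockwise about `pivot` (y axis pointing up)."""
    x, y = point
    px, py = pivot
    dx, dy = x - px, y - py
    # Clockwise quarter turn: (dx, dy) -> (dy, -dx).
    return (px + dy, py - dx)


# The same glyph corner rotated about two different pivots.
corner = (600, 700)
about_origin = rotate_cw(corner, (0, 0))      # (700, -600)
about_center = rotate_cw(corner, (300, 350))  # (650, 50)

# The two results differ by a fixed offset, the same for every point of
# the glyph, which is the shift an application has to apply after rotation.
offset = (about_center[0] - about_origin[0], about_center[1] - about_origin[1])
```

Choosing a different pivot therefore never changes the rotated shape, only where it lands relative to the line; that placement is what has to be derived from the font's metrics and baselines.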

There is an expectation that the same text can be purposed for both writing directions, which is another reason why substitutions are used, and in some cases may seem abusive, meaning that the result can be considered a completely different character. If it is unknown whether the text will be set in horizontal or vertical orientation, this makes sense. The most abusive vertical substitutions are for Chinese, as described in GB 15834-1995, specifically that U+2018, U+2019, U+201C, and U+201D become rotated versions of U+300C, U+300D, U+300E, and U+300F, respectively.

10. Acknowledgements

The following individuals have provided feedback, either publicly or privately (in no particular order): John Cowan, Hiroshi Takenaka, Jungshik Shin, Fantasai, Asmus Freytag, Ken Lunde, Taro Yamamoto, Soji Ikeda, Steve Zilles, John Daggett, Nozomu Katō, Chinglan Ho, Nat McCully, the co-editors of CSS3 Writing Modes and CSS3 Text (Koji Ishii, Shinyu Murakami, and fantasai), Martin Dürst, Wonsuk Lee, Soo-Hyun Choi, Paul Kim.