Feedback on the Japanese TV Symbols Proposal (L2/08-077R3)

L2/08-307
Date: 2008-08-08
Author: Markus Scherer

Reference to ARIB Standards

The proposal's reference to the ARIB standard is currently a broken link to a superseded version of ARIB STD-B24 Volume 1.
ARIB STD-B24 Table 7-10 "Additional Symbols" defines 394 symbols, assigned to JIS rows 90..94.
All but 38 of these symbols are also defined in the earlier ARIB STD-B3, which is a standard for FM Radio broadcasting.

Please change the title of the proposal to "Japanese Broadcast Symbols" since these symbols are not limited to TV broadcasting.

Please include a more explicit and more stable reference to the ARIB standard. For example:

These symbols are defined by the Association of Radio Industries and Businesses (ARIB) in standards shown in its List of ARIB Standards in the Field of Broadcasting.

The symbols are defined in

ARIB STD-B3 "ARIB Standard for Operation of The FM Multiplex Broadcasting System" and in
ARIB STD-B24 "Data Coding and Transmission Specification for Digital Broadcasting" Volume 1 ("Fascicle 1"), Chapter 7 "Character coding", Table 7-10 "Additional Symbols" and Table 7-4 "Kanji Set" and Chapter 7.2 "Universal multi-octet coded Character Set".

Compared with ARIB STD-B3, ARIB STD-B24 adds 38 symbols in JIS row 90, cells 45..63 and 66..84, for a total of 394 symbols (as of version 5.1 of ARIB STD-B24).
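
As a quick check of the counts above, the two cell ranges can be tallied. This is only an arithmetic sketch; the variable names are illustrative and do not come from the ARIB standards.

```python
# Inclusive cell ranges in JIS row 90 that ARIB STD-B24 adds over ARIB STD-B3,
# as stated in the text above.
added_ranges = [(45, 63), (66, 84)]

added = sum(last - first + 1 for first, last in added_ranges)
total_b24 = 394                      # symbols in ARIB STD-B24 Table 7-10
shared_with_b3 = total_b24 - added   # symbols also defined in ARIB STD-B3

print(added)           # 38
print(shared_with_b3)  # 356
```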

Reference to ARIB Symbols

The proposal shows "ARIB code" values in the tables of symbols that are proposed to be encoded. This 4-digit "ARIB code" is derived from the ARIB characters' JIS code row/cell values by padding the cell number to two digits and concatenating the row and cell values to a 4-digit number. For example, where the proposal shows an "ARIB code" of 9001, it refers to the ARIB standards' character assigned to JIS row 90 and cell 1.
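
The derivation described above can be sketched as a pair of small conversions. The function names arib_code and row_cell are hypothetical, chosen here for illustration. The sketch also assumes that the row is padded to two digits, which makes no difference for the ARIB rows 90..94.

```python
from typing import Tuple


def arib_code(row: int, cell: int) -> str:
    """Build the 4-digit "ARIB code" from a JIS row/cell pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("JIS row and cell values range from 1 to 94")
    # Assumption: pad both parts to two digits so the result always has 4 digits.
    return f"{row:02d}{cell:02d}"


def row_cell(code: str) -> Tuple[int, int]:
    """Split a 4-digit "ARIB code" back into its JIS row and cell."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a 4-digit code such as '9001'")
    return int(code[:2]), int(code[2:])


print(arib_code(90, 1))   # 9001
print(row_cell("9330"))   # (93, 30)
```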

Please either explain the "ARIB code" notation, for example using text like the paragraph above, or else refer to "ARIB JIS row/cell values" and show them as 90/01 (or 90/1) and 93/30, or similar.

Proposed and Non-Proposed Symbols

Not proposed for encoding:

"Close caption symbols which are sequences of Latin text sometimes requiring a pair of characters (such „(ce‟ and „mb)‟, in all ARIB 9256-9285."

Some of these symbols enclose a single Latin letter, some two letters, some symbols are pairs intended to be put next to each other and together enclose three or four Latin letters.

Question: Why omit these but encode other parenthesized Latin letters, and other compatibility variants of (sequences of) ASCII characters?

Further not proposed:

"Duplicate within the ARIB set (such as 9058 and 9330), in that case only one instance is proposed"

While 90/58 and 93/30 look the same (Kanji "two" in an enclosing square), the former's description is "bilingual broadcasting service" and the latter is in a contiguous set of characters for 1, 2 and 3.

Question: Should we treat this as a duplicate in the source character set, or disunify?
Note: L2/08-188 item 1 refers to the same pair of characters.

There are 394 symbols defined in ARIB STD-B24 Table 7-10. This proposal is for adding 186 of them to Unicode. The Status section lists 41 ARIB characters that "were deliberately not encoded", with explanations of which characters and why. That leaves 167 other ARIB symbols that are not proposed for encoding. It is difficult to review which ARIB symbols are not proposed for encoding, partly because the proposed symbols are grouped logically rather than ordered by ARIB row/cell values as they are in the ARIB standards.
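
The remainder quoted above follows from a simple subtraction, sketched here with illustrative variable names:

```python
total = 394                # symbols in ARIB STD-B24 Table 7-10
proposed = 186             # proposed for addition to Unicode
deliberately_omitted = 41  # listed in the proposal's Status section

remaining = total - proposed - deliberately_omitted
print(remaining)  # 167
```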

Please add a table to the document to make it easy to review the full set of ARIB symbols. Please list all 394 ARIB symbols in their row/cell order with their JIS row/cell values, Unicode name and/or ARIB description, Unicode code point if applicable, and whether they are proposed for addition to Unicode or unified with an existing Unicode character. Please also include the ARIB glyph for proposed-for-encoding symbols. (I assume that it would be too much work to include the ARIB glyph for non-proposed characters.) You could omit all but the first and last of contiguous ranges with obvious sequence (for example, of a range of circled numbers).

Question: Are there ARIB symbols that are neither proposed for encoding, nor unified with existing Unicode characters? If so, please provide more information, for example in the form of comments in the requested additional table.

Proposed U+26EA CHURCH = ARIB-9115 has an ARIB description of "remains of a castle".

Please align the Unicode name with the ARIB description, or use "CHURCH RUIN" or similar with the ARIB description in the annotation.