L2/23-009

Editorial Committee Report and Recommendations for UTC #174 Meeting

Source: Editorial Committee

Date: January 23, 2023

A. Unicode Release Topics

A1. Unicode 15.1.0 Report

FYI: The Editorial Committee's participation in Unicode 15.1 is relatively limited, because no publication of the core specification is planned for 15.1. The Editorial Committee has been doing some limited editorial review of UAXes and UTSes which will be revised during the 15.1 release cycle. This work will probably ramp up during the beta review period, after the April UTC meeting.


A2. Unicode 16.0.0 Report

FYI: The Editorial Committee has started review of new content planned for the eventual 16.0 publication of the core specification. In particular, contributing editors have started to deliver drafts of sections for new scripts that we anticipate will be published in 16.0. There is also ongoing work to do routine upkeep of the core specification and to stay current with bug reports and other small tweaks to core specification content mandated by the UTC.

This editorial work can now be done up front, and in a somewhat more collaborative manner, because the core specification content has all been converted into HTML from the FrameMaker format that we used for text maintenance up through Version 15.0. That converted content is available for the editors to work on in a draft GitHub repository, while we simultaneously work out the details of how to publish 16.0 in a new framework.


A3. Core Specification Future Development

FYI: The "TUS Future" project continues to be active, formally meeting approximately once a month, but with a lot of development work, particularly by Liang Hai, going on in the background between meetings. Recent developments have involved getting the contributing editors set up to work with a modern web development framework, including installing Node.js, so that they can view updated content correctly in a full local webserver environment. There, content is rendered with a regular browser, just as it will eventually be viewed on the web after publication. This is much better than simply using a WYSIWYG HTML editor, and will become absolutely essential once we start componentizing various elements of the core specification text using Svelte components.

While the contributing editors are getting up to speed with this tooling, Liang Hai has also been developing prototype components for eventual use with the text. For example, a character component will provide a simple API that lets editors easily manage character name and glyph citations of the sort "U+0915 क DEVANAGARI LETTER KA", by code point, by Unicode value, or by name, while the backend implementation, with full access to the UCD, takes care of the lookup, consistency, and rendering. Liang Hai is also working out a framework that will let the contributing editors work through the copyedit pass to verify all the converted hack font display in the text, so we can check for and eliminate any rendering regressions for the converted text.
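
The kind of lookup described for the character component can be illustrated with a minimal Python sketch. This is not the Svelte component under development. It uses Python's standard unicodedata module as a stand-in for a UCD-backed backend, and the function name cite_character and its accepted input forms (an integer code point, a "U+XXXX" string, or a character name) are assumptions made only for illustration.

```python
import unicodedata


def cite_character(ref):
    """Build a citation such as 'U+0915 क DEVANAGARI LETTER KA'.

    `ref` may be an integer code point, a 'U+XXXX' string, or a
    character name. (Hypothetical helper, for illustration only.)
    """
    if isinstance(ref, int):
        char = chr(ref)
    elif ref.upper().startswith("U+"):
        # Parse the hexadecimal digits after the 'U+' prefix.
        char = chr(int(ref[2:], 16))
    else:
        # unicodedata.lookup raises KeyError for an unknown name.
        char = unicodedata.lookup(ref)
    # unicodedata.name raises ValueError if the character has no name.
    name = unicodedata.name(char)
    return f"U+{ord(char):04X} {char} {name}"


if __name__ == "__main__":
    print(cite_character(0x0915))
    print(cite_character("U+0915"))
    print(cite_character("DEVANAGARI LETTER KA"))
```

Because the code point, glyph, and name are all derived from a single reference, the three parts of a citation cannot drift out of sync, which is the consistency benefit the component is intended to provide.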


B. Website Topics

B1. Website Status

FYI: The Unicode technical website has remained stable since our last report. This will be the last time that the Editorial Committee reports on website status, as the Infrastructure Group is responsible for website stability.


B2. Website Content Maintenance

FYI: The Editorial Committee continues to provide minor maintenance of pages on the Unicode technical website. For a major initiative to update the FAQ pages, see B3 below.


B3. FAQ Updating

FYI: The FAQ of the Fortnight project continues apace. This special focus task group has almost completed its initial editorial pass through all of the active FAQ pages. The pages have all been converted to HTML5 and the editors are now using more sophisticated maintenance tooling. Once the first big pass has completed, the participants in the project will be considering possible future process improvements for the FAQ pages which might have applicability to larger chunks of the Unicode technical website eventually.


C. Editorial Committee Process Issues

FYI: The Editorial Committee continues to meet regularly. Our meetings are now generally held on a biweekly schedule.

This report to the UTC includes feedback from the Editorial Committee meetings held on November 10 and December 8, 2022, and January 12, 2023.

The Editorial Committee has been innovating in its process, and is now using GitHub repositories, both for its issue tracking (and discussion), and for ongoing editorial maintenance of the core specification.

Public-facing information about the Editorial Committee and its work is maintained on the Unicode Editorial Committee page on the website. The Editorial Committee also maintains an internal subsite for use by the committee. People who would like to find out more about the work of the Editorial Committee or contribute to that work should contact the Chair, Julie Allen.


D. UTR Topics

FYI: The Editorial Committee has nothing to bring up separately about various UTRs at this time.


E. PRI Topics

E1. Public Feedback on PRIs for UAXes and UTSes

E1.1 PRI #465, UAX #44

Date/Time: Thu Jan 5 23:44:42 CST 2023
Name: Ben Yang
Report Type: Public Review Issue
Opt Subject: 465


The following text is found in UAX#44:

----

Most properties have a single value associated with each code point.
However, some properties may instead associate a set of multiple different
values with each code point. For example, the provisional kCantonese
property, which lists Cantonese pronunciations for unified CJK ideographs,
has values which consist of a set of zero or more romanized pronunciation
strings. Thus, the Unihan Database contains an entry:

U+342B  kCantonese  gun3 hung1 zung1

This line is to be interpreted as associating a set of three string values,
{“gun3”, “hung1”, “zung1”} with the kCantonese property for U+342B.

----

However, since I believe Unicode 14.0, "kCantonese" has been modified to
only allow a single entry, so this text is no longer accurate.

If we'd like to stick with using a Unihan property to demonstrate, how
about "kVietnamese"? Here's a suggestion for an edit:

----

Most properties have a single value associated with each code point.
However, some properties may instead associate a set of multiple different
values with each code point. For example, the provisional kVietnamese
property, which lists Vietnamese pronunciations for unified CJK ideographs,
has values which consist of a set of zero or more pronunciation strings.
Thus, the Unihan Database contains an entry:

U+6258  kVietnamese thác thách thốc thước thướt

This line is to be interpreted as associating a set of five string values,
{"thác", "thách", "thốc", "thước", "thướt"} with the kVietnamese property
for U+6258.

----

Discussion: The Editorial Committee discussed this feedback, and agrees that the text should be corrected with a new example. This feedback was also discussed by the Properties & Algorithms Group, under Issue #76 in their issue tracker, and the comments noted there have some specific suggestions for fixing the text.

AI Ken Whistler, EDC. In the proposed update draft for UAX #44 for Unicode 15.1.0, replace the partly erroneous example for kCantonese values with an updated example. Ref. L2/23-009 [Ben Yang Thu Jan 5 23:44:42 CST 2023]
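
For illustration, the set-valued interpretation described in the quoted text can be sketched in Python. This is a minimal sketch, not part of UAX #44 or of any Unicode tooling. The function name parse_unihan_line is hypothetical, and the sketch assumes a line consists of a code point, a property name, and zero or more whitespace-separated values (the actual Unihan data files delimit the three fields with tabs, which whitespace splitting also handles).

```python
from __future__ import annotations


def parse_unihan_line(line: str) -> tuple[int, str, frozenset[str]]:
    """Parse 'U+XXXX  property  v1 v2 ...' into (code point, property, value set).

    Hypothetical helper, for illustration only.
    """
    parts = line.split(None, 2)  # code point, property name, remainder
    if len(parts) < 2 or not parts[0].startswith("U+"):
        raise ValueError(f"not a Unihan-style entry: {line!r}")
    code_point = int(parts[0][2:], 16)
    # The remainder is a set of zero or more values, so order is not
    # significant and duplicates collapse.
    raw_values = parts[2] if len(parts) > 2 else ""
    return code_point, parts[1], frozenset(raw_values.split())


if __name__ == "__main__":
    cp, prop, values = parse_unihan_line(
        "U+6258  kVietnamese thác thách thốc thước thướt"
    )
    print(f"U+{cp:04X}", prop, len(values), sorted(values))
```

A frozenset is used here because the text describes the value as a set; a tool that needs to preserve the order given in the data file would use a tuple instead.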


E2. Public Feedback [Items noted for Editorial Committee attention]

Date/Time: Sat Nov 26 01:33:30 CST 2022
Name: Kushim Jiang
Report Type: Website Problem
Opt Subject: Unicode Terminology English - Simplified Chinese

Here are the suggestions.

(1) Accent Mark 重音标记. Consider modifying to 变音标记 because its function is “to
alter the phonetic value”.

(2) Binary File 二进制档案. Consider modifying to 二进制文件 according to
[1] “文件 file”.

(3) BNF (Backus-Naur Form) 巴克斯诺尔范式. Consider modifying to 巴克斯-诺尔范式 according
to [1] “巴克斯-诺尔范式 Backus-Naur form,BNF”.

(4) BOM (Byte Order Mark) 字节排序标记. Consider modifying to 字节次序标记 according to
[1] “大端 big-endian, 一种在内存中存放多字节数据时的字节存放次序”.

(5) Block 字区. Consider modifying to 字符块 according to the Unicode Terminology
itself “Character Block 字符块”.

(6) Character Sequence 字符顺序, Composite Character Sequence 复合字符顺序. Consider
modifying to 字符序列 and 复合字符序列 according to the meaning “an ordered sequence
of character(s)”.

(7) Code Unit 码单位, Code Value 码值. Consider modifying to 码元 (code element in
digital communication field) as code value is an “obsolete synonym for code
unit”.

(8) Combining Character 组合字符, Combining Character Sequence 组合字符序列, Combining
Class 组合类别. Consider modifying to 结合字符, 结合字符序列 and 结合类别 to emphasize the
unequal relationship between base and mark.

(9) Compatibility Equivalent 兼容等值. Consider modifying to 兼容等价 according to
the Unicode Terminology itself “Equivalence 等价”.

(10) EBCDIC (Extended Binary-Coded Decimal Interchange Code) 延伸二进制编码的十进制交换码,
ECCS (Extended Combining Character Sequence) 延伸组合字符序列, MIME
(Multipurpose Internet Mail Extensions) 多功能互联网邮件延伸定义. Consider modifying to
扩展二进制编码的十进制交换码, 扩展结合字符序列 and 多功能互联网邮件扩展.

(11) Encoded Character 已编码字符. Consider modifying to 编码字符 according to the
Unicode Terminology itself “Coded Character 编码字符, see encoded character.”

(12) Extended Base 延伸基. Consider modifying to 扩展基本字符.

(13) Higher-Level Protocol 高级规约. Consider modifying to 高层协议 according to
[1] “协议 protocol” and “规约 specification”.

(14) Modifier Letter 改形字母. Consider modifying to 修饰字母 as it does not change
the shape of the previous letter or the letter itself.

(15) Paragraph Embedding Level 段落内置层级. Consider modifying to 段落嵌入层级
according to the Unicode Terminology itself “Embedding 嵌入”.

(16) Presentation Form 表达模式. Consider modifying to 显现形式.

(17) Rich Text 加工文本. Consider modifying to 富文本 according to the Unicode
Terminology itself “Fancy Text 富文本, see rich text.”

(18) Row 横行. Consider modifying to 字符区 as row-cell structure is similar to
区位/ku-ten.

(19) Script 手稿程式. Consider modifying to 文种 according to “基本多文种平面”. And the
Glossary of Unicode Terms only contains the meaning item of 文种 without 脚本
and 书体.

(20) Spacing Mark 间距标记. Consider modifying to 含宽标记.

Reference
[1] 计算机科学技术名词 (第三版).

Discussion: This feedback constitutes a series of suggestions for updating the Unicode-related terminology translations on the following page on the website:

Unicode Terminology English - Simplified Chinese

The Editorial Committee considered the issue. Liang Hai has volunteered to scan through this feedback and to make appropriate updates to the page. No action item needs to be recorded for this item. Tracked as issue #7 in the Edcom issue tracker.


G. Miscellaneous Topics

G1. (None noted)