L2/22-065
Editorial Committee Report and Recommendations for UTC #171 Meeting
Source: Editorial Committee
Date: April 13, 2022
A. Unicode Release Topics
A1. Unicode 15.0 Schedule and Planning
FYI: The significant milestones for the Unicode 15.0 release are:
- Alpha start: February 8, 2022
- Alpha close: April 5, 2022
- Beta start: May 31, 2022
- Beta close: July 12, 2022
- Release: September 13, 2022
These dates are now firm, although adjustments remain possible based on decisions at upcoming UTC meetings or on CLDR and ICU release dates.
A2. Unicode 15.0.0 Work
Discussion: The work on Unicode 15.0 is ongoing. The Editorial Committee will be participating in the preparation of documentation for the upcoming beta review.
Note that there are ongoing discussions involving participation from various TC, SC, and group chairs about how best to address project management and release management for the Unicode Standard, both for the remainder of the 15.0 cycle, and for future releases after that. The Editorial Committee is working to redefine its scope to focus on editorial work on the Core Specification and other Unicode technical specifications, and editing of the technical content on the website. The Editorial Committee will no longer be the lead on the overall project management and release management of the Unicode Standard. Instead, that responsibility reverts to the UTC itself, with details of organizational structure and delegation of tasks still TBD, pending the results of the ongoing discussions of how to approach the problems and requirements for coordination involved.
In the meantime, during this transitional phase, this Editorial Committee report can still serve as the location for some basic bookkeeping regarding UTC approvals and actions to move Unicode 15.0 forward. In particular, it is now time to close up the alpha review cycle and start the beta review for Unicode 15.0.
AI Rick McGowan. Close the alpha review PRI #442.
AI Rick McGowan. Extend the close dates of all open PRIs for 15.0 proposed updates of UAXes and UTSes to July 11, 2022: PRI #437 (UAX #38), PRI #438 (UAX #44), PRI #439 (UAX #50), PRI #440 (UTS #10), PRI #441 (UAX #29), PRI #444 (UAX #34), PRI #445 (UAX #45), PRI #446 (UAX #14), PRI #447 (UAX #24), PRI #448 (UAX #42).
EC-UTC171-R1: The Editorial Committee recommends that:
The UTC authorizes starting the beta review for Unicode 15.0.
AI Ken Whistler. Prepare an updated NamesList.txt for Unicode 15.0, synched with the final Unicode 15.0 repertoire, as finalized during UTC #171.
AI Michel Suignard, Rick McGowan. Prepare a set of Unicode 15.0 beta review code charts for posting.
AI Ken Whistler, PAG. Prepare an updated and complete set of data files for the UCD, and the data directories for UTS #10, UTS #39, and UTS #46, for the beta review of Unicode 15.0.
AI Ned Holbrook, ESC. Prepare an updated and complete set of data files for the data directory of UTS #51, and a complete set of emoji beta review charts for Emoji 15.0.
AI Ken Whistler, EDC. Prepare the draft landing page for Unicode 15.0 and Unicode 15.0 beta review page.
AI Rick McGowan. Post the PRI for the Unicode 15.0 beta review, to close July 11, 2022.
A3. Unicode 15.0 Core Specification Editing
FYI: Editing for the 15.0 Core Specification continues apace. In particular, the substantial new section for the Kawi script is almost complete, based on a very detailed contribution by Norbert Lindenberg. There have also been significant updates to the Arabic section, with major contributions to that work by Lorna Evans.
A4. TUS Future Project
FYI: A subgroup of the Editorial Committee is currently investigating the feasibility of publishing future versions of the Unicode Core Specification in HTML. Our current tooling consists of FrameMaker 2019, which produces pdf files for publication, but which cannot export HTML that is anywhere close to the kind of quality and structure we would need to publish in HTML. Rather, the subgroup is looking into complete extraction of content and structure of the Core Specification from the native FrameMaker mif files, and then building up a new framework for maintenance and publication, henceforth using the HTML files as the source for future editing.
Right now rapid prototyping is underway to evaluate the feasibility of content recovery from the FrameMaker files and then replacement of the thousands of instances of ASCII and PUA hack font glyph usage in the Core Specification with either native Unicode characters or appropriate images that would work correctly on the web. The Editorial Committee will report on this project in more detail at the next UTC meeting.
B. Website Topics
B1. Website Status
FYI: The Unicode technical website has remained stable since our last report.
B2. Website Content Maintenance
FYI: There are no new issues to report at this time. Over the last few months, the editors have contributed minor updates to a number of website pages, including the Unicode Consortium Policies pages.
B3. FAQ Review
FYI: The Editorial Committee has started a new initiative to do more regular review and update of the Unicode FAQ pages on the technical website. We have had a couple of meetings focused specifically on this work, which resulted in good editorial feedback and substantial improvement to several FAQ pages. Right now a small subgroup is looking into mechanisms to automate table of contents generation for FAQ pages and general improvement to the CSS for FAQ pages. We anticipate further meetings devoted to updates of selected FAQ pages that need more work.
C. Editorial Committee Process Issues
FYI: The Editorial Committee continues to meet approximately once a month via Zoom, with those monthly meetings now scheduled for 5 hours (with a lunch break), instead of the longer meetings we used to hold.
This report to the UTC includes feedback from the Editorial Committee meetings held on February 3, March 3, and March 31, 2022.
As part of the re-scoping of the work of the Editorial Committee, we will no longer be tracking details of UCD and other data file updates related to the Unicode release cycle. UCD (and data files related to UTSes) should be tracked by the Properties & Algorithms group, and the Unihan data file (and UAX #45 related data files) should be tracked by the CJK & Unihan group. Emoji-related issues should be tracked by the Emoji Subcommittee.
Public-facing information about the Editorial Committee and its work is maintained on the Unicode Editorial Committee page on the website. The Editorial Committee also maintains an internal subsite for use by the committee. People who would like to find out more about the work of the Editorial Committee or contribute to that work should contact the Chair, Julie Allen.
D. UTR Topics
FYI: The Editorial Committee has nothing to bring up separately about various UTRs at this time.
E. PRI Topics
E1. Public Feedback on PRI #442 (15.0 Alpha Review) Noted in L2/22-056
Discussion: The Editorial Committee reviewed the PRI #442 feedback in L2/22-056 and had the following observations:
- UCAS Glyphs: The Editorial Committee agreed that it made sense to highlight more of the updated UCAS glyphs. (Liang Hai has made those suggestions to Michel Suignard, so no action needs recording.)
- Khojki: This suggestion should be investigated by the Script Ad Hoc.
- Arabic Extended-C: These suggestions for adding some cross-references seem reasonable.
- Kawi: Suggestions for a name change like this are appropriate for investigation by the Script Ad Hoc. (See the Script Ad Hoc report.)
- Sundanese: This suggestion for updated annotations for 1BBD has already been dealt with by ballot comments on the CD.
- Devanagari Extended-A: These suggestions for adding some cross-references seem reasonable. We think cross-references from the Bhale-Mindu characters to Devanagari should be added, but not vice-versa.
Suggested associated action items:
AI Ken Whistler. Add cross-references in the names list to some characters in the Arabic Extended-C and Devanagari Extended-A blocks, per suggestions in L2/22-056. For Unicode 15.0.
AI Rick McGowan. Respond to the author of L2/22-056, notifying him of the responses from the Editorial Committee in L2/22-065.
F. Responses to Other Public Feedback
F1. Public Feedback Noted in L2/22-063
FYI: This review refers to items in L2/22-063 listed under "Feedback routed to Editorial Committee for evaluation".
Date/Time: Mon Jan 17 01:46:41 CST 2022
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Core Specification
Section 24.1 of the Core Specification, Character Names List, describes the Dashed Box Convention: "Dashed Box Convention. There are a number of characters in the Unicode Standard which in normal text rendering have no visible display, or whose only effect is to modify the display of other characters in proximity to them." Since Unicode 6.0, the dashed box convention has also been applied to characters with Indic syllabic category Consonant_Preceding_Repha. Such characters are always rendered visibly; the dashed box is used to indicate that they require reordering to after the following base character.
Discussion: The Editorial Committee reviewed this feedback, and has already discussed an update that is implemented in the current draft of the text for the 15.0 Core Specification. No action needs to be recorded.
Date/Time: Thu Jan 27 15:49:01 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UAX #24 and UAX #31
UAX #24 contains the mistakes “GREEK LETTER SMALL LETTER OMICRON” (instead of “GREEK SMALL LETTER OMICRON”) and “in provided in” (instead of “is provided in”). A period is missing after “can be classified by script”. UAX #31 contains “an definition” (instead of “a definition”) and possibly some misplaced spaces (search for “ ,” and “ .”).
Discussion: The Editorial Committee reviewed this feedback, and agrees that these typos should be fixed. The corrections for UAX #24 have already been made, and are in the current proposed update for that specification. The corrections for UAX #31 have also already been made, and are in the proposed update for that specification. So no action items need to be recorded.
Date/Time: Tue Feb 1 01:51:59 CST 2022
Name: Vikki McDonough
Report Type: Error Report
Opt Subject: Unicode 14.0 "Optical Character Recognition" code chart
In the code chart for the Optical Character Recognition block, the reference glyph for character U+2447, OCR AMOUNT OF CHECK, is misshapen. The vertical bar in the middle of the glyph should be centered vertically; if we take the lower-left rectangle as glyph-component A, the vertical bar as glyph-component B, and the upper-right rectangle as glyph-component C, and designate the height of the upper and lower edges of each component as hU([A/B/C]) and hL([A/B/C]), respectively, {hU(A)-hL(B)} should equal {hU(B)-hL(C)}. However, in the reference glyph for this character in the official Optical Character Recognition code chart, the vertical bar is too high up, and {hU(B)-hL(C)} is much greater than {hU(A)-hL(B)}. This error has been present since at least Unicode 3.0 (the earliest Unicode version for which an archived copy of the Optical Character Recognition code chart is retrievable from the Wayback Machine). Code chart containing the error: https://www.unicode.org/charts/PDF/U2440.pdf ("Optical Character Recognition; Range: 2440–245F") Archived Unicode 3.0 code chart demonstrating a lower bound on the length of time this error has been present: https://web.archive.org/web/20010603000706/http://www.unicode.org/charts/PDF/U2440.pdf Example of an E-13B-based font showing the correct form of this glyph: https://commons.wikimedia.org/wiki/File:MICR_char.svg (high-resolution version: https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/MICR_char.svg/2560px-MICR_char.svg.png)
Discussion: The Editorial Committee reviewed this feedback, and referred it to the code charts editor, Michel Suignard. Although the difference in the glyph design here is minuscule, Michel has gone ahead and implemented the correction. It is live in the current alpha review code charts. So no further action needs to be recorded.
Date/Time: Sat Feb 12 13:44:53 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UTR #17
I suggest the following corrections in UTR #17:
- "O'Reilley" → "O'Reilly"
- "graphic character glyphic identifier" → "graphic character global identifier"
- "Graphic Character Set Glyphic Identifier" → "Graphic Character Global Identifier"
- "UTF32-LE" → "UTF-32LE"
- "an single" → "a single"
- "sets, where for example," → "sets where, for example,"
- "UTF-16 ," → "UTF-16,"
- "(“character set” )" → "(“character set”)"
- "3.0,..." → "3.0, ..."
- "CCS's" → "CCSes"
- "UDC's" → "UDCs"
- "UAX# 29" → "UAX #29"
- "Compression. [BOCU]." → "Compression [BOCU]."
Discussion: The Editorial Committee reviewed this feedback, and agrees that all of these typos should be fixed. They have all been fixed in a draft for a proposed update to UTR #17 that is still undergoing editorial review and discussion about some further content updates.
Suggested associated action items:
AI Ken Whistler, EDC. Prepare a proposed update for UTR #17, with fixes for typos noted by Ivan Panchenko in L2/22-063 [Sat Feb 12 13:44:53 CST 2022].
AI Rick McGowan. Post the proposed update for UTR #17, to close July 11, 2022.
Date/Time: Sun Feb 13 10:55:14 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UTR #23
UTR #23 contains the following minor mistakes: “An code” (instead of “A code”), “comparsion” (instead of “comparison”), “applies as” (instead of “apply as”), “an encoded characters” (instead of “an encoded character”), “properties the” (instead of “properties of the”), “Unicode Character database” (instead of “Unicode Character Database”), “For example 'Character Property', becomes” (instead of “For example, 'Character Property' becomes”). A space is missing in “results.Proceeding”. I also suggest changing “values, (other than the default value)” to “values (other than the default value),”. The comma here can be deleted: “accessed,”, “a property, with”, “way, is”, “input, is”, “for, is”.
Discussion: The Editorial Committee reviewed this feedback, and agrees that these typos should be fixed. No proposed update is currently active for this specification, so a new one should be prepared, with these fixes and any further content fixes which may be appropriate.
Suggested associated action items:
AI Ken Whistler, EDC. Prepare a proposed update for UTR #23, with fixes for typos noted by Ivan Panchenko in L2/22-063 [Sun Feb 13 10:55:14 CST 2022].
AI Rick McGowan. Post the proposed update for UTR #23, to close July 11, 2022.
Date/Time: Thu Mar 3 20:55:52 CST 2022
Name: David Corbett
Report Type: Error Report
Opt Subject: Chapter 9
The glyphs for positional forms of U+0886 ARABIC LETTER THIN YEH in chapter 9 look identical to those for U+064A ARABIC LETTER YEH. They should be thin.
Discussion: The Editorial Committee reviewed this feedback, and agrees that some glyph updates might be appropriate.
Suggested associated action item:
AI Liang Hai, EDC. Update the glyph design for U+0886 ARABIC LETTER THIN YEH in Tables 9-8 and 9-10 of the Core Specification (Version 14.0), and prepare fixes. For Unicode 15.0. Ref. David Corbett in L2/22-063 [Thu Mar 3 20:55:52 CST 2022].
Date/Time: Thu Mar 17 19:31:23 CDT 2022
Name: Martin J. Dürst
Report Type: Error Report
Opt Subject: Unicode version 14.0.0, section 5.4
This is not really an error, but a place where language could be improved. Section 5.4 of Unicode 14.0.0 (https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf) contains the following:
```
Because the ranges are disjoint, each code unit in well-formed UTF-16 must meet one of only three possible conditions:
• A single non-surrogate code unit, representing a code point between 0 and D7FF₁₆ or between E000₁₆ and FFFF₁₆
• A leading surrogate, representing the first part of a surrogate pair
• A trailing surrogate, representing the second part of a surrogate pair
```
The wording here is a bit strange. "Condition" seems to require "It is ..." in each of the bulleted items. Either add "It is " to each bullet, or change the preceding text to say "it is one of the following three".
Discussion: The Editorial Committee reviewed this feedback, and agreed that the wording should be improved in the text.
Suggested associated action item:
AI Ken Whistler, EDC. Improve the wording regarding well-formed UTF-16 in Section 5.4 of the Core Specification. For Unicode 15.0. Ref. Martin Dürst in L2/22-063 [Thu Mar 17 19:31:23 CDT 2022].
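For readers less familiar with the text under discussion, the three disjoint classes of UTF-16 code units quoted in the Section 5.4 feedback can be illustrated with a minimal Python sketch. The function names `classify_code_unit` and `utf16_code_units` are hypothetical and chosen only for this illustration; the surrogate ranges D800..DBFF (leading) and DC00..DFFF (trailing) are those defined by the Unicode Standard. This is not proposed wording for the Core Specification.

```python
from typing import List


def classify_code_unit(unit: int) -> str:
    """Return which of the three disjoint classes a 16-bit code unit falls in."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("not a 16-bit code unit")
    if 0xD800 <= unit <= 0xDBFF:
        return "leading surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "trailing surrogate"
    # Everything else: 0000..D7FF or E000..FFFF.
    return "non-surrogate"


def utf16_code_units(text: str) -> List[int]:
    """Encode text as UTF-16 (big-endian, no BOM) and return its code units."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    # "A" is a single non-surrogate unit; U+1F600 needs a surrogate pair.
    for unit in utf16_code_units("A\U0001F600"):
        print(f"{unit:04X}: {classify_code_unit(unit)}")
```

Because the ranges do not overlap, the classification needs no context: each code unit can be categorized on its own, which is the point the Section 5.4 text is making.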
Date/Time: Fri Mar 18 20:28:10 CDT 2022
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: 442
On the codecharts for Cyrillic Extended-D some of the characters use the Greek letterforms (of Delta and Phi respectively) rather than the Cyrillic ones (of be and ef respectively). These are: 1E031, 1E042, 1E052 & 1E060. The latter two are just the subscript version of the former two, with the same issue.
Discussion: The Editorial Committee reviewed this feedback, and agreed that the glyphs should be improved.
Suggested associated action item:
AI Debbie Anderson, EDC. Work with Kirk Miller to fix the glyphs for 1E031/1E042, 1E052/1E060, 1E050, 1E06B, to be more consistent. Provide updated glyphs to Michel Suignard. For Unicode 15.0. Ref. Eduardo Marín Silva in L2/22-063 [Fri Mar 18 20:28:10 CDT 2022].
Date/Time: Sun Apr 10 08:59:51 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Armenian left half ring
Section 7.6 “Armenian” says “There is no left half ring in Armenian. Unicode character U+0559 is not used. It appears that this character is a duplicate character, which was encoded to represent U+02BB MODIFIER LETTER TURNED COMMA, used in Armenian transliteration. U+02BB is preferred for this purpose.” Via https://en.wiktionary.org/wiki/%D5%99 I found http://www.nayiri.com/imagedBook.jsp?id=1&printPage=10 which shows a left half ring (or turned apostrophe) being used in the Armenian script in a book on Armenian dialects. Should this character be encoded as U+0559 or as U+02BB? The standard should explain which to use in the Armenian script, because the standard is currently wrong or at least misleading.
Discussion: This feedback was posted too late for Editorial Committee review. It should be routed to the Script Ad Hoc group for initial consideration, because it is an issue of what characters are actually in use in Armenian.
Suggested associated action item:
AI Rick McGowan. Forward feedback from David Corbett in L2/22-063 [Sun Apr 10 08:59:51 CDT 2022] to the SAH for investigation.
Date/Time: Mon Apr 11 17:49:00 CDT 2022
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: IndicSyllabicCategory.txt
The file IndicSyllabicCategory.txt has a category Brahmi_Joining_Number, which contains only the Brahmi numbers U+11052..U+11065. The documentation for that category in the same file says "similar to Number in that in can be used as vowel-holders like Consonant_Placeholder, but may also be joined by a Number_Joiner of the same script, e.g. in Brahmi". This contradicts the core specification, section 14.1, which says "the numerals U+11052 brahmi number one through U+11065 brahmi number one thousand and their ligatures formed with U+1107F brahmi number joiner are not used as vowel carriers".
Discussion: This feedback should be routed to the Script Ad Hoc for further investigation.
Suggested associated action item:
AI Rick McGowan. Forward feedback from Norbert Lindenberg in L2/22-063 [Mon Apr 11 17:49:00 CDT 2022] to the SAH for investigation.
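As background for anyone investigating the IndicSyllabicCategory.txt feedback, the following minimal Python sketch shows one way to read a range entry in the usual UCD data-file layout (code point or range, semicolon, property value, optional `#` comment). The `SAMPLE_LINE` string is an illustrative stand-in written for this sketch, not a line copied from the actual file, and `parse_ucd_line` is a hypothetical helper name.

```python
import unicodedata

# Illustrative entry in UCD "range ; value # comment" layout (an assumption,
# not the literal text of IndicSyllabicCategory.txt).
SAMPLE_LINE = "11052..11065   ; Brahmi_Joining_Number # illustrative entry"


def parse_ucd_line(line):
    """Parse one data line into (first, last, value); return None if no data."""
    data = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    code_range, value = fields[0], fields[1]
    if ".." in code_range:
        first, last = code_range.split("..")
    else:
        first = last = code_range
    return int(first, 16), int(last, 16), value


if __name__ == "__main__":
    first, last, value = parse_ucd_line(SAMPLE_LINE)
    print(f"{value}: {last - first + 1} code points")
    # Character names come from the Unicode data bundled with Python.
    print(unicodedata.name(chr(first)))
    print(unicodedata.name(chr(last)))
```

A listing like this makes it easy to compare the set of code points carrying a given property value against the characters named in the Core Specification text.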
G. Miscellaneous Topics
G1. (None noted)