L2/24-163
Source: Editorial Working Group
Date: July 16, 2024
FYI: The Editorial Working Group is continuing review of new content planned for the eventual 16.0 publication of the core specification. In particular, our contributing editors are continuing their review and editing of the following sections, numbered as they will appear in the revised new 16.0 text:
Section 9.2 (Arabic)
Section 11.1 (Sumero-Akkadian)
Section 11.4 (Egyptian Hieroglyphs)
Section 15.3 (Sora Sompeng)
Section 16.2 (Lao)
Section 17.3 (Balinese)
Section 18.1 (Han)
There is also ongoing work to do routine upkeep of the core specification and to stay current with bug reports and other small tweaks to core specification content mandated by the UTC.
In general, the Editorial Working Group can assert that we should not have any trouble completing new content for the core specification to cover the current anticipated repertoire for 16.0. The essential challenge for the Editorial Working Group for 16.0 is not the new content related to newly assigned repertoire, but rather the overall change in the planned publication format for the 16.0 core specification. (See below.)
We don’t foresee problems that prevent the 16.0 release in September.
The beta review was deployed smoothly at the eventual location for 16.0, https://unicode.org/versions/Unicode16.0.0/core-spec/, and we have been occasionally deploying revisions. The repo’s public.yml workflow keeps a record of public deployments.
We’ve decided to use PDF/A-2u for the single archival PDF.
Notable ongoing tasks for 16.0:
Character references (e.g., U+0021 EXCLAMATION MARK):
They will be “componentized” so the relationship between the code point and the character name is always validated against the UCD.
The standard representative glyphs from the code charts will also be used.
Figures:
Broken figures will be reconstructed.
Figure images will be scaled to restore their sizing in 15.0.
We’re trying to migrate the text font from Adobe’s commercial product Minion 3 to an open source option, likely Source Serif.
The unversioned web bookmarks page (https://unicode.org/versions/latest/bookmarks.html) and those links into PDF files using “global IDs” will be redirected to the per-chapter web pages.
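The validation described under "Character references" above can be illustrated with a minimal sketch. It assumes Python's standard `unicodedata` module as a stand-in for the UCD (its data version depends on the Python release). The helper name `check_reference` is hypothetical and does not describe the Working Group's actual tooling.

```python
import unicodedata


def check_reference(code_point: int, name: str) -> bool:
    """Return True if `name` is the character name recorded for `code_point`.

    Hypothetical helper: it only illustrates the idea of validating a
    code point / name pair against character data.
    """
    # Passing a default avoids a ValueError for code points without a name.
    actual = unicodedata.name(chr(code_point), None)
    return actual == name


print(check_reference(0x0021, "EXCLAMATION MARK"))  # True
print(check_reference(0x0021, "QUESTION MARK"))     # False
```

A check of this kind could run at build time so that a mistyped name or code point fails the build instead of reaching the published text.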
The Editorial Working Group has conducted a review of certain pages for the 16.0 website, most notably the landing pages and the beta review pages. Certain minor edits have been noted; for instance, the removal of the navigation bar to ease the reading of the pages on mobile devices.
We have noted the many problems affecting Unicode's homepage (https://home.unicode.org/), and we look forward to improvements once the WordPress website is decommissioned and replaced with a new framework.
FYI: The Editorial Working Group continues to provide general maintenance of pages on the Unicode technical website.
FYI: The Editorial Working Group continues to meet regularly. Our meetings are generally held on a biweekly schedule, except when they coincide with holidays or other events, such as UTC meetings. This report to the UTC includes feedback from the Editorial Working Group meetings held on May 9, 2024, May 23, 2024, June 6, 2024, June 20, 2024, and July 11, 2024.
FYI: Public-facing information about the Editorial Working Group and its work is maintained on the Unicode Editorial Working Group Page on the website. The Editorial Working Group also maintains an internal subsite for use by the committee. People who would like to find out more about the work of the Editorial Working Group or contribute to that work should contact the Chair, Louka Ménard Blondin (louka@unicode.org).
For some time, the Editorial Working Group has hosted additional TUS Futures meetings on a biweekly basis, between the weeks of the general meetings. We are currently working towards merging the TUS Futures concern with the general Editorial Working Group concern.
The Editorial Working Group is in ongoing need of volunteer editors with copyediting experience. People who are interested in learning more about this work and potentially taking it up should contact the Chair for more information.
Work is ongoing on improving the public documentation about the Editorial Working Group for potentially interested contributors both inside and outside of Unicode. We eventually plan to document and chart the internal processes of the committee to help newcomers better understand our work.
FYI: We have been lightly reviewing the periodic updates to Draft UAX #53.
Date/Time: Mon Apr 29 08:45:22 CDT 2024 ReportID: ID20240429084522 Name: Wuzzy Wuzzard Report Type: Error Report Opt Subject: Core Specification 15.0
I think I found an error in the Core Specification 15.0. Link: https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf On page 51 (real page number, not PDF page), it says: Plane 15 and Plane 16 are allocated, in their entirety, for private use. Those two planes contain a total of 131,068 characters, to supplement the 6,400 private-use characters located in the BMP. All other planes are reserved; there are no characters assigned in them. The last two code positions of all planes are permanently set aside as noncharacters. (See Section 2.13, Special Characters). This seems to be a contradiction. If Planes 15 and 16 are entirely for private use, then the last two code positions cannot be noncharacters. If the last two code positions are noncharacters, then Planes 15 and 16 cannot be entirely for private use. The reason why I think this is a contradiction because of another section. Here is an explanation of "noncharacter" on page 938: In effect, noncharacters can be thought of as application-internal private-use code points. Unlike the private-use characters discussed in Section 23.5, Private-Use Characters, which are assigned characters and which are intended for use in open interchange, subject to interpretation by private agreement, noncharacters are permanently reserved (unassigned) and have no interpretation whatsoever outside of their possible application-internal private uses. If this definition is what was actually intended by the spec, I conclude that all codepoints it defines as "noncharacters" can never be considered "for private use" at the same time because "for private use" implies interchangability, while "noncharacter" means "not for interchange"/"internal-only". Therefore, the statement "Plane 15 and Plane 16 are allocated, in their entirety, for private use." must be false. The word "entirety" is the problem here.
Comment: This inconsistency has been noted and has been corrected in the Unicode 16.0 draft.
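The figures quoted in the report above can be checked with a short calculation. This is an illustrative sketch only: the constants reflect the plane size of 65,536 code points and the BMP private-use range U+E000..U+F8FF from the standard, and the function name `private_use_count` is hypothetical.

```python
PLANE_SIZE = 0x10000          # 65,536 code points per plane
NONCHARACTERS_PER_PLANE = 2   # the last two code positions, U+nFFFE and U+nFFFF


def private_use_count(planes: int) -> int:
    """Private-use code points in `planes` whole planes, excluding noncharacters."""
    return planes * (PLANE_SIZE - NONCHARACTERS_PER_PLANE)


# Planes 15 and 16, minus the two noncharacters at the end of each plane.
supplementary = private_use_count(2)

# BMP private-use area: U+E000..U+F8FF inclusive.
bmp = 0xF8FF - 0xE000 + 1

print(supplementary)  # 131068
print(bmp)            # 6400
```

The total of 131,068 already excludes the four noncharacters, which is why describing the two planes as private use "in their entirety" conflicts with the quoted count.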
Date/Time: Wed Apr 24 12:13:30 CDT 2024 ReportID: ID20240424121330 Name: Ned Holbrook Report Type: Error Report Opt Subject: Unicode 16.0 Core Spec [EDC]
Table 12-38 is missing a couple of space characters, namely in “0D310D31” and “0D2A0D31”. I would also note in passing that it is somewhat jarring to note just how many ways there are of formatting sequences in this chapter: Table 12-32 lists code points separated by commas in angle brackets, Table 12-35 lists code points separated by commas with no angle brackets, Table 12-37 lists code points interspersed with descriptions, Tables 12-38 and 12-39 list code points separated by spaces, and Table 12-40 has parallel lists of descriptions and code points separated by commas. While I would not assume a single format is best for every purpose, it does seem that there could be more consistency in this chapter at least.
Comment: Table 12-38's missing spaces have already been fixed in the Unicode 16.0 draft; the remainder are style issues pertaining to a future milestone.
Date/Time: Tue May 07 18:02:20 CDT 2024 ReportID: ID20240507180220 Name: Markus Scherer Report Type: Error Report Opt Subject: TUS table 4-5 Primary Numeric Ideographs [EDC]
Eric Muller noticed that TUS table 4-5 shows U+5146 with the value 1,000,000,000,000 (10,000 × 10,000 × 10,000) which since Unicode 15.1 is no longer the Numeric_Value of that code point. See
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-4/#G138783 https://www.unicode.org/Public/15.0.0/ucd/extracted/DerivedNumericValues.txt https://www.unicode.org/Public/15.1.0/ucd/extracted/DerivedNumericValues.txt
The kPrimaryNumeric value is 1000000 1000000000000 (with two values separated by a space). The first one of these is the Numeric_Value. Also, kPrimaryNumeric has data for 20 code points, while table 4-5 shows only 17.
Action item(s):
Ken Whistler, EDC: Investigate and correct Table 4-5 [Section E2 of L2/24-163].
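The relationship described in the report above, where a kPrimaryNumeric field may carry two space-separated values and the first one is the Numeric_Value, can be sketched as follows. The function name `parse_primary_numeric` is hypothetical, and the input string is the value quoted in the report for U+5146.

```python
def parse_primary_numeric(field: str) -> tuple[int, list[int]]:
    """Split a kPrimaryNumeric field into (Numeric_Value, other values).

    Assumes, as stated in the report, that the field holds one or more
    space-separated integers and that the first is the Numeric_Value.
    """
    values = [int(v) for v in field.split()]
    if not values:
        raise ValueError("empty kPrimaryNumeric field")
    return values[0], values[1:]


numeric_value, others = parse_primary_numeric("1000000 1000000000000")
print(numeric_value)  # 1000000
print(others)         # [1000000000000]
```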
Date/Time: Sat Jun 01 11:53:22 CDT 2024 ReportID: ID20240601115322 Name: Sridatta A Report Type: Public Review Issue Opt Subject: 502 [EDC]
Updating the Tirhuta chapter of Core Specification. “ and in the Narayani and Janakpur zones of Nepal. ” Nepal currently doesn’t use Zones for administrative divisions since 2015. According to the current classification, Maithili is majorly spoken in Madhesh and Koshi provinces. https://en.m.wikipedia.org/wiki/Maithili_language
Comment: This has already been corrected in the Unicode 16.0 draft.
Date/Time: Sat Jun 08 19:25:17 CDT 2024 ReportID: ID20240608192517 Name: Jules Bertholet Report Type: Public Review Issue Opt Subject: 502 [EDC]
From §5.8.2 of the core spec https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G21129: This is a paragraph with a line separator at this point, causing the word “causing” to appear on a different line, but not causing the typical paragraph indentation, sentence breaking, line spacing, or change in flush (right, center, or left paragraphs). However, the paragraph in question actually uses a paragraph separator, not a line separator. </p> should be replaced with </br> in the HTML.
Comment: This has already been corrected in the Unicode 16.0 draft.
Date/Time: Wed Jun 26 12:31:03 CDT 2024 ReportID: ID20240626123103 Name: Charlotte Buff Report Type: Public Review Issue Opt Subject: 502 [EDC]
I propose adding cross references between U+2BFA ⯺ UNITED SYMBOL and U+1CC88 TWO RINGS ALIGNED HORIZONTALLY because of their similar appearance.
Comment: This has already been fixed in the names list draft for 16.0.
Date/Time: Wed Jun 26 13:09:26 CDT 2024 ReportID: ID20240626130926 Name: Peter Constable Report Type: Public Review Issue Opt Subject: 489 [EDC]
In the code chart for Garay (https://www.unicode.org/charts/PDF/Unicode-16.0/U160-10D40.pdf), the names list has a subhead "Punctuation and reduplication mark" immediately before U+10D6D GARAY CONSONANT NASALIZATION MARK. That character would fit better within the scope of the preceding subhead, "Marks". Proposed change: move the "Punctuation..." subhead after U+10D6D.
Comment: This has already been fixed in the names list draft for 16.0.
Date/Time: Tue Jul 02 15:32:12 CDT 2024 ReportID: ID20240702153212 Name: Karl Pentzlin Report Type: Public Review Issue Opt Subject: 502 Unicode 16.0.0 Beta [EDC/Charts]
On a discussion of some symbol characters (L2/23-152) at the ongoing SC2/WG2 meeting in Prague, there were some misunderstandings, as looking at the Unicode code tables only, it was not obvious which characters in fact are Emoji. Thus, it seems advisable to get an easily accessible information in the code chart, whether — a character is "emoji by default", i.e. listed in emoji-sequences.txt as Basic_Emoji, but without FE0F in the first column, — or a character is "selectable as emoji" by the variation selector U+FE0F, i.e. listed in emoji-sequences.txt as Basic_Emoji, together with FE0F in the first column. I had mailed this to Asmus Freytag as the author of the Unibook software. In his answer, he recommended me to outline the problem in a response to the Unicode 16.0 beta review (however, I will not hurry anyone to discuss this issue before Unicode 17). As he wrote, this would focus on the use case of not being able to tell something that so fundamentally affects the identity of a character from looking at the code charts. Particularly, as for emoji, the representative glyph in the code chart lacks the relevance that it has for other characters and may, in fact be misleading. It can be noted, that the code charts already indicate those characters, for which there is a standardized variant, and for which, therefore, the sole representative glyph may not be giving the full information.
Comment: This will be accommodated in the Unicode chart production process.
Action item(s):
Ken Whistler, EDC: Insert a new section 24.1.11 Emoji Variation Sequences with the following content [Section E2 of L2/24-163]:
Characters with the Emoji property consistently have two variation sequences, with one requesting the glyph to be in text presentation and the other for the glyph to be in emoji presentation. The glyphs for emoji presentation variation sequences cannot be displayed by the font technology used to print the code charts. Instead, a representative text presentation is shown throughout. The variation sequences are not listed in the names list, but in the code charts, characters with the Emoji property are indicated with a small white triangle in the top left corner. Characters that have the Emoji_Presentation property and therefore would default to that presentation are indicated with a small white triangle.
(SAMPLE image for 231A)
Some Emoji characters also have other variation sequences defined, as in the following example.
(SAMPLE image for 0030)
Representative glyphs for both the colorful emoji presentation style and the text style of all emoji variation sequences can be found in the emoji charts section of the Unicode website.
Remove related text from 24.1.10.
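The distinction drawn in the feedback above, between characters that are "emoji by default" and those "selectable as emoji" via U+FE0F, can be sketched as a small classifier over Basic_Emoji lines. This is a minimal sketch under stated assumptions: the two sample lines only imitate the layout of emoji-sequences.txt, the function name `classify_basic_emoji` is hypothetical, and none of this describes the chart production process itself.

```python
# Illustrative lines in the semicolon-separated layout of emoji-sequences.txt.
SAMPLE = """\
231A..231B    ; Basic_Emoji ; watch..hourglass done
00A9 FE0F     ; Basic_Emoji ; copyright
"""


def classify_basic_emoji(text: str) -> dict[int, str]:
    """Map each Basic_Emoji code point to its default-presentation category."""
    result: dict[int, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2 or fields[1] != "Basic_Emoji":
            continue
        tokens = fields[0].split()
        if len(tokens) == 2 and tokens[1] == "FE0F":
            # Listed together with FE0F: text by default, emoji on request.
            result[int(tokens[0], 16)] = "selectable as emoji"
        elif len(tokens) == 1:
            # A single code point or a range, listed without FE0F.
            first, _, last = tokens[0].partition("..")
            for cp in range(int(first, 16), int(last or first, 16) + 1):
                result[cp] = "emoji by default"
    return result


for cp, kind in sorted(classify_basic_emoji(SAMPLE).items()):
    print(f"U+{cp:04X}: {kind}")
```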
Date/Time: Tue May 21 18:27:20 CDT 2024 ReportID: ID20240521182720 Name: Erik Carvalhal Miller Report Type: Public Review Issue Opt Subject: 502
Chapter 22, §22.7.4 [https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-22/#G78435], ¶5 (“A set of ASCII digits 0 through 9…”): “ASCI” → “ASCII”, “charcter” → “character” (in “Outlined uppercase Latin letters and ASCI digits from the European charcter set for the Sharp MZ-series machines…”)
Comment: This has already been fixed in a subsequent draft of the core spec.
G1. We have received L2/24-169 (Suggested improvements to the code chart annotations of all Latin blocks)
G2. We have discussed L2/24-180 (Proposal to refer to UTN #57 for implementing the Mongolian script)
Action item for Liang Hai, EDC: Update the Mongolian section of the core spec of 16.0 to include a reference to UTN #57 and adjust text accordingly [L2/24-180].
Action item for Ken Whistler, EDC: Update the names list to include a reference to UTN #57 in the Mongolian block.