L2/21-070
Editorial Committee Report and Recommendations for UTC #167 Meeting
Source: Editorial Committee
Date: April 26, 2021
A. Unicode Release Topics
A1. Unicode 14.0 Schedule and Planning
FYI: The significant milestones for the Unicode 14.0 release are:
- Alpha start: February 9, 2021
- Alpha close: April 12, 2021
- Beta start: June 4, 2021
- Beta close: July 20, 2021
- Release: September 14, 2021
The planned beta review and release dates are unchanged from those reported in the Editorial Committee Report and Recommendations for UTC #166 Meeting. The alpha review is now complete, with the close date moved to April 12, to match the close date for other PRIs for discussion at UTC #167.
Once the UTC has made all its decisions based on the alpha review feedback, the Editorial Committee plans to start coordination of the beta review preparation.
A2. Alpha Review for 14.0
FYI: Alpha Review for 14.0 closed on April 12. The review produced a significant amount of feedback, as noted below and in the reports from the other groups that considered the feedback. The good news is that we got a lot of feedback. The bad news is that we got a lot of feedback.
The upside is that quite a few small errors were noted, many of which have already been addressed in the data files or in other documents and drafts. This means that the quality of the beta review should be better than it otherwise might have been, with fewer errors to note and fix, particularly in the data files.
The problem, however, is that despite our attempts to focus folks on review of the actual repertoire proposed for encoding, and the names and glyphs in the alpha charts, many of the reviewers proceeded to treat the alpha review as if it were the beta review, and raised many questions about the details of the data files, many of which were not fully ready for prime time when the alpha review started. This resulted in a significant additional chunk of early work for the folks responsible for preparing data files for the UCD, as more iterations and adjustments have been necessary before the beta review, which is when that work used to start.
The voluminous feedback has also resulted in more churning of names list annotations than is usual for a release.
Altogether, the alpha review turned into way more work than was initially advertised, and if we are going to continue doing both an alpha review and a beta review cycle for all future releases, then effectively the alpha review has to be planned (and staffed) in more detail, unless the UTC were willing to bin off-topic feedback during the alpha review period. But in our opinion, trying to train public reviewers to limit their feedback to certain topics is hopeless.
EC-UTC167-R1: The Editorial Committee recommends that
The UTC closes PRI #428, Unicode 14.0.0 Alpha Review.
Suggested associated action items:
AI Rick McGowan. Close PRI #428.
AI Ken Whistler. Update status notice on the Pipeline page.
A3. Beta Review for 14.0
FYI: We now proceed to beta review for Unicode 14.0. Presuming that all the pertinent technical decisions regarding approved repertoire have been recorded, including any name and code point changes, the UTC should go ahead and authorize the start of the 14.0 beta review, according to the schedule noted above.
EC-UTC167-R2: The Editorial Committee recommends that
The UTC authorizes a PRI for a Beta review period for the Unicode 14.0 repertoire, to start June 4, 2021. To close July 20, 2021.
Suggested associated action items:
AI Rick McGowan. Post a PRI for the Unicode 14.0 beta review, to close July 20, 2021.
AI Ken Whistler, Editorial Committee. Execute the beta review plan for 14.0.
Note that rather than recording a bunch of individual action items regarding the beta review, as we had to do for the alpha review last January, this can be kept to a single tracking action, as there already exists a detailed plan for beta review (the "Big Red Switch"), used by the Editorial Committee to handle this.
A4. Unicode 14.0 Core Specification and Other Editing
FYI: Meeting virtually like most everyone else, the Editorial Committee is continuing its work on Version 14.0 of the Unicode Standard, due for release in September. We'll be finalizing the text of the core specification in June and July. We are also continuing our work editing technical reports. We'll be updating the summary information on the Version 14.0 web page in preparation for the beta.
B. Website Topics
B1. Website Status
FYI: The technical website has been stable since the last UTC meeting, with no access problems and few reports of issues with content on particular pages. The Editorial Committee has participated in minor maintenance on a few pages, including update of some FAQ pages and ongoing minor re-templating of some pages. In particular, there was significant work done on the pages listing Unicode Consortium officers, to reflect changes in personnel and organizational structure coming out of the latest BOD meetings. See:
Chairs for Unicode Technical Committees and Subcommittees
B2. Website Content Maintenance
FYI: The Editorial Committee plans to work on a complete analysis of the technical website content, so that content ownership can be rationalized and a more systematic approach to ongoing maintenance can eventually be developed. There is nothing significant to report on this project right now. The Editorial Committee focus has been on ongoing work related to the 14.0 release, and there have been no cycles available for key participants to delve into the website maintenance planning.
C. Editorial Committee Process Issues
FYI: The Editorial Committee continues to meet approximately once a month via Zoom, with those monthly meetings now scheduled for 5 hours (with a lunch break), instead of the longer meetings we used to hold. A certain level of Zoom fatigue has set in among everyone, and the efficiency of the meetings has been dropping a bit, as all involved tend to be swamped with more and more virtual meetings. Given the growth of the organization, there is little likelihood that the number of meetings will decrease in the future, even after COVID-19 restrictions are relaxed again.
Part of the immediate problem for the Editorial Committee is that a number of the veteran editors on the committee are also increasingly involved in other aspects of the organization, including PR, governance, and infrastructure issues. This has significantly diluted the attention available for the core editorial issues that the Editorial Committee handles.
D. UTR Topics
FYI: The Editorial Committee has no new suggestions to bring up separately about the content of various UTRs at this time. Feedback on documents open for public review is covered below.
E. PRI Topics
E1. Overall Disposition of Open PRIs
To ensure that the UTC records explicit actions for all of the currently open PRIs, we have pulled together an omnibus recommendation for progression of each PRI, except PRI #428 (alpha review -- see above) and PRI #408 (QID).
EC-UTC167-R3: The Editorial Committee recommends that
The UTC extends the close dates for the following open PRIs to July 20, 2021:
- UAXes: PRI #416 (14), PRI #417 (29), PRI #419 (44), PRI #420 (45), PRI #421 (38), PRI #422 (9), PRI #424 (31)
- UTSes: PRI #423 (39), PRI #425 (10), PRI #427 (18)
- UTRs: PRI #415 (23), PRI #426 (53)
Note that UTS #18 is not part of the Unicode 14.0 release, but it is separately recommended to extend the close date of the PRI for its proposed update. See recommendation PRI427b in L2/21-069. UTR #23 and UTR #53 are also not part of the Unicode 14.0 release, but there is no urgency to close the PRI for UTR #23 and publish that specification now. For UTR #53, because of the nature of the change in the document, that specification must wait until the release of Unicode 14.0 for publication, so its PRI should also just be extended now.
Suggested associated action item:
AI Rick McGowan. Extend the close dates for PRIs #416, #417, #419, #420, #421, #422, #424; #423, #425, #427; #415, #426. To close July 20, 2021.
E2. Editorial Feedback on PRI #417 for UAX #29
FYI: The following items are extracted from the feedback to PRI #417 for discussion and disposition by the Editorial Committee. The Editorial Committee considers the other feedback received on PRI #417 to cover technical issues that should be dealt with by the Properties & Algorithms Group, rather than the Editorial Committee.
Date/Time: Mon Mar 22 18:43:49 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: A small editorial issue on UAX #29
On the Comments column on the row second from the bottom (for "kʷ") in Table 1a, The annex says "sequence with letter modifier", though I believe the Unicode Standard uses a term "modifier letter" but "letter modifier" to describe a character like "ʷ". It should be changed to read "sequence with modifier letter" for less confusion.
Discussion: The Editorial Committee considered this to be a good change. Chris Chapman has already implemented the change in the 4/15/2021 draft of the proposed update for UAX #29, so no action item needs to be recorded.
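The terminology point in this feedback can be checked against the character data itself. The following is a minimal sketch, assuming Python's standard `unicodedata` module as the lookup tool (the variable names are illustrative only): it shows that the formal name and General_Category of "ʷ" both use the term "modifier letter".

```python
import unicodedata

# U+02B7 is the "ʷ" in the "kʷ" example from Table 1a of UAX #29.
ch = "\u02B7"

name = unicodedata.name(ch)          # formal character name
category = unicodedata.category(ch)  # General_Category abbreviation

print(name)      # MODIFIER LETTER SMALL W
print(category)  # Lm (Letter, modifier)
```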
Date/Time: Sun Mar 28 06:31:11 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: UAX #29 contains a strange statement as an explanation
The second line in Section 4 (Word Boundaries) currently reads: The most familiar ones are selection (double-click mouse selection or “move to next word” control-arrow keys) and the dialog option “Whole Word Search” for search and replace. It implies that '"move to next word" control-arrow keys' is a "selection", but I believe it is contrary to the common function; control-arrow key usually instructs a movement of the cursor without selection, and if you want to select to next word, you need to press control-shift-arrow keys. Probably we should either change '"move to next word" control-arrow keys' to '"select to next word" control-shift-arrow keys" or change the nearby phrases to something like '... selection (double-click mouse selection), cursor movement ("move to next word" control-arrow keys), and the dialog ...' I hope this feedback helps.
Discussion: The Editorial Committee considered this to be a good recommendation. Chris Chapman has already implemented the change in the 4/15/2021 draft of the proposed update for UAX #29, so no action item needs to be recorded. The text improvement suggested by the Editorial Committee was:
The most familiar ones are selection (double-click mouse selection), cursor movement (“move to next word” control-arrow keys), and the dialog option “Whole Word Search” for search and replace.
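The distinction drawn in the revised sentence, cursor movement versus selection, can be illustrated with a small sketch. This is not the UAX #29 word boundary algorithm: it assumes a simplified rule in which a word is a run of `\w` characters, and the function names are hypothetical. The point is only that both operations share one boundary lookup, while movement returns a single caret position and selection returns a range.

```python
import re

# Simplified stand-in for UAX #29 word segmentation (assumption):
# a "word" is any run of \w characters.
_WORD = re.compile(r"\w+")

def next_word_start(text: str, caret: int) -> int:
    """Return the offset of the first word starting after caret, or len(text)."""
    for match in _WORD.finditer(text):
        if match.start() > caret:
            return match.start()
    return len(text)

def move_to_next_word(text: str, caret: int) -> int:
    """Control-arrow behavior: the caret moves, nothing is selected."""
    return next_word_start(text, caret)

def select_to_next_word(text: str, caret: int) -> tuple[int, int]:
    """Control-shift-arrow behavior: the range from the caret to the next word is selected."""
    return (caret, next_word_start(text, caret))

text = "double click here"
print(move_to_next_word(text, 0))    # 7
print(select_to_next_word(text, 0))  # (0, 7)
```

A real implementation would replace `next_word_start` with a lookup based on the Word_Break property rules in UAX #29.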
Date/Time: Sun Apr 11 18:04:14 CDT 2021
Name: Masahiro Sekiguchi
Report Type: Error Report
Opt Subject: Inappropriate description in UAX #29
The 3rd paragraph of "7 Testing" in UAX #29 "Unicode Text Segmentation" explains the format of the three auxiliary files (referred to as [Charts29]), and I believe the current description is different from the actual auxiliary files. It says "The header cells of the chart consist of a property value, followed by a representative code point number.", but no "representative code point number" follows the property name on the actual chart. It also says " hovering the mouse over the code point number will show the character name, General_Category, Line_Break, and Script property values.", but the character name etc. are shown when hovering over property values but code point numbers (perhaps because there are no code point numbers). Either the description of the charts in UAX #29 or the charts themselves should be corrected to make them consistent.
Discussion: The Editorial Committee agreed that this description no longer accurately reflects the actual format of the charts. Chris Chapman has updated the paragraph in the 4/15/2021 draft of the proposed update for UAX #29 to properly reflect what is shown in the chart header and first column row, and what is shown in tooltips, so no action item needs to be recorded.
E3. Editorial Feedback on other open PRIs for documents
FYI: The Editorial Committee has no new feedback on other open PRIs for documents at this time.
E4. Editorial Feedback on PRI #428 for Unicode 14.0.0 Alpha Review
FYI: The following items are extracted from the feedback received for PRI #428. Items which have already been addressed (with dispositions noted in red in the feedback page for PRI #428) are not included. Items which cover technical and data issues in the purview of the Properties & Algorithms Group are not listed here; only items which seem appropriate for resolution by the Editorial Committee are listed.
Date/Time: Mon Feb 15 19:56:28 CST 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Suggestions on the alpha code chart of Diacritical Marks Extended
1. Whenever a header says "Used in..." It should read instead "Marks for..." 2. The header above 1AC1 should say (after the current header) "... Do not use pairs of these marks as replacement for 1ABB or 1ABD" 3. The two marks "combining double plus above and below" should be moved up, to be next to the single "plus sign above" and the Ormulum marks shifted down two spots. 4. The bullet note above the "number sign above" currently reads "used extensively in J.P. Harrington’s transcriptional notation" I suggest for it to read "Used by J.P. Harrington to indicate heavy or contrastive stress" 5. The "combining triple acute accent" should have a mutual cross reference to the "combining double acute accent"
Discussion: Item 3 is a technical change that is outside the remit of the Editorial Committee. See related discussion about this code point move in L2/21-069 and in L2/21-073.
The Editorial Committee suggests the other items be remanded to the names list editor for appropriate changes.
Suggested associated action item:
AI Ken Whistler. Consider the feedback from Eduardo Marín Silva (Feb 15) on PRI #428 for appropriate changes to the names list for Unicode 14.0.
Date/Time: Sun Feb 14 10:01:09 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Defective glyph for U+1FAE2
The code chart glyph for proposed character U+1FAE2 FACE WITH OPEN EYES AND HAND OVER MOUTH is inverted, showing a solidly filled face instead of an outline drawing like the other faces.
Date/Time: Fri Feb 26 15:42:43 CST 2021
Name: Vinodh Rajan
Report Type: Public Review Issue
Opt Subject: Sharada Code Chart
In the character list on Page 3, SHARADA VOWEL SIGN VOCALIC LL and SHARADA VOWEL SIGN E are overlapping. This needs to be fixed.
Date/Time: Fri Feb 26 15:56:05 CST 2021
Name: Vinodh Rajan
Report Type: Public Review Issue
Opt Subject: Telugu Nukta Glyph in the Code Chart
As per L2/20-085, Telugu Nukta should have the combining circle below as its representative glyph to avoid confusion with the aspirate marker. (If the current shape will be retained) The annotation "can also appear as a large dot" is moot. The glyph is already a dot. V
Discussion: All three of these glyph changes have already been noted by the code charts editor, who has made appropriate fixes for 14.0.
Date/Time: Mon Mar 1 16:47:35 CST 2021
Name: Erik Carvalhal Miller
Report Type: Public Review Issue
Opt Subject: PRI #428: Comment for U+02B9
The first comment for U+02B9 MODIFIER LETTER PRIME in block Spacing Modifier Letters (unchanged in the 14.0 alpha) says, “primary stress, emphasis”; I recommend either removing the word “primary” or else inserting the phrase “secondary stress”, to better reflect the broad, varied use of the character in marking stress, as the current wording is misleadingly specific. Background & reference: U+02B9ʼs use for primary stress in some dictionaries is undisputed, but L2/20-286 shows excerpts from historical and contemporary dictionaries in which phonetic spellings employ U+02B9 for secondary stress as well. (As reported in L2/21-016 §I.3o, the UTC rejected L2/20-286ʼs proposal to separately encode a prime‐symbol variant that represents primary stress in those excerpts, but the rejection does not impinge on the secondary‐stress use in evidence.)
Discussion: The editors discussed this, and agreed that removal of the word "primary" in the annotation could make it less confusing. The change has already been rolled into the NamesList.txt file for 14.0.
Date/Time: Wed Mar 31 15:54:10 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Final round of revision to the codechart anottations, but the second half correspond to the pictograms
The first half corresponds to annotations that I missed the first two rounds, but the second half corresponds to the pictograms. Arabic: 06C5 ARABIC LETTER KIRGHIZ OE: On the second bullet note,instead of reading "a barred form also occurs", it would be better if it read "a glyph variant replaces the looped tail with a horizontal bar through the tail" Arabic Extended-B: 088E ARABIC VERTICAL TAIL: The header above this character should read "Abbreviation mark" instead of "Abbreviation letter" A better phrasing of the bullet note below would be "mark used to indicate abbreviations in moveable type texts from Iran" followed by another note saying: "considered a letter; only attested in final form" Glagolitic: 2C2F GLAGOILITIC LETTER CAUDATE CHRIVI: The bullet note cites the characters it can combine with, but the glyphs with the dotted circle are missing. Furthermore, informative aliases should be added "= cherv, chrivi with tail" Arabic Presentation Forms-A: FDCF ARABIC LIGATURE SALAAMUHU ALAYNAA: Another bullet note could be added stating "used in Christian texts" Kana Extended-B: The initial note states that the system in question is "obsolete", which seems to imply that it was replaced by another system, and it also states that it was used in Taiwan; which is true, but it was also used in a nearby region of mainland China. Ethiopic Supplement: Given the new information of the legacy Gurage orthography the header above 1380 that reads "Syllables for Sebatbeit" should read "Legacy syllables for Gurage orthographies" Followed by a note under this header saying "These characters were originally encoded to represent the Sebatbeit language, but their use extended beyond that language to an entire linguistic region called 'Gurage'; therefore the term 'Sebatbeit' inserted in the character names, should not be interpreted as exclusionary to other languages, but a mere historical artifact. 
The orthography for the Gurage languages has been updated to use new syllables and these are encoded in the 'Ethiopic Extended-B' block." It's unclear if the header above 2DC0 (in the Ethiopic Extended block) should also be modified accordingly, but the block descriptions in the Spec, should be updated accordingly. Transport and Map Symbols: 1F6DE WHEEL: The informative alias "= tire" could be added 1F6DF LIFE BUOY: The informative alias "= life saver" could be added Geometric Shapes Extended: 1F7F0 BOLD EQUALS SIGN: The addition of this symbol in this block (as opposed to Symbols and Pictographs Extended-A) is dubious. Symbols and Pictographs Extended-A: 1FA74 THONG SANDAL: These informative aliases "= flip flop, chancla" could be added 1FA78 DROP OF BLOOD: Mutual cross references to "1F4A7 💧 droplet" and "1F322 🌢 black droplet" could be added 1FA79 ADHESIVE BANDAGE: The informative alias "= band aid" could be added. 1FA85 PINATA: A bullet note could be added stating "the name is usually spelled with an 'Ñ'(PIÑATA) but Unicode names can only contain ASCII characters" 1FAAA IDENTIFICATION CARD: There should be an informative alias stating "= ID", as well as a bullet note stating "can be used to represent a driver's license or any other form of photo id" 1FAAB LOW BATTERY: There should be a mutual cross reference to "1F50B 🔋 battery" 1FAAC HAMSA: A bullet note could be added stating "can either point up or down". 1FAE6 BITTING LIP: A mutual cross reference to "1F5E2 🗢 lips" could be added 1FAF6 HEART HANDS: There is no need for the rays emanating from the "heart"; leaving them may imply that their inclusion is mandatory, so I recommend removing them from the representative glyph. I would also like to ask, whether or not this character can support different skin tones for each hand, in the future; similar to the HANDSHAKE.
Date/Time: Thu Apr 1 19:17:17 CDT 2021
Name: Eduardo Marín Silva
Report Type: Other Question, Problem, or Feedback
Opt Subject: Request to correct errata in my own piece of feedback of the Unicode 14.0 alpha
My last piece of feedback was accidentally called "Final round of revision to the codechart anottations, but the second half correspond to the pictograms" with the second half added by mistake, so it should instead read "Final round of revision to the codechart annotations" with the corrected spelling of 'annotations' If it's possible, I also noticed that my piece of feedback for the ARABIC VERTICAL TAIL reads "considered a letter; only attested in final form", when it should read "considered a letter, not a presentation form, but only attested in final form" Any other mistakes in my pieces of feedback are minor and so do not need correction.
Discussion: The Editorial Committee discussed all of this feedback, and suggests the following dispositions:
- For Arabic, remand the feedback to the names list editor.
- For Glagolitic, the missing dotted circle glyphs are inherent to Unibook treatment of notices; remand the suggested alias to the names list editor.
- For Kana Extended-B, an appropriate change has already been made, so no further action is required.
- For Ethiopic issues, in the names list, change "Syllables for Sebatbeit" to "Syllables for Gurage" in subheads (1380, 2D80) and "Syllables for modern Gurage" to "Syllables for Gurage" in subhead (1E7E0).
- For the suggestions related to emoji blocks, issues of aliases should go to Emoji Subcommittee and CLDR-TC for consideration.
- The suggestion for two cross-references between emoji can be remanded to the names list editor.
- Suggestions for glyph changes for emoji should be considered by the code charts editor and the Emoji Subcommittee.
Suggested associated action items:
AI Ken Whistler. Consider the feedback from Eduardo Marín Silva (Apr 1) on PRI #428 for appropriate changes to the names list for Unicode 14.0. (See L2/21-070 Section E4 for details of dispositions.)
AI Jennifer Daniel. Consider the feedback from Eduardo Marín Silva (Apr 1) on PRI #428 on emoji-related aliases and glyph changes, and redirect as appropriate. (See L2/21-070 Section E4 for details.)
Date/Time: Sat Apr 3 11:31:51 CDT 2021
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: Error in Egyptian Hieroglyphs file
The Egyptian Hieroglyphs file (U13000.pdf) contains the misspelling “Invertabrata”. The correct spelling (which was also used by Gardiner) is “Invertebrata”.
Discussion: This minor typo has already been corrected in the 14.0 version of NamesList.txt.
Date/Time: Sun Apr 11 02:28:55 CDT 2021
Name: Patrik Sjöwall
Report Type: Public Review Issue
Opt Subject: Unicode 14.0 Alpha review
I found a few issues with some characters for Unicode 14.0 that seem to have gone unnoticed: 0874 ARABIC LETTER ALEF WITH ATTACHED KASRA 0875 ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA 0879 ARABIC LETTER ALEF WITH ATTACHED ROUNDDOT BELOW 087C ARABIC LETTER ALEF WITH RIGHT MIDDLE STROKE AND DOT ABOVE 087D ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA AND DOT ABOVE 0880 ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA AND LEFT RING These letters reqiure more shaping information. It is not clear how the attached fatha or dot will behave in an obligatory LAM-ALEF ligature. 088E ARABIC VERTICAL TAIL This character is missing in ArabicShaping-14.0.0.txt, but it always joins with the preceding letter. It should be included in that file, either as Right_Joining or be given a new joining type (since it does not change its shape, only causes the character to its right to join), and with either a joining group of its own or No_Joining_Group. 08FB ARABIC DOUBLE RIGHT ARROWHEAD ABOVE 08FC ARABIC DOUBLE RIGHT ARROWHEAD ABOVE WITH DOT The comment "also used in Quranic text in African and otherorthographies to represent dammatan" should come after 08FB, not 08FC. The "right arrowhead" is an angular-shaped damma, and the "dammatan" is a double damma (not a double damma with dot). A7C0 LATIN CAPITAL LETTER OLD POLISH O A7C1 LATIN SMALL LETTER OLD POLISH O This letter should be named "O ROGATE", the name "commonly used among specialists" according to the proposal. Then a comment below could say "used for nasal vowel in Old Polish". The current name sounds like this was a letter used instead of "O" in Old Polish, which is not the case. A7D3 LATIN SMALL LETTER DOUBLE THORN A7D5 LATIN SMALL LETTER DOUBLE WYNN These two small letters are added to the standard without matching capitals. That is incosistent with how other comparable letters are encoded. 
Letters used in a casing orthography are almost always encoded as casing pairs, even if they do not appear in the beginning of a word and the capital leter thus only appears in ALL-CAPS TEXT. As far as I know at least the following capitals were encoded without being needed outside all-caps: 0184 LATIN CAPITAL LETTER TONE SIX 01A6 LATIN LETTER YR 01A7 LATIN CAPITAL LETTER TONE TWO 01BC LATIN CAPITAL LETTER TONE FIVE 0220 LATIN CAPITAL LETTER N WITH LONG RIGHT LEG 037F GREEK CAPITAL LETTER YOT 042A CYRILLIC CAPITAL LETTER HARD SIGN 042C CYRILLIC CAPITAL LETTER SOFT SIGN 1E9E LATIN CAPITAL LETTER SHARP S 2C1F GLAGOLITIC CAPITAL LETTER YERU 2C20 GLAGOLITIC CAPITAL LETTER YERI It is possible that one or two have been used word-initially in languages that were not supported when they were added. On the other hand, it is also quite likely that there are more encoded capitals that never occur in the beginning of a word. Apart from that (and issues already addressed by others) everything looks fine so far. Best regards! /Patrik Sjöwall
Discussion: The Editorial Committee considered this feedback. Most is of a technical nature, outside the remit of the Editorial Committee. For the Arabic characters 0874..0875, etc., the observation about shaping should go to the Script Ad Hoc Group for consideration as to whether more documentation should be added regarding behavior in lam-alef ligatures. For 088E ARABIC VERTICAL TAIL, the issue should also go to the SAH for review, to see if it should be added to ArabicShaping.txt.
For the issue of 08FB and 08FC, the Editorial Committee concurs that the annotation is in the wrong location. The names list editor has already made the change in the location of the annotation in the latest draft of NamesList.txt.
For Old Polish o, and the double thorn and double wynn, these issues are outside the scope of the Editorial Committee, but we noted that these name changes and requests for capital letters were already considered by the SAH and were not recommended by that group.
No action items need to be recorded, as the SAH is already aware of this feedback.
Date/Time: Sun Apr 11 05:17:33 CDT 2021
Name: Wang Yifan
Report Type: Public Review Issue
Opt Subject: PRI #428: comments on U+1F7F0 and U+1F979
On U+1F7F0: Might be good to have a cross-reference to U+3013 GETA MARK for pure graphic resemblance, and vice versa. On U+1F9F9: The current glyph of FACE HOLDING BACK TEARS does not sufficiently distinguish it from U+1F9FA FACE WITH PLEADING EYES. A quick suggestion that I think effective is to paint tears white (non-hatched) and use a dumbbell-shaped mouth. In the light of the original proposal, this character is intended to include the Samsung emoji depicted in the page 1 of this document. http://www.unicode.org/L2/L2020/20064-face-holding-back-tears.pdf Here, the dumbbell-shaped mouth is a key feature characterizes the emoticon being a stylized depiction of the lip-biting expression in the East Asian graphical convention. It is different from both upward (pouting) and downward (neutral-smiling) curled mouth. This type of expression is also seen in most of the actual examples cited in the page 5 of the proposal, thus should not be left out. Meanwhile, there is U+1F9FA that usually implemented with similarly watery eyes. (See https://emojipedia.org/pleading-face/) Even though not reflected in the current code chart, such designs should be interpreted as the inherent semantics in the original proposal (as FACE WITH GLISTENING EYES; https://www.unicode.org/L2/L2017/17244r-emoji-faces-v11.pdf) instead of mere vendors' discretion, and should be respected as such. The alpha glyph of U+1F9F9 has a rather intricate design of eyes that makes it hard to tell tears apart from eyeballs in black-and-white printing. The tears should be graphically more distinctively separated from its background in order to avoid misinterpretation that it has exactly same kind of eyes the existing glyphs of U+1F9FA have. (Optimally, U+1F9FA should be also updated to have more upward-looking eyes and downward-sloping eyebrows in the code chart.) Last year, U+1F9FA was "the third most used emoji on Twitter" according to Emojipedia, and awarded "Neologism of the Year 2020" in Japan. 
Special care should be taken to avoid possible confusion by existing users. https://blog.emojipedia.org/a-new-king-pleading-face/ https://ja.wikipedia.org/wiki/%E3%81%B4%E3%81%88%E3%82%93
Discussion: The Editorial Committee agreed that the cross-reference suggestion for 1F7F0 to GETA MARK was a good idea. That addition has already been made in the latest draft of NamesList.txt.
The input regarding 1F979 FACE HOLDING BACK TEARS (cited as U+1F9F9 in the body of the feedback) should be reviewed by the Emoji Subcommittee.
Suggested associated action item:
AI Jennifer Daniel. Consider the feedback from Wang Yifan (Apr 11) on PRI #428 regarding U+1F979, and redirect as appropriate. (See L2/21-070 Section E4 for details.)
Date/Time: Mon Apr 12 18:08:02 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Currency Symbols
Like the EURO SIGN and other characters, the SOM SIGN U+20C0 should be shown in a Times-like font.
Date/Time: Mon Apr 12 18:09:37 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Supplemental Punctuation
The barred square brackets from 2E56..2E58 should be drawn on the same basis as other square brackets in the code charts.
Date/Time: Mon Apr 12 18:12:06 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Glagolitic
The glyphs fr the two new characters must be improved.
Date/Time: Mon Apr 12 18:21:57 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Supplemental Symbols and Pictographs
Something is wrong with the glyphs for 1F979 and 1F97A. The face shown at 1F979 looks just like the glyph for 1F97A in the macOS and iOS Apple Color Emoji UI font. Thanks for keeping my TROLL glyph.
Discussion: The Editorial Committee is of the opinion that the glyph for the SOM SIGN is appropriate as is. The code charts editor has already received an updated revision of the font for Glagolitic (from Sebastian Kempgen). The suggestions for glyph fixes for 2E56..2E58 should be remanded to the code charts editor for investigation. We do not seem to be able to replicate the issue Michael has for 1F979 and 1F97A.
Suggested associated action item:
AI Michel Suignard. Consider the feedback from Michael Everson (Apr 12) on PRI #428 regarding the glyphs for 2E56..2E58, and investigate whether the glyphs can be made more consistent with other square bracket glyphs. (See L2/21-070 Section E4 for details.)
F. Responses to Other Public Feedback
F1. Public Feedback Noted in L2/21-068
FYI: This review refers to items in L2/21-068 listed under "Feedback routed to Editorial Committee for evaluation". Note that many of the reports in L2/21-068 have already been dealt with. In cases where the disposition is already noted in L2/21-068 (in red), the reports are not repeated here for further discussion and disposition.
Date/Time: Tue Feb 23 12:03:14 CST 2021
Name: Jungshik Shin
Report Type: Error Report
Opt Subject: Hangul collation and Hangul tone marks
Note: Changes have been made in the draft text for version 14.0 in response to [the first part of] this report.
Hello, I'm writing to give my feedback on TUS 13 section 18.6 Hangul. On pages 746-747, I found the following regarding the collation of Hangul syllables: "Because the order of the syllables in the Hangul Syllables block reflects the preferred ordering, sequences of Hangul syllables for modern Korean may be collated with a simple binary comparison" Although the above is certainly the case of South Korean collation order since 1988 [1], it does not hold true for North Korean sorting rules. Therefore, the locale data for ko-KP needs to be tailored for the Hangul collation. In addition, the section 18.6 does not mention two Hangul tone marks, U+302E and U+302F. To faithfully represent the old Korean text, Hangul tone marks are required and should be mentioned along with Hangul Conjoining Jamos. It'd be great if the two points above could be reflected in TUS 14 or later. Thank you for your consideration, Jungshik Shin
[1] Before 1988, there were a couple of 'competing' collation orders even in South Korea and different dictionaries used different sorting rules. It was only in 1988 that the South Korean orthographic standard explicitly specified how to sort Hangul.
Discussion: The Editorial Committee noted that the first section of this feedback has already been addressed in the latest draft for the 14.0 core specification. For the issue regarding the non-mention of two Hangul tone marks in Section 18.6, the Editorial Committee suggests that the editor follow up with Jungshik to get specific suggestions for text additions to the core specification.
Suggested associated action item:
AI Julie Allen. Work with Jungshik Shin to prepare new text for the core specification Section 18.6, to explain the use of the two Hangul tone marks. For Unicode 14.0.
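As a minimal illustration of the binary-comparison statement quoted in the feedback above, the following Python sketch sorts a few precomposed Hangul syllables purely by code point and looks up the names of the two tone marks. The sample syllables are chosen here for illustration only. The sketch does not model North Korean ordering, which would need a tailoring of the kind the feedback requests for ko-KP.

```python
import unicodedata

# A few precomposed Hangul syllables, deliberately out of order
# (sample data chosen for illustration, not taken from the standard).
syllables = ["다", "가", "나", "각"]

# Python compares str values by code point, which amounts to the
# "simple binary comparison" described in the quoted text.
binary_sorted = sorted(syllables)

# The two tone marks named in the feedback.
tone_mark_names = [unicodedata.name(ch) for ch in "\u302E\u302F"]

print(binary_sorted)
print(tone_mark_names)
```

Because the comparison is by code point alone, no collation data is consulted. Any ordering that differs from the Hangul Syllables block order cannot be produced this way.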
Date/Time: Tue Feb 23 19:56:30 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: U+034F COMBINING GRAPHEME JOINER is not always ignored for display
Section 5.21 says “U+034F COMBINING GRAPHEME JOINER is likewise always ignored for display.” This is not true: it has no visible glyph of its own, but it may have a visible effect on other glyphs. For example, see Figure 7-11 and UTR #53. As section 5.21 says earlier on the same page, “In such cases, even though the format character or variation selector has no visible glyph of its own, it would be inappropriate to say that it is ignored for display, because the intent of its use is to change the display in some visible way.”
Discussion: The Editorial Committee discussed this feedback, and agrees that the text could be improved, but we are not advising a rewrite for the 14.0 core specification at this time.
Date/Time: Fri Feb 26 03:19:19 CST 2021
Name: huang xin
Report Type: Error Report
Opt Subject: What is the exact definition of assigned character?
The term assigned character seems to have conflict means in the Unicode Standard Version 13.0. Quoted from chapter 2.1: "In contrast, a character encoding standard provides a single set of fundamental units of encoding, to which it uniquely assigns numerical code points. These units, called assigned characters, are the smallest interpretable units of stored text." This suggests that the "units" are called "assigned characters", and "numerical code points" are assigned to "assigned characters". Quoted from chapter 3.5 D49: "Private-use code points are considered to be assigned characters" This suggests that assigned character is a kind of code point. So there is conflict between the two quotes, if assigned character is some kind of code point, how can "numerical code point" be assigned to some kind of code point?
Discussion: The Editorial Committee discussed this feedback, and agrees that the text could be improved, but we are not advising a rewrite for the 14.0 core specification at this time. For this and the prior item, it would help the editors substantially to have concrete suggestions for how to improve the text. Otherwise, we can take it as "problem noted", but no one has stepped forward to actually work on specific text improvements that would pass muster.
Date/Time: Sat Feb 27 21:03:22 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Chapter 17 intro miscounts Indonesian scripts
The introduction to chapter 17 in TUS 13.0 says "Indonesia has many local, traditional scripts, most of which are ultimately derived from Brahmi. Six of these scripts are documented in this chapter." The actual number of Indonesian scripts documented in the chapter is seven; Makasar is one of them. Maybe get rid of the number, as several more scripts are to come? It’s also not quite clear why Makasar gets its own paragraph; the paragraph suggests that it belongs between Rejang and Buginese.
Discussion: This comment has already been addressed by the editors, with appropriate changes made in the draft for the 14.0 core specification.
Date/Time: Fri Mar 12 19:45:54 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Bidi format characters do affect characters’ glyphs
Chapter 5 says “Bidirectional format characters do not affect the glyph forms of displayed characters”, but that is not true. The main point of that sentence (that bidi format characters have no glyphs) is still true, but it needs a better explanation. For example, U+0028 LEFT PARENTHESIS has different glyphs depending on the bidi level. In general, overriding a character’s directionality may have an arbitrary effect on its glyph form.
Date/Time: Fri Mar 12 19:56:45 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Unexpected variation sequences do affect display
Chapter 5 says “In other contexts, a format character may have no visible effect on display at all. [...] Another example is a variation selector following a base character for which no standardized or registered variation sequence exists. In that case, the variation selector has no effect on the display of the text.” However, that is an oversimplification. The presence of an unexpected variation selector may block another variation sequence, may block canonical reordering, and may block AMTRA reordering, all of which have effects on the display of the text.
Discussion: The Editorial Committee noted that the text in Chapter 5 could be improved, but we are not advising a rewrite for the 14.0 core specification at this time.
Date/Time: Fri Mar 12 20:06:54 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Does <ZWJ, ZWJ> equal ZWJ?
UTS #51 defines various sequences with ZWJ, such as <1F415, 200D, 1F9BA>. How should they be rendered when there are multiple ZWJs, as in <1F415, 200D, 200D, 1F9BA>? According to chapter 5 of the core specification, “a sequence of two adjacent joiners, <..., ZWJ, ZWJ, ...>, is a case where the extra ZWJ should have no effect.” On the other hand, I get the impression that extraneous ZWJs go against the spirit of UTS #51. Is that sentence in the core specification meant to be taken literally? What effects should other default ignorable code points have within emoji?
Discussion: The Editorial Committee noted that responding to this suggestion would require technical input both from the owners of UTS #51 and more generally the Emoji Subcommittee. No editorial changes are recommended at this time without such input.
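To make the question concrete, the sketch below shows what one literal reading of the quoted core specification sentence could look like in Python: each run of adjacent ZWJs is reduced to a single ZWJ before any sequence matching. The helper collapse_adjacent_zwj is hypothetical and is not defined by UTS #51 or the core specification. Whether emoji implementations should behave this way is exactly the open question.

```python
ZWJ = "\u200D"

def collapse_adjacent_zwj(text: str) -> str:
    """Hypothetical helper: reduce every run of adjacent ZWJs to one ZWJ."""
    out = []
    for ch in text:
        # Skip a ZWJ that immediately follows another ZWJ.
        if ch == ZWJ and out and out[-1] == ZWJ:
            continue
        out.append(ch)
    return "".join(out)

# The two sequences cited in the feedback.
sequence = "\U0001F415" + ZWJ + "\U0001F9BA"        # <1F415, 200D, 1F9BA>
doubled = "\U0001F415" + ZWJ + ZWJ + "\U0001F9BA"   # <1F415, 200D, 200D, 1F9BA>

normalized = collapse_adjacent_zwj(doubled)
print([f"{ord(ch):04X}" for ch in normalized])
```

Under this reading the doubled form becomes identical to the single-ZWJ form. An implementation that instead matches sequences exactly would treat the two inputs differently.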
Date/Time: Fri Mar 12 20:37:10 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: When does ZWJ act like <ZWJ, ZWNJ, ZWJ>?
Chapter 23 says that “between Arabic characters a ZWJ acts just like the sequence <ZWJ, ZWNJ, ZWJ>, preventing a ligature from forming instead of requesting the use of a ligature that would not normally be used.” What is an Arabic character, and which characters are relevant for the purpose of “between”? Consider the sequence <meem, ZWJ, U+17B4 KHMER VOWEL INHERENT AQ, jeem>. The ZWJ is between an Arabic character and a Khmer character. Is it right to conclude that the ZWJ therefore does not act just like <ZWJ, ZWNJ, ZWJ>, leaving it free to ligate the meem and jeem?
Discussion: The Editorial Committee noted that this comment reflects a technical concern, and would require input from the Properties & Algorithms group, before any appropriate improvement to the text could be suggested.
Date/Time: Mon Mar 29 23:44:43 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Confusion between nonspacing marks and nonspacing marks
The Unicode Standard has a general category Mn “nonspacing mark”. The Unicode Standard also has a definition D53: “Nonspacing mark: A combining character with the General Category of Nonspacing Mark (Mn) or Enclosing Mark (Me).” This definition seems misguided for two reasons: ① Enclosing marks are almost always spacing, contradicting the statement that supports D53: “It generally does not consume space along the visual baseline in and of itself.” Adding an enclosure to a glyph requires space – otherwise it results in a smudge. Of the 25 font families I found on my Mac that contain U+20DD combining enclosing circle, only one monospaced font uses an enclosing circle glyph with the same width as any other glyph, predictably resulting in smudges. All 24 others use a glyph that’s large enough to accommodate the glyphs of most base characters with some padding, which means it’s substantially wider than most base glyphs. This is very different from the exceptional and context-dependent widening described for the real nonspacing mark U+0302 combining circumflex accent in “î”. ② Using the same term for two related but different concepts results in confusion. This is most obvious in an example for a regular expression character class in TUS appendix A Notational Conventions, page 941, which describes [\p{gc=Nonspacing_Mark}] as “nonspacing marks” – clearly correct based on the general category and clearly wrong based on definition D53. TUS section 5.12 Strategies for Handling Nonspacing Marks, page 217, claims “Properly speaking, a nonspacing mark is any combining character that does not add space along the writing direction.” and again “Composite character sequences can be rendered effectively by means of a fairly simple mechanism. In simple character rendering, a nonspacing combining mark has a zero advance width, and a composite character sequence will have the same width as the base character.” Both statements are incorrect for enclosing marks in most fonts. 
This leads to an inappropriate truncation strategy on page 219: “In simple systems, it is easiest to truncate by width, starting from the end and working backward by subtracting character widths as one goes. Because a trailing nonspacing mark does not contribute to the measurement of the string, the result will not separate nonspacing marks from their base characters.” Page 222 discusses letterspacing: “This process needs to be modified if zero-width nonspacing marks are present in the text. Otherwise, if extra justifying space is added after the base character, it can have the effect of visually separating the nonspacing mark from its base.” This issue would affect non-zero-width nonspacing marks as well, which D53 creates. And so on... I suggest changing D53 to define “nonspacing mark” based only on general category Mn, and discussing enclosing marks either together with nonspacing marks or separately, as appropriate in each context.
Discussion: The Editorial Committee feels that this suggestion has merit, but we are not advising a rewrite for the 14.0 core specification at this time. The terminological treatment of enclosing marks (gc=Me) as nonspacing marks is of long standing in the standard (going back nearly 30 years), and a change in the core definitions of Chapter 3 for this would require a very specific and detailed proposal arguing the case and working through the implications for the text in Chapter 3, other parts of the core specification, and ultimately other specifications and pages on the website.
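The two senses of "nonspacing mark" at issue can be sketched as two predicates over the General_Category values reported by Python's unicodedata module. The function names are invented for illustration. The first follows the general category alone, and the second follows definition D53 as quoted in the feedback.

```python
import unicodedata

def is_nonspacing_by_gc(ch: str) -> bool:
    """Nonspacing mark in the general category sense: gc=Mn only."""
    return unicodedata.category(ch) == "Mn"

def is_nonspacing_by_d53(ch: str) -> bool:
    """Nonspacing mark per D53 as quoted above: gc=Mn or gc=Me."""
    return unicodedata.category(ch) in ("Mn", "Me")

circumflex = "\u0302"        # COMBINING CIRCUMFLEX ACCENT
enclosing_circle = "\u20DD"  # COMBINING ENCLOSING CIRCLE

for ch in (circumflex, enclosing_circle):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch),
          is_nonspacing_by_gc(ch), is_nonspacing_by_d53(ch))
```

The two predicates agree for U+0302 and disagree for U+20DD, which is the ambiguity the feedback describes for expressions such as [\p{gc=Nonspacing_Mark}].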
Date/Time: Tue Mar 30 00:11:37 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incomplete discussion of combining marks
The Unicode Standard has two sections with guidelines on nonspacing marks: 5.12 Strategies for Handling Nonspacing Marks and 5.13 Rendering Nonspacing Marks. The second paragraph of the first of these sections says: “In this section and the following section, the terms nonspacing mark and combining character are used interchangeably.” This sentence is confusing because the terms are not interchangeable at all: Combining characters, according to definition D52, include nonspacing (general category Mn), spacing (Mc), and enclosing (Me) marks. Even when applying the dubious definition D53, nonspacing marks do not include spacing marks. Most of the issues described in the two sections affect spacing and enclosing marks as well, so the sections are incomplete if they don’t cover them. The solutions, however, often need to be modified for them.
Discussion: The Editorial Committee considers these suggestions to be reasonable, but we would need specific text changes for review. Note that any changes to this text might also depend on the treatment of basic definitions in Chapter 3.
Suggested associated action item:
AI Norbert Lindenberg. Provide a proposal for specific text changes to improve the discussion of nonspacing marks in sections 5.12 and 5.13 of the core specification.
Date/Time: Tue Mar 30 00:15:01 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect statement about grapheme clusters
The last paragraph of TUS section 2.11 Combining Characters contains this statement: “This core concept is known as a *grapheme cluster*, and it consists of any combining character sequence that contains only *nonspacing* combining marks or any sequence of characters that constitutes a Hangul syllable (possibly followed by one or more nonspacing marks).” This statement is incorrect. Both kinds of grapheme clusters defined in UAX 29, legacy grapheme clusters and extended grapheme clusters, can contain *spacing* combining marks.
Discussion: The Editorial Committee agrees that the text should be improved to address this concern.
Suggested associated action item:
AI Ken Whistler. Provide a proposal for specific text changes to rework the discussion of grapheme cluster in Section 2.11 of the core specification, referring out to UAX #29 for definition by algorithm.
Date/Time: Tue Mar 30 00:19:43 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect statements about combining characters
The first paragraph of TUS section 2.11 Combining Characters has two incorrect statements: ① “Characters intended to be positioned relative to an associated base character are depicted in the character code charts above, below, or through a dotted circle.”: In reality, combining characters can be depicted on any side of a dotted circle, on multiple sides, crossing it, or enclosing it. ② “The Unicode Standard distinguishes two types of combining characters: spacing and nonspacing.” The standard, at least in its definition of general categories, distinguishes three types of combining characters: spacing, nonspacing, and enclosing, although definition D53 then adds ambiguity.
Discussion: The Editorial Committee agrees that the text should be improved to address these incorrect statements.
Suggested associated action item:
AI Ken Whistler, Editorial Committee. Provide corrected text for these two statements in Section 2.11 of the core specification. For Unicode 14.0.
Date/Time: Fri Apr 2 19:05:22 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Unclear reference to “dashes” in TUS section 12.9 Malayalam
TUS section 12.9 Malayalam, page 512 says “... rendering engines should be prepared to handle Malayalam letters (including vowel letters), digits (both European and Malayalam), dashes, U+00A0 NO-BREAK SPACE and U+25CC DOTTED CIRCLE as base characters for the Malayalam vowel signs, U+0D4D MALAYALAM SIGN VIRAMA, U+0D02 MALAYALAM SIGN ANUSVARA, and U+0D03 MALAYALAM SIGN VISARGA. They should also be prepared to handle multiple combining marks on those bases.” It’s not clear which “dashes” this refers to. The Unicode Standard, in table 6-3 and in PropList.txt, defines two overlapping sets of dashes that together contain 30 dash characters. It is very unlikely that all of them are relevant to Malayalam, and OpenType in particular is not good at handling mixed-script clusters, such as a combination of U+1806 MONGOLIAN TODO SOFT HYPHEN with U+0D02 MALAYALAM SIGN ANUSVARA.
Discussion: The Editorial Committee agrees that the text is unclear, and suggests that it would be simplest to clarify the text by specifying the list as some dashes that have the property value InSc=Consonant_Placeholder.
Suggested associated action item:
AI Ken Whistler, Editorial Committee. Provide corrected text for Section 12.9 Malayalam of the core specification, to clarify which dashes are referred to. For Unicode 14.0.
Date/Time: Fri Apr 2 18:21:40 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Dash definitions out of sync
The lists of dash characters in TUS table 6-3 and in PropList.txt are out of sync. Table 6-3 includes 007E TILDE, which is not listed as a Dash in PropList.txt. In turn, PropList.txt lists 2E1A HYPHEN WITH DIAERESIS, 2E3A..2E3B TWO-EM DASH..THREE-EM DASH, 2E40 DOUBLE HYPHEN, 10EAD YEZIDI HYPHENATION MARK, which are absent from TUS table 6-3. It’s not clear to me what qualifies 10EAD YEZIDI HYPHENATION MARK as a dash.
Discussion: The Editorial Committee agrees that the table and the data file are out of synch. We suggest that Table 6-3 be updated for the 14.0 core specification. The status of U+10EAD as a dash (or not) is not editorial, and would have to be taken up with the Properties & Algorithms group and/or the Script Ad Hoc Group.
Suggested associated action item:
AI Ken Whistler, Editorial Committee. Update Table 6-3 in the core specification, to make it consistent with the data file that defines dashes, PropList.txt. For Unicode 14.0.
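A consistency check of the kind this action item implies could be sketched in Python as below. Both inputs are illustrative excerpts limited to the code points named in the feedback: the PropList.txt-style lines are simplified, and the Table 6-3 set is a hand-entered assumption, not the full table. The function name parse_dash_lines is hypothetical.

```python
def parse_dash_lines(lines):
    """Collect code points from PropList.txt-style lines that carry the Dash property."""
    result = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cp_range, prop = (field.strip() for field in data.split(";"))
        if prop != "Dash":
            continue
        first, _, last = cp_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        result.update(range(start, end + 1))
    return result

# Simplified excerpt: only the entries named in the feedback (assumption).
proplist_excerpt = [
    "2E1A          ; Dash # HYPHEN WITH DIAERESIS",
    "2E3A..2E3B    ; Dash # TWO-EM DASH..THREE-EM DASH",
    "2E40          ; Dash # DOUBLE HYPHEN",
    "10EAD         ; Dash # YEZIDI HYPHENATION MARK",
]
# Hand-entered stand-in for Table 6-3: only the entry named in the feedback.
table_6_3_excerpt = {0x007E}

dash_set = parse_dash_lines(proplist_excerpt)
only_in_proplist = [f"{cp:04X}" for cp in sorted(dash_set - table_6_3_excerpt)]
only_in_table = [f"{cp:04X}" for cp in sorted(table_6_3_excerpt - dash_set)]
print(only_in_proplist, only_in_table)
```

Run against the complete data file and the complete table, the two difference lists would enumerate every entry that needs attention in one place or the other.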
G. Miscellaneous Topics
G1. (None noted)