The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of July 20, 2021, since the previous cumulative document was issued prior to UTC #168 (July 27, 2021).
The links below go directly to open PRIs and to feedback documents for them, as of July 20, 2021.
The links below go to locations in this document for feedback.
Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to Properties & Algorithms ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports
(None at this time.)
Date/Time: Sun Jun 13 19:07:29 CDT 2021
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: On the proposed Book Pahlavi encoding model
This is a response to this document: https://www.unicode.org/L2/L2021/21090-book-pahlavi-model.pdf called L2/21-090. I would like to mention that I find the proposed model mostly appropriate and I would like to commend the work of all the contributors. I only have three suggestions:
A) Encode the double curled tooth (samekh) as a separate character, to be consistent with the other "regular" teeth. On page 13, the author mentions: "The [double tooth] is encoded as a separate character in order to enable typographical support for different representations of aleph-heth in initial, medial, and final position." followed by... "The ligature of [triple tooth] aleph-heth+gimel-daleth-yodh is encoded as an atomic character in order to enable typographical support for different representations of it, as compared to [double tooth]" This in my opinion merits treating them separately. But it also means that the double and triple regular teeth have separate characters, but the double curled tooth doesn't. In my opinion it's better to treat all teeth alike, because not only does that mean that the "regular samekh letter" can be treated as a unit, but it also extends the same benefits of treating the double and triple tooth atomically to the double curled tooth. If necessary, decomposition sequences can be added to the double and triple variants of the teeth, making them pre-composed characters. The name of the character can be "double curled tooth-samekh"
B) Treat most of the contextual forms as a sequence of two characters. On page 12, 10 contextual forms of 6 different letters are proposed as atomic characters. I believe that the "short waw-nun-ayin-resh" and the "final pe-sadhe" have enough technical justification for encoding, so this section does not concern them. The rest of them are "bellied" variants of other letters like zayin and lamedh, each with "half" and "full" bellies. In my opinion this is unnecessarily redundant, given that the separate bellies are going to be encoded separately anyway. These could be easily rendered by a sequence of the base letter and the desired belly (e.g. "zayin" + "half belly character" or "zayin" + "full belly character").
C) Change the name of the belly characters by adding the "full" prefix as appropriate. As stated before, the belly primitives are encoded atomically, and the rendering requirements necessitate a "half belly" variant apart from the "full belly". None of that is an issue, and I must say it is quite an ingenious solution; I would just change the name of the "bellies" to "full bellies". That reduces the chance of confusion, since someone who reads the word "belly" in isolation can't know if it refers to all bellies in general or just the ones that aren't halved. It also has the effect of associating the concepts of a "belly full of food" and "half filled belly of food", strengthening their relations and identities. The names would therefore read:
Full Belly
Half Belly
Full Straight Belly
Half Straight Belly
Full Curled Belly
Half Curled Belly
In summary, my changes would add one more character, remove 8 other characters and rename 3 other characters. I hope for the Book Pahlavi script to be accepted soon, for Unicode 15. My dearest wishes: Eduardo.
Date/Time: Tue Jun 15 13:45:10 CDT 2021
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comment on L2/21-107
L2/21-107 proposes “that spacing superscript й, ў, ҫ, ҙ etc. [...] be typeset with diacritics”. Because U+04AB CYRILLIC SMALL LETTER ES WITH DESCENDER and U+0499 CYRILLIC SMALL LETTER ZE WITH DESCENDER are encoded without decompositions, if modifier letter versions of them are attested, shouldn’t the modifier letter versions be encoded without decompositions too?
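The premise about decompositions can be checked with a short sketch. It assumes Python's standard `unicodedata` module, whose data reflects whichever Unicode version the interpreter ships with; the helper name `has_decomposition` is hypothetical and used only for illustration.

```python
import unicodedata

def has_decomposition(ch: str) -> bool:
    """Return True if the character has a decomposition mapping in the UCD."""
    # unicodedata.decomposition returns an empty string when there is none.
    return unicodedata.decomposition(ch) != ""

# The four base letters named in the comment above.
letters = {
    "\u0439": "CYRILLIC SMALL LETTER SHORT I",
    "\u045e": "CYRILLIC SMALL LETTER SHORT U",
    "\u04ab": "CYRILLIC SMALL LETTER ES WITH DESCENDER",
    "\u0499": "CYRILLIC SMALL LETTER ZE WITH DESCENDER",
}

for ch, expected_name in letters.items():
    # Guard against a typo in the table: the name must match the UCD.
    assert unicodedata.name(ch) == expected_name
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {expected_name}: decomposition = {mapping}")
```

The letters with breve report a canonical decomposition into base letter plus U+0306, while the two letters with descender report none, which is the asymmetry the question turns on.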
(None at this time.)
Date/Time: Fri Jun 18 09:57:23 CDT 2021
Name: Charlotte Buff
Report Type: Other Question, Problem, or Feedback
Opt Subject: Implications of new emoji proposal guidelines on Extended_Pictographic property
The new guidelines for submitting emoji proposals, published 15th April, contain the following caveat: »Submissions proposing to emojify existing Unicode characters will not be accepted.« Does this mean that no already existing character that isn’t an emoji now is ever going to receive emoji status in the future, or merely that the UTC will not consider requests specifically asking for the emojification of existing characters, but that such emojifications may still take place through other processes? If the former, this new policy has interesting implications for the Extended_Pictographic property. Extended_Pictographic was originally created to future‐proof the line breaking and text segmentation behaviour of ZWJ sequences. By preemptively assigning Extended_Pictographic=True to non‐emoji characters with emoji‐like qualities – the implication being that said characters could one day become emoji themselves – even systems that haven’t kept up with the latest emoji release would still be able to handle new ZWJ sequences correctly. However, if characters are now locked into their emojiness the moment they are encoded, this aspect of the property has become obsolete. There currently exist over 600 characters with Extended_Pictographic=True but Emoji=False. Under a strict interpretation of the new guidelines, they should be excluded from the Extended_Pictographic set going forward since they can never become emoji anyway, and according to definitions ED‑15a and ED‑16 in UTS #51, only emoji can formally be part of ZWJ sequences – ZWJ sequences being the sole application of the Extended_Pictographic property. The practice of marking unassigned ranges of codepoints reserved for future emoji use as Extended_Pictographic would continue as usual. 
While most characters in the intersection of Extended_Pictographic=True and Emoji=False would make for poor emoji candidates, there are a few symbols in there that could prove to be popular with users, so it is not unlikely that the UTC will receive proposals for emoji that pretty much already exist in Unicode. However, the new wording seems to suggest that in such cases, an entirely new character would be encoded regardless. While I personally think that emoji presentation is an immensely unfortunate property for a codepoint to have, I also believe that it goes against the spirit and purpose of the Unicode Standard to encode two separate versions of the exact same abstract character just because they are expected to be displayed with different fonts.
Date/Time: Mon Jun 14 16:23:25 CDT 2021
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: On the response of the editorial committee on my suggested modifications
This is a response to document L2/21-106: https://www.unicode.org/L2/L2021/21106-u14-annotation-resp.pdf I would like to begin by expressing my gratitude and delight at the answer of the editorial committee. I hope this can serve as an opportunity for greater engagement between me and the body in the future. I see the proposed inclusions as a great compromise between my proposal and the status quo. Some mistakes are either a wrong glyph being used or a copy-paste error. Ignoring those, I would only like to make a few further suggestions.
EXCLAMATION MARK: Add a reference to the upcoming 'MEDIEVAL EXCLAMATION MARK'
NUMBER SIGN: Add a note next to one of the informative aliases "= octothorpe (originating from Bell Labs the spelling of this alias is inconsistent)" source: https://en.wikipedia.org/wiki/Number_sign#Names_of_the_character
DOLLAR SIGN: Mention the fact that it is often used as a generic currency sign.
AMPERSAND: Tweak the wording of the bullet note "• originally a ligature of the letter 'e' and 't' from the Latin 'et'" Also retain the reference to '2227 LOGICAL AND'
COMMA: Add a reference to its ancestor '2E12 HYPODIASTOLE'
FULL STOP: Retain the reference to '00B7 MIDDLE DOT'
QUESTION MARK: Remove the references to 2048 and 2049, since they are redundant given the presence of the reference to '2047 DOUBLE QUESTION MARK' Also, add a reference to the upcoming 'MEDIEVAL QUESTION MARK'
COMMERCIAL AT: Add a bullet note saying "• originally used for an archaic unit of weight in Spain, called 'arroba'"
LATIN CAPITAL LETTER C: Remove the reference to the 'CYRILLIC CAPITAL LETTER ES', that letter is considered to be part of the basic alphabet. Including it would necessitate adding references to all other Greek and Cyrillic homoglyphs (like the lunate sigma symbol at 03F9), however excluding the letters of the basic alphabet seems to be a good compromise. A small exception can be made for the capital letter iota, listed under the capital I, since it completes the set of references nicely. Also I suggest keeping the reference to '2104 CENTRE LINE SYMBOL', but not repeating it under the capital L.
LATIN CAPITAL LETTER P: Retain the reference to '214A PROPERTY LINE SYMBOL', but don't repeat it under capital L
LATIN SMALL LETTER E: Retain the reference to 'AB32 LATIN SMALL LETTER BLACKLETTER E'
LATIN SMALL LETTER F: Retain the references to the letters '0192 LATIN SMALL LETTER F WITH HOOK' and 'AB35 LATIN SMALL LETTER LENIS F' and '03DC GREEK LETTER DIGAMMA'
LATIN SMALL LETTER L: Retain the reference to '01C0 LATIN LETTER DENTAL CLICK' but don't repeat it under capital I
LATIN SMALL LETTER O: Retain the reference to 'AB3D LATIN SMALL LETTER BLACKLETTER O'
LATIN SMALL LETTER R: Retain the references to 'AB47 LATIN SMALL LETTER R WITHOUT HANDLE' and 'AB4B LATIN SMALL LETTER SCRIPT R' (do note that the suggestions under the small letters e, f, l, o and r are made due to their confusability).
TILDE: Retain the reference to '301C WAVE DASH'.
BROKEN BAR: Mention the fact that the BROKEN BAR was originally an allograph of '007C VERTICAL LINE' Source: https://en.wikipedia.org/wiki/Vertical_bar#Solid_vertical_bar_vs_broken_bar
MULTIPLICATION SIGN: Reword the informative alias to say "= Cartesian product (z notation)"
LATIN CAPITAL LETTER O WITH STROKE: Add a reference to the upcoming 'LATIN CAPITAL LETTER OLD POLISH O' since this character was the typical replacement up until now
LATIN SMALL LETTER SHARP S: Add a reference to the upcoming 'LATIN SMALL LETTER MIDDLE SCOTS S'
LATIN SMALL LETTER AE: Add a reference to '1D6B LATIN SMALL LETTER UE'
LATIN SMALL LETTER O WITH STROKE: Add a reference to the upcoming 'LATIN SMALL LETTER OLD POLISH O' since this character was the typical replacement up until now
LATIN SMALL LETTER THORN: Add a reference to the upcoming 'LATIN SMALL LETTER DOUBLE THORN'
That would be all.
Date/Time: Wed Jun 16 00:20:22 CDT 2021
Name: Neal Raulerson
Report Type: Error Report
Opt Subject: Correction in Standard p.126 D93b a.
Instead of: "a. the initial subsequence of a well-formed code unit sequence..." I think it is supposed to be: "a. the initial subsequence of an ill-formed code unit sequence..." It makes more sense that way. Please let me know, thanks!
Date/Time: Sun Jun 27 12:38:08 CDT 2021
Name: Alexei Chimendez
Report Type: Error Report
Opt Subject: Use of CANCEL TAG in emoji flags
UTS #51 allows for the interchange of various flags through "emoji tag sequences", specified as: an emoji character or sequence, followed by one or more component characters from the block Tags, and terminated with the character CANCEL TAG. In the Unicode Standard, sec. 23.9 reads:
> There are two uses of cancel tag. To cancel a tag value of a particular type, prefix the cancel tag character with the tag identification character of the appropriate type. [...] To cancel any tag values of any type that may be in effect, use cancel tag without a prefixed tag identification character.
Continuing, it specifies:
> Inserting a bare cancel tag in places where only the language tag needs to be canceled could lead to unanticipated side effects if this text were to be inserted in the future into a text that supports more than one tag type.
However, the use of CANCEL TAG in flags is, in effect, a "bare cancel tag", because it is not preceded by a tag identification character (it is only preceded by tag component characters). The presence of an emoji flag in a text may thus inadvertently cause the canceling of all applicable tags. While the Standard currently only specifies one kind of tag (the language tag, which is "strongly discouraged"), the use of CANCEL TAG in emoji flags may cause issues if other kinds of tags are introduced in the future, or for applications or protocols that make use of "private use" tags to signal in-band information. The simplest solution is to change the wording in sec. 23.9 to read:
> To cancel any tag values of any type that may be in effect, use cancel tag without a prefixed tag identification character or other tag character.
With this change, the CANCEL TAG character in the sequence
> U+1F3F4 U+E0066 U+E006F U+E006F U+E007F
has no effect and is ignored, while in the sequence
> U+1F3F4 U+66 U+6F U+6F U+E007F
the CANCEL TAG character will cancel all tags. This change prevents the inadvertent canceling behavior of emoji tag sequences as described above.
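A minimal sketch of how the proposed wording could be applied to the two example sequences follows. It is one possible reading, not text from the Standard or UTS #51: the function names are hypothetical, and "other tag character" is assumed here to mean U+E0001 LANGUAGE TAG or any of U+E0020..U+E007E.

```python
from typing import List

CANCEL_TAG = 0xE007F
LANGUAGE_TAG = 0xE0001

def is_tag_character(cp: int) -> bool:
    """Assumed reading: LANGUAGE TAG or a tag component (U+E0020..U+E007E)."""
    return cp == LANGUAGE_TAG or 0xE0020 <= cp <= 0xE007E

def bare_cancel_tag_positions(text: str) -> List[int]:
    """Indices of CANCEL TAG characters not directly preceded by a tag character.

    Under the proposed wording, only these "bare" occurrences would cancel
    all tag values in effect; the others would be ignored for that purpose.
    """
    positions = []
    for i, ch in enumerate(text):
        if ord(ch) != CANCEL_TAG:
            continue
        if i == 0 or not is_tag_character(ord(text[i - 1])):
            positions.append(i)
    return positions

# U+1F3F4 U+E0066 U+E006F U+E006F U+E007F (CANCEL TAG follows tag characters)
tagged = "\U0001F3F4\U000E0066\U000E006F\U000E006F\U000E007F"
# U+1F3F4 U+66 U+6F U+6F U+E007F (CANCEL TAG follows ordinary letters)
untagged = "\U0001F3F4\u0066\u006F\u006F\U000E007F"

print(bare_cancel_tag_positions(tagged))    # [] : terminator only, cancels nothing
print(bare_cancel_tag_positions(untagged))  # [4] : bare, would cancel all tags
```

The check looks only at the immediately preceding character, which matches the "prefixed" phrasing in the quoted text; a real implementation might need to track tag state across the whole run.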
Date/Time: Fri Jul 2 18:12:11 CDT 2021
Name: Mark Roberts
Report Type: Problems / Feedback about website
Opt Subject: Em and En Dash and Space
Would you please consider adding a Q&A on this page: https://www.unicode.org/faq/punctuation_symbols.html Question: Do the widths of the en dash and en space need to be half the widths of the em dash and em space? Answer: (I believe the answer is yes--historically it has been.) Although this PDF https://www.unicode.org/charts/PDF/U2000.pdf implies that the en space is half an em space, it makes no mention of the relationship of an en dash to an em dash. Furthermore, if an en dash is supposed to be half an em dash, the glyphs in that same PDF show the en dash drawn slightly wider than half an em dash. I really hope you will address this issue. It comes up frequently with font designers. Thank you.
Date/Time: Tue Jul 6 23:53:00 CDT 2021
Name: J Andrew Lipscomb
Report Type: Public Review Issue
Opt Subject: 14.0.0β issues
(Note: This report is actually about document L2/21-106, not the 14.0 beta.)
These are all in the text accompanying the code charts for Basic Latin and the Latin-1 Supplement.
1. (.) Canadian syllabics full stop is 166E, not 16EE.
2. (:) Tricolon is 205D, not 295D.
3. (C) Degree Celsius is 2103, not 2013.
4. Sections on \, °, x, X, q, and ß have stray text.
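The three code point corrections can be cross-checked against the character names. The sketch below assumes Python's standard `unicodedata` module; the table name `CORRECTIONS` and the helper `name_of` are hypothetical.

```python
import unicodedata

# (intended character name, corrected code point, code point as printed)
CORRECTIONS = [
    ("CANADIAN SYLLABICS FULL STOP", 0x166E, 0x16EE),
    ("TRICOLON", 0x205D, 0x295D),
    ("DEGREE CELSIUS", 0x2103, 0x2013),
]

def name_of(cp: int) -> str:
    """Character name for a code point, or a placeholder if it has none."""
    return unicodedata.name(chr(cp), "<no name>")

for intended, corrected, printed in CORRECTIONS:
    # The corrected value must resolve to the intended character...
    assert name_of(corrected) == intended
    # ...and looking the name up must give back the same code point.
    assert ord(unicodedata.lookup(intended)) == corrected
    print(f"{intended}: U+{corrected:04X} is correct; "
          f"U+{printed:04X} is {name_of(printed)}")
```

For the third entry, the value as printed (2013) resolves to EN DASH, which shows the kind of digit transposition involved.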
Date/Time: Thu Jul 8 20:20:24 CDT 2021
Name: Paul Holder
Report Type: Other Question, Problem, or Feedback
Opt Subject: Date encoding
Since there is forever a fight over the "correct" way to encode/display a date, it seems like Unicode should standardize it. This way a user application can encode a date in a specified way, and user agents can display it in whatever way an end user feels motivated.