This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Fri Feb 12 09:39:14 CST 2021
Name: Jan Nijtmans
Report Type: Public Review Issue
Opt Subject: typo in UnicodeData-14.0.0d4.txt
Note: This has now been fixed in the Alpha data file.
In UnicodeData-14.0.0d4.txt, there is the following line (line 17988):
105B3;VITHKUQI SMALL LETTER SE;Ll;0;L;;;;;N;;;1058C;;1059C
But code point "1059C" is "VITHKUQI SMALL LETTER DE". I suspect this is a typo: the "9" should have been an "8", since "VITHKUQI CAPITAL LETTER SE" makes more sense as the titlecase variant of this character.
Thanks, Jan Nijtmans
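For ordinary lowercase letters the titlecase field (the last of the 15 semicolon-separated fields) usually repeats the uppercase field, so this kind of slip can be flagged mechanically. The following is a minimal Python sketch that assumes the UnicodeData.txt layout shown in the line above. `parse_case_mappings` and `title_differs_from_upper` are hypothetical helper names. A hit only means the entry deserves manual review, because some characters, such as the DŽ digraphs, legitimately have a titlecase form distinct from their uppercase form.

```python
from typing import NamedTuple, Optional


class CaseMappings(NamedTuple):
    code_point: int
    name: str
    upper: Optional[int]  # field 12: simple uppercase mapping
    lower: Optional[int]  # field 13: simple lowercase mapping
    title: Optional[int]  # field 14: simple titlecase mapping


def parse_case_mappings(line: str) -> CaseMappings:
    """Extract the simple case mapping fields from one UnicodeData.txt line."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")

    def code(field: str) -> Optional[int]:
        return int(field, 16) if field else None

    return CaseMappings(int(fields[0], 16), fields[1],
                        code(fields[12]), code(fields[13]), code(fields[14]))


def title_differs_from_upper(entry: CaseMappings) -> bool:
    """Heuristic: flag entries whose titlecase mapping is set and differs from the uppercase one."""
    return (entry.title is not None and entry.upper is not None
            and entry.title != entry.upper)


reported = parse_case_mappings("105B3;VITHKUQI SMALL LETTER SE;Ll;0;L;;;;;N;;;1058C;;1059C")
print(hex(reported.title), title_differs_from_upper(reported))  # 0x1059c True
```

Running the same check over a whole file and filtering out the known digraph cases would leave a short list of entries to inspect by hand.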
Date/Time: Sat Feb 13 15:58:46 CST 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Recommendations on the alpha code charts
Note: This feedback has been taken into account in updated annotations for the NamesList.txt file.
Combining Diacritical Marks Supplement
1. The "combining dot above left" should have a cross reference to the "Syriac feminine dot".
Latin Extended-D
1. The header above the Old Polish letters could read "Additional medieval letters" rather than just "Additional letters".
2. The "closed insular g" letters should have reciprocal cross references to the regular "insular g" letter. Likewise, the "Middle Scots s" letters should have reciprocal cross references to the "sharp s" and "capital sharp s", and the double thorn and double wynn to their regular counterparts.
3. The header above the two modifier letters for Chatino should have "(México)" appended, like the Mazahua letters.
4. The header above the "modifier letter capital q" should read "Modifier letter for phonemic transcription of Japanese", so that the bullet note below it can be removed and replaced with a mutual cross reference to the "small capital q".
Latin Extended-F
1. Is there any reason why the "Modifier letter small capital aa" does not have a <super> decomposition to the regular letter?
Brahmi
1. The entire new section should be under a single header, "Old Tamil extensions", and the note under it should be removed.
2. The position of the Old Tamil LLA does not follow the usual order of the Indic code charts, in which consonants come before vowel signs, but that is not a high priority.
3. The new Tamil virama should have a bullet note stating that it is a "pure killer", and perhaps the original virama should have a similar note saying that it produces conjuncts.
4. The "Anusvara sign" should be annotated to indicate that it should not be used as a replacement for the Tamil virama (as was done in the code chart for Tamil).
Musical Symbols
1. The header directly above the new accidentals should be dropped, and the note under the first header should be changed to read "These two characters are used in Iranian music notation to represent quarter notes."
Date/Time: Sun Feb 14 09:01:03 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Incorrect CCC of U+10F83
Note: This has now been corrected in the Alpha data file.
Proposed Character U+10F83 OLD UYGHUR COMBINING DOT BELOW currently has canonical combining class 230 (Above), but the correct value would be 220 (Below).
Date/Time: Mon Feb 15 19:56:28 CST 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Suggestions on the alpha code chart of Diacritical Marks Extended
1. Whenever a header says "Used in...", it should instead read "Marks for...".
2. The header above 1AC1 should continue (after the current wording) with "... Do not use pairs of these marks as replacement for 1ABB or 1ABD".
3. The two marks "combining double plus above and below" should be moved up, next to the single "plus sign above", and the Ormulum marks shifted down two positions.
4. The bullet note above the "number sign above" currently reads "used extensively in J.P. Harrington’s transcriptional notation". I suggest it read "Used by J.P. Harrington to indicate heavy or contrastive stress".
5. The "combining triple acute accent" should have a mutual cross reference to the "combining double acute accent".
Date/Time: Sun Feb 14 08:59:40 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Incorrect decomposition mapping of U+107A9
Note: This has now been corrected in the Alpha data file.
Proposed character U+107A9 MODIFIER LETTER SMALL R WITH FISHHOOK currently decomposes to U+207E SUPERSCRIPT RIGHT PARENTHESIS, but the correct mapping would be to U+027E LATIN SMALL LETTER R WITH FISHHOOK.
Date/Time: Sun Feb 14 09:29:15 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: U+1CF42 and U+1CF43 have nonconformant names
The names of proposed characters U+1CF42 (ZNAMENNY PRIZNAK MODIFIER LEVEL 2) and U+1CF43 (ZNAMENNY PRIZNAK MODIFIER LEVEL 3) currently do not conform to section 4.8 of the Unicode Standard. A hyphen-minus needs to be inserted before the final digit in both names because a digit must not immediately follow a space.
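The specific constraint cited here (a digit must not immediately follow a space) is simple to test for. The following minimal Python sketch checks only that one rule as stated in this report; the other Section 4.8 constraints are not checked, and `digit_follows_space` is a hypothetical helper name. A hyphenated form such as LEVEL-2 passes.

```python
import re

# A space followed directly by an ASCII digit, e.g. "LEVEL 2".
_DIGIT_AFTER_SPACE = re.compile(r" [0-9]")


def digit_follows_space(name: str) -> bool:
    """Return True if `name` has a digit immediately after a space."""
    return _DIGIT_AFTER_SPACE.search(name) is not None


for candidate in ("ZNAMENNY PRIZNAK MODIFIER LEVEL 2",
                  "ZNAMENNY PRIZNAK MODIFIER LEVEL-2"):
    verdict = "nonconformant" if digit_follows_space(candidate) else "ok"
    print(candidate, "->", verdict)
```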
Date/Time: Sun Feb 14 09:42:18 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: General category of U+1DF0A
Proposed character U+1DF0A LATIN LETTER RETROFLEX CLICK WITH RETROFLEX HOOK currently has general category Ll (Lowercase_Letter). A more appropriate value would be Lo (Other_Letter) which is shared by most other click letters, including its hook‐less counterpart U+01C3 LATIN LETTER RETROFLEX CLICK.
Date/Time: Sun Feb 14 09:58:07 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Names of U+1FAF1 and U+1FAF2
The names of proposed characters U+1FAF1 RIGHTWARD BACKHAND and U+1FAF2 LEFTWARD HAND could potentially be changed to RIGHTWARDS BACKHAND and LEFTWARDS HAND respectively. The words “rightward” and “leftward” do not occur in any other Unicode character names; instead the spellings “rightwards” and “leftwards” are used every single time.
Date/Time: Sun Feb 14 10:01:09 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Defective glyph for U+1FAE2
The code chart glyph for proposed character U+1FAE2 FACE WITH OPEN EYES AND HAND OVER MOUTH is inverted, showing a solidly filled face instead of an outline drawing like the other faces.
Date/Time: Sun Feb 14 10:25:15 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Names of dezh and tesh digraphs with hooks
The names of the following proposed characters should be adjusted to include the word “digraph” for consistency with their respective hook‐less counterparts (U+02A4 LATIN SMALL LETTER DEZH DIGRAPH and U+02A7 LATIN SMALL LETTER TESH DIGRAPH):
U+1DF12: LATIN SMALL LETTER DEZH WITH PALATAL HOOK → LATIN SMALL LETTER DEZH DIGRAPH WITH PALATAL HOOK
U+1DF17: LATIN SMALL LETTER TESH WITH PALATAL HOOK → LATIN SMALL LETTER TESH DIGRAPH WITH PALATAL HOOK
U+1DF19: LATIN SMALL LETTER DEZH WITH RETROFLEX HOOK → LATIN SMALL LETTER DEZH DIGRAPH WITH RETROFLEX HOOK
U+1DF1C: LATIN SMALL LETTER TESH WITH RETROFLEX HOOK → LATIN SMALL LETTER TESH DIGRAPH WITH RETROFLEX HOOK
Date/Time: Sun Feb 14 10:57:51 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: General category of Znamenny priznak modifiers
The Znamenny priznak modifiers (U+1CF42..U+1CF46) were given the general category Cf (Format). A more appropriate value would be Mn (Nonspacing_Mark) because they apply directly to the preceding character, comparable to variation selectors for instance. Other properties like bidi class and grapheme cluster break would need to be adjusted accordingly as well.
Date/Time: Mon Feb 15 15:51:33 CST 2021
Name: Neil S Patel
Report Type: Public Review Issue
Opt Subject: Script Extensions for Arabic Punct used for N'ko and Adlam
Hello,
Recently, I have been working with a couple of W3C groups on script itemization issues. We have noticed that with both Adlam and N'ko, Arabic punctuation (typically used with both scripts) triggers unexpected font fallback when it appears in a string of text. This occurs even when the tested font includes the appropriate Arabic punctuation. After some discussion, it was suggested that the script extensions could be responsible. Reference: https://github.com/w3c/afrlreq/issues/18
Currently the script extensions for Arabic punctuation are listed as follows. There are no references to African scripts.
# ================================================
# Script_Extensions=Arab Rohg Syrc Thaa Yezi
060C ; Arab Rohg Syrc Thaa Yezi # Po ARABIC COMMA
061B ; Arab Rohg Syrc Thaa Yezi # Po ARABIC SEMICOLON
061F ; Arab Rohg Syrc Thaa Yezi # Po ARABIC QUESTION MARK
# Total code points: 3
# ================================================
I would like to propose the following update to include Adlam and N'ko.
# ================================================
# Script_Extensions=Arab Nko Rohg Syrc Thaa Yezi
060C ; Arab Nko Rohg Syrc Thaa Yezi # Po ARABIC COMMA
061B ; Arab Nko Rohg Syrc Thaa Yezi # Po ARABIC SEMICOLON
# Total code points: 2
# ================================================
# ================================================
# Script_Extensions=Adlm Arab Nko Rohg Syrc Thaa Yezi
061F ; Adlm Arab Nko Rohg Syrc Thaa Yezi # Po ARABIC QUESTION MARK
# Total code points: 1
# ================================================
Thanks.
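The net effect of the proposed change is easier to see as a per-character difference. The following minimal Python sketch assumes the data-line format quoted above: code point or range, ";", space-separated script codes, then an optional "#" comment. The helper names are hypothetical, and this is not a reference parser for the published data file.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Entry = Tuple[int, int, FrozenSet[str]]


def parse_scx_line(line: str) -> Optional[Entry]:
    """Parse one data line; return None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()  # drop the trailing "# Po ..." comment
    if not data:
        return None
    cps, scripts = (field.strip() for field in data.split(";", 1))
    start, _, end = cps.partition("..")  # a single code point or an "XXXX..YYYY" range
    first = int(start, 16)
    last = int(end, 16) if end else first
    return first, last, frozenset(scripts.split())


def scx_map(lines: Iterable[str]) -> Dict[int, FrozenSet[str]]:
    """Expand parsed lines into a code point -> script set mapping."""
    result: Dict[int, FrozenSet[str]] = {}
    for line in lines:
        entry = parse_scx_line(line)
        if entry is not None:
            first, last, scripts = entry
            for cp in range(first, last + 1):
                result[cp] = scripts
    return result


def added_scripts(current: Iterable[str], proposed: Iterable[str]) -> Dict[int, FrozenSet[str]]:
    """Scripts that `proposed` adds to each code point, relative to `current`."""
    old, new = scx_map(current), scx_map(proposed)
    diff = {cp: scripts - old.get(cp, frozenset()) for cp, scripts in new.items()}
    return {cp: extra for cp, extra in diff.items() if extra}


current = [
    "060C ; Arab Rohg Syrc Thaa Yezi # Po ARABIC COMMA",
    "061B ; Arab Rohg Syrc Thaa Yezi # Po ARABIC SEMICOLON",
    "061F ; Arab Rohg Syrc Thaa Yezi # Po ARABIC QUESTION MARK",
]
proposed = [
    "060C ; Arab Nko Rohg Syrc Thaa Yezi # Po ARABIC COMMA",
    "061B ; Arab Nko Rohg Syrc Thaa Yezi # Po ARABIC SEMICOLON",
    "061F ; Adlm Arab Nko Rohg Syrc Thaa Yezi # Po ARABIC QUESTION MARK",
]
for cp, extra in sorted(added_scripts(current, proposed).items()):
    print(f"U+{cp:04X} gains {sorted(extra)}")
# U+060C gains ['Nko']
# U+061B gains ['Nko']
# U+061F gains ['Adlm', 'Nko']
```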
Date/Time: Tue Feb 23 21:42:01 CST 2021
Name: kirk miller
Report Type: Public Review Issue
Opt Subject: Character in Latin G is under wrong heading
Note: This feedback has been taken into account in updates for the NamesList.txt file.
In Latin Extended-G, the character: 1DF07 𝼇 LATIN SMALL LETTER REVERSED ENG is listed under the heading "IPA extensions". It should appear under the preceding heading, "IPA letters for disordered speech", as Michael Everson had it in his mapping. This can be accomplished by moving the heading "IPA extensions" down by one character. The error is easily verified with the chart for the extIPA alphabet for disordered speech, published by the ICPLA. The IPA copy of the chart is available here: https://www.internationalphoneticassociation.org/sites/default/files/extIPA_2016.pdf In that chart, the three letters REVERSED ENG, REVERSED K and REVERSED SCRIPT G appear together as "velodorsal oral and nasal stops" in the bottom-right table.
Date/Time: Fri Feb 26 15:42:43 CST 2021
Name: Vinodh Rajan
Report Type: Public Review Issue
Opt Subject: Sharada Code Chart
In the character list on Page 3, SHARADA VOWEL SIGN VOCALIC LL and SHARADA VOWEL SIGN E are overlapping. This needs to be fixed.
Date/Time: Fri Feb 26 15:56:05 CST 2021
Name: Vinodh Rajan
Report Type: Public Review Issue
Opt Subject: Telugu Nukta Glyph in the Code Chart
As per L2/20-085, the Telugu nukta should have the combining circle below as its representative glyph, to avoid confusion with the aspirate marker. If the current shape is retained, the annotation "can also appear as a large dot" is moot, since the glyph is already a dot. V
Date/Time: Sat Feb 27 19:09:30 CST 2021
Name: Norbert Lindenberg
Report Type: Public Review Issue
Opt Subject: Annotations for Balinese surang and Sundanese panglayar
Note: This feedback has been taken into account in updates for the NamesList.txt file.
The annotations proposed in L2/20-150 were incorrectly transcribed into NamesList-14.0.0d7.txt:
– U+1B03 BALINESE SIGN SURANG should have the annotation “• also used for repha in transliteration of Kawi”.
– U+1B81 SUNDANESE SIGN PANGLAYAR should NOT have that annotation.
Cross references added in the names list appear to be intended to link the two characters that are used for repha in transliteration of Kawi. To do so correctly, the reference to U+A982 JAVANESE SIGN LAYAR needs to be moved from 1B81 to 1B03, and the reference in A982 needs to refer to 1B03 rather than to 1B81.
Date/Time: Sat Feb 27 22:19:11 CST 2021
Name: Norbert Lindenberg
Report Type: Public Review Issue
Opt Subject: UAX 44: Indic data for Toto
Note: This has been taken care of in the UAX #44 draft.
The proposed update for UAX 44, Unicode Character Database, has these notes:
IndicPositionalCategory.txt – Added values for characters in the newly encoded Toto script.
IndicSyllabicCategory.txt – Appropriate Indic_Syllabic_Category property values were assigned to characters in the newly encoded Toto script.
The two data files these notes refer to are not yet available for review, but the notes assume that the Toto script has at least some of the characteristics of a Brahmic script that make Indic properties necessary. According to the proposal L2/19-330, that is not the case: it states that "This Toto writing system is not syllable-based and doesn't have an inherent vowel." In addition, the combining class 230 for U+1E2AE TOTO LETTER RISING TONE would be inappropriate if the script were Brahmic, as combining classes ≠ 0 are in general incompatible with the phonetic character order used for Brahmic scripts.
Date/Time: Mon Mar 1 15:54:54 CST 2021
Name: Lorna Evans
Report Type: Error Report
Opt Subject: Arabic U+089D..U+089F, U+08D0..U+08D2 have wrong property
Note: This has now been corrected in the Alpha data file.
These characters have "ON" in UnicodeData:
089D;ARABIC SUPERSCRIPT ALEF MOKHASSAS;Mn;230;ON;;;;;N;;;;;
089E;ARABIC DOUBLED MADDA;Mn;230;ON;;;;;N;;;;;
089F;ARABIC HALF MADDA OVER MADDA;Mn;230;ON;;;;;N;;;;;
and
08D0;ARABIC SUKUN BELOW;Mn;220;ON;;;;;N;;;;;
08D1;ARABIC LARGE CIRCLE BELOW;Mn;220;ON;;;;;N;;;;;
08D2;ARABIC LARGE ROUND DOT INSIDE CIRCLE BELOW;Mn;220;ON;;;;;N;;;;;
They should be "NSM". See the Unicode proposal: https://www.unicode.org/L2/L2019/19306-quranic-additions.pdf
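Mismatches of this kind can be found by scanning the general category (field 2) and bidi class (field 4) together. The following minimal Python sketch assumes, as this report implies, that nonspacing (Mn) and enclosing (Me) marks are expected to carry Bidi_Class NSM. Each hit should be treated as a candidate for review rather than a confirmed error. `marks_not_nsm` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Tuple


def marks_not_nsm(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (code point, name, bidi class) for Mn/Me entries whose Bidi_Class is not NSM."""
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) != 15:
            continue  # not a UnicodeData.txt data line
        code, name, category, _ccc, bidi = fields[:5]
        if category in ("Mn", "Me") and bidi != "NSM":
            yield code, name, bidi


reported = [
    "089D;ARABIC SUPERSCRIPT ALEF MOKHASSAS;Mn;230;ON;;;;;N;;;;;",
    "08D0;ARABIC SUKUN BELOW;Mn;220;ON;;;;;N;;;;;",
]
for code, name, bidi in marks_not_nsm(reported):
    print(code, name, bidi)
```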
Date/Time: Mon Mar 1 16:47:35 CST 2021
Name: Erik Carvalhal Miller
Report Type: Public Review Issue
Opt Subject: PRI #428: Comment for U+02B9
The first comment for U+02B9 MODIFIER LETTER PRIME in block Spacing Modifier Letters (unchanged in the 14.0 alpha) says, “primary stress, emphasis”; I recommend either removing the word “primary” or else inserting the phrase “secondary stress”, to better reflect the broad, varied use of the character in marking stress, as the current wording is misleadingly specific. Background & reference: U+02B9ʼs use for primary stress in some dictionaries is undisputed, but L2/20-286 shows excerpts from historical and contemporary dictionaries in which phonetic spellings employ U+02B9 for secondary stress as well. (As reported in L2/21-016 §I.3o, the UTC rejected L2/20-286ʼs proposal to separately encode a prime‐symbol variant that represents primary stress in those excerpts, but the rejection does not impinge on the secondary‐stress use in evidence.)
Date/Time: Tue Mar 2 13:52:43 CST 2021
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #428: decomposition of 107A9
Note: This has now been corrected in the Alpha data file.
107A9 MODIFIER LETTER SMALL R WITH FISHHOOK # <super> 207E
Decomposition of 107A9 must read <super> 027E (instead of <super> 207E).
Date/Time: Sun Mar 7 11:09:38 CST 2021
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #428: Headers for 116B9 and for 11740 sqq.
Note: This feedback has been taken into account in updates for the NamesList.txt file.
1. As with 1183B DOGRA ABBREVIATION SIGN, the header above 116B9 TAKRI ABBREVIATION SIGN should be "Punctuation" and not "Sign".
2. The header for 11740..11746 (in the Ahom block) could be "Additional consonants" rather than "Additional consonants for Tai Ahom".
Date/Time: Mon Mar 15 15:25:18 CDT 2021
Name: jennifer daniel
Report Type: Public Review Issue
Opt Subject: Changing the names of two emoji alpha candidates
After receiving feedback that was somehow missed last October, the ESC recommends that we change the names of two emoji alpha candidates:
Current names:
1FAC3 MAN WITH SWOLLEN BELLY
1FAC4 PERSON WITH SWOLLEN BELLY
Recommended, modified names:
1FAC3 PREGNANT MAN
1FAC4 PREGNANT PERSON
The rationale is in the link below. Given that we somehow missed this feedback, we didn't want to wait until the next UTC meeting to make this recommendation.
https://www.unicode.org/L2/L2021/21055-esc-response-fdbk.pdf
Additional background info: https://www.unicode.org/L2/L2021/21056-esc-gender.pdf
Date/Time: Fri Mar 19 19:46:34 CDT 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Inconsistent identifier types for Komi letters
The obsolete Komi letters U+052A..U+052D have Identifier_Type=Obsolete but the other obsolete Komi letters U+0500..U+050F have Identifier_Type=Recommended.
Date/Time: Sun Mar 21 12:17:47 CDT 2021
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #428: Prepended_Concatenation_Mark
Note: This has now been added to the Alpha version of PropList.txt.
U+0890 ARABIC POUND MARK ABOVE and U+0891 ARABIC PIASTRE MARK ABOVE should have Prepended_Concatenation_Mark=True.
Date/Time: Wed Mar 24 16:19:46 CDT 2021
Name: Lorna Evans
Report Type: Error Report
Opt Subject: U+08C8 ArabicShaping name
While I did laugh at this name in ArabicShaping, I think we could come up with a better name:
08C8; KEHEH WITH DOOHICKEY ABOVE; D; GAF
As far as I can tell from the script ad hoc notes and the UTC minutes, the ArabicShaping name was never discussed. L2/19-077 originally requested the character as ARABIC LETTER KEHEH WITH HAMZA ABOVE, which indicates to me that there is some association with a hamza. This was later changed to ARABIC LETTER GRAF in L2/19-252.
I would suggest something like this:
08C8; KEHEH WITH EXTENDED HAMZA ABOVE; D; GAF
Lorna
Date/Time: Wed Mar 31 15:54:10 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Final round of revision to the codechart anottations, but the second half correspond to the pictograms
The first half corresponds to annotations that I missed in the first two rounds; the second half corresponds to the pictograms.
Arabic:
06C5 ARABIC LETTER KIRGHIZ OE: In the second bullet note, instead of reading "a barred form also occurs", it would be better if it read "a glyph variant replaces the looped tail with a horizontal bar through the tail".
Arabic Extended-B:
088E ARABIC VERTICAL TAIL: The header above this character should read "Abbreviation mark" instead of "Abbreviation letter". A better phrasing of the bullet note below would be "mark used to indicate abbreviations in moveable type texts from Iran", followed by another note saying: "considered a letter; only attested in final form".
Glagolitic:
2C2F GLAGOLITIC LETTER CAUDATE CHRIVI: The bullet note cites the characters it can combine with, but the glyphs with the dotted circle are missing. Furthermore, informative aliases should be added: "= cherv, chrivi with tail".
Arabic Presentation Forms-A:
FDCF ARABIC LIGATURE SALAAMUHU ALAYNAA: Another bullet note could be added stating "used in Christian texts".
Kana Extended-B:
The initial note states that the system in question is "obsolete", which seems to imply that it was replaced by another system. It also states that it was used in Taiwan, which is true, but it was also used in a nearby region of mainland China.
Ethiopic Supplement:
Given the new information about the legacy Gurage orthography, the header above 1380 that reads "Syllables for Sebatbeit" should read "Legacy syllables for Gurage orthographies", followed by a note under this header saying "These characters were originally encoded to represent the Sebatbeit language, but their use extended beyond that language to an entire linguistic region called 'Gurage'; therefore the term 'Sebatbeit' inserted in the character names should not be interpreted as exclusionary to other languages, but as a mere historical artifact. The orthography for the Gurage languages has been updated to use new syllables and these are encoded in the 'Ethiopic Extended-B' block." It's unclear whether the header above 2DC0 (in the Ethiopic Extended block) should also be modified, but the block descriptions in the core specification should be updated accordingly.
Transport and Map Symbols:
1F6DE WHEEL: The informative alias "= tire" could be added.
1F6DF LIFE BUOY: The informative alias "= life saver" could be added.
Geometric Shapes Extended:
1F7F0 BOLD EQUALS SIGN: The addition of this symbol to this block (as opposed to Symbols and Pictographs Extended-A) is dubious.
Symbols and Pictographs Extended-A:
1FA74 THONG SANDAL: The informative aliases "= flip flop, chancla" could be added.
1FA78 DROP OF BLOOD: Mutual cross references to "1F4A7 💧 droplet" and "1F322 🌢 black droplet" could be added.
1FA79 ADHESIVE BANDAGE: The informative alias "= band aid" could be added.
1FA85 PINATA: A bullet note could be added stating "the name is usually spelled with an 'Ñ' (PIÑATA) but Unicode names can only contain ASCII characters".
1FAAA IDENTIFICATION CARD: There should be an informative alias "= ID", as well as a bullet note stating "can be used to represent a driver's license or any other form of photo ID".
1FAAB LOW BATTERY: There should be a mutual cross reference to "1F50B 🔋 battery".
1FAAC HAMSA: A bullet note could be added stating "can either point up or down".
1FAE6 BITING LIP: A mutual cross reference to "1F5E2 🗢 lips" could be added.
1FAF6 HEART HANDS: There is no need for the rays emanating from the "heart"; leaving them may imply that they are mandatory, so I recommend removing them from the representative glyph. I would also like to ask whether this character could support different skin tones for each hand in the future, similar to HANDSHAKE.
Date/Time: Thu Apr 1 19:17:17 CDT 2021
Name: Eduardo Marín Silva
Report Type: Other Question, Problem, or Feedback
Opt Subject: Request to correct errata in my own piece of feedback of the Unicode 14.0 alpha
My last piece of feedback was accidentally titled "Final round of revision to the codechart anottations, but the second half correspond to the pictograms"; the second half was added by mistake, so it should instead read "Final round of revision to the codechart annotations", with the corrected spelling of 'annotations'. Also, if possible: I noticed that my feedback for the ARABIC VERTICAL TAIL reads "considered a letter; only attested in final form", when it should read "considered a letter, not a presentation form, but only attested in final form". Any other mistakes in my feedback are minor and do not need correction.
Date/Time: Sat Apr 3 11:31:51 CDT 2021
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: Error in Egyptian Hieroglyphs file
The Egyptian Hieroglyphs file (U13000.pdf) contains the misspelling “Invertabrata”. The correct spelling (which was also used by Gardiner) is “Invertebrata”.
Date/Time: Sat Apr 10 11:49:32 CDT 2021
Name: r00ster
Report Type: Error Report
Opt Subject: Chinese numerals are not classified as numerals
Hello Unicode,
I noticed that you classify Chinese numerals as Lo (Other_Letter), which does not seem correct to me; I believe Chinese numerals should be classified as numerals, not as other letters. If I go to the articles listed on the right of https://en.wikipedia.org/wiki/Numeral_system and try out a few of the characters listed there, they mostly work (except for some rather outdated scripts, such as Tangut numerals) and are detected as numeric. For Chinese numerals this is not the case: none of them are detected as numeric. Especially for such a widely spoken language, I would expect Unicode to classify its numerals correctly. It is true that Chinese has an overwhelmingly large number of (single) numeral characters, but I believe it should be possible to classify at least the very basic 零/〇、一、二、三、四、五、六、七、八、九 (0-9) as numerals, and leave all other numerals classified as other letters. Is it possible for you to reclassify them as numerals in a future version? See also: https://github.com/rust-lang/rust/issues/84056. Classifying Chinese numerals as numerals would of course also benefit other East Asian languages, such as Japanese and Hokkien. Thank you in advance.
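For comparison, here is how one widely used implementation reports these characters. The following minimal sketch uses Python's standard `unicodedata` module. Results depend on the Unicode version bundled with the interpreter, and `numeric_profile` is a hypothetical helper. Python's `str.isnumeric()` and `unicodedata.numeric()` consult the numeric properties, including the values CPython takes from Unihan, whereas `unicodedata.category()` reports only the General_Category.

```python
import unicodedata


def numeric_profile(ch: str):
    """Return (General_Category, str.isdecimal(), str.isnumeric(), numeric value or None)."""
    return (unicodedata.category(ch), ch.isdecimal(), ch.isnumeric(),
            unicodedata.numeric(ch, None))


for ch in "〇一二三四五六七八九":
    print(f"U+{ord(ch):04X} {ch}", numeric_profile(ch))
```

On a current CPython, 一 to 九 report General_Category Lo but still carry a numeric value, while 〇 is Nl. An API that tests only General_Category will therefore miss most of them.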
Date/Time: Sat Apr 10 18:49:00 CDT 2021
Name: Mikoto Ohtsuki
Report Type: Public Review Issue
Opt Subject: 1B11F-1B122 in Unicode 14.0 Alpha (PRI #428: Unicode 14.0 Alpha Review)
If the kana letters proposed at 1B11F-1B122 became candidates for Unicode 14 on the basis of L2/19-381, the rationale seems insufficient. As far as I know, they are assumed to be mere inventions, made primarily to fill empty cells in the syllabary chart called gojuonzu (fifty-sound chart). They usually appear in gojuonzu compiled from around the late 19th century to the early 20th century, and there are no examples of their use in actual text to spell words in accordance with the proposed characteristics. The existence of a YI syllable separate from the I syllable, or of a WU syllable separate from the U syllable, has not been attested in the history of Japanese phonology or orthography. Therefore native Japanese words such as いもうと, まうす, ようべ on page 6 of L2/19-381, and やいば, ついたち, ちひさい on page 10, cannot have been written using kana intended for WU or YI syllables. Note that the standard う (U), not the kana intended for WU, is used in the corresponding hiragana forms on page 6. The chart contradicts itself. Pages 2 and 7 show 衣 (U+8863) as the kanji derivation for 1B12D, now shifted to 1B121 KATAKANA LETTER ARCHAIC YE. However, 衣 is the origin of 1B000 KATAKANA LETTER ARCHAIC E, which is evidently inconsistent. Rather than deriving this kana from a single kanji, it would be more appropriate to regard it as a compound of イ (I) and エ (E), as mentioned in a footnote; it would then be KATAKANA LETTER LIGATURE IE. It is strongly suspected that the referenced books were written without scholarly knowledge, so including these characters with their current characteristics in Unicode 14 is questionable. I would like the UTC to consider two matters. First, please postpone their inclusion in the Unicode Standard until their characteristics are confirmed by expert input, or until examples actually in use with the proposed characteristics are provided. If such input is unavailable, please consider another approach, such as encoding them as itaigana (kana variants) of the standard I and U kana letters. Second, please reconsider their names.
Using the same ARCHAIC prefix both for kana dating from the Heian era (8th-12th century) and for kana invented in the early modern period in an attempt to fill up the gojuonzu feels odd. Please don't call the latter kana ARCHAIC.
Date/Time: Sun Apr 11 02:28:55 CDT 2021
Name: Patrik Sjöwall
Report Type: Public Review Issue
Opt Subject: Unicode 14.0 Alpha review
I found a few issues with some characters for Unicode 14.0 that seem to have gone unnoticed:
0874 ARABIC LETTER ALEF WITH ATTACHED KASRA
0875 ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA
0879 ARABIC LETTER ALEF WITH ATTACHED ROUNDDOT BELOW
087C ARABIC LETTER ALEF WITH RIGHT MIDDLE STROKE AND DOT ABOVE
087D ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA AND DOT ABOVE
0880 ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA AND LEFT RING
These letters require more shaping information. It is not clear how the attached fatha or dot will behave in an obligatory LAM-ALEF ligature.
088E ARABIC VERTICAL TAIL
This character is missing from ArabicShaping-14.0.0.txt, but it always joins with the preceding letter. It should be included in that file, either as Right_Joining or with a new joining type (since it does not change its shape, but only causes the character to its right to join), and with either a joining group of its own or No_Joining_Group.
08FB ARABIC DOUBLE RIGHT ARROWHEAD ABOVE
08FC ARABIC DOUBLE RIGHT ARROWHEAD ABOVE WITH DOT
The comment "also used in Quranic text in African and other orthographies to represent dammatan" should come after 08FB, not 08FC. The "right arrowhead" is an angular-shaped damma, and the "dammatan" is a double damma (not a double damma with dot).
A7C0 LATIN CAPITAL LETTER OLD POLISH O
A7C1 LATIN SMALL LETTER OLD POLISH O
This letter should be named "O ROGATE", the name "commonly used among specialists" according to the proposal. A comment below could then say "used for nasal vowel in Old Polish". The current name sounds as if this letter was used instead of "O" in Old Polish, which is not the case.
A7D3 LATIN SMALL LETTER DOUBLE THORN
A7D5 LATIN SMALL LETTER DOUBLE WYNN
These two small letters are added to the standard without matching capitals. That is inconsistent with how other comparable letters are encoded.
Letters used in a casing orthography are almost always encoded as casing pairs, even if they do not appear at the beginning of a word and the capital letter thus only appears in ALL-CAPS TEXT. As far as I know, at least the following capitals were encoded without being needed outside all-caps:
0184 LATIN CAPITAL LETTER TONE SIX
01A6 LATIN LETTER YR
01A7 LATIN CAPITAL LETTER TONE TWO
01BC LATIN CAPITAL LETTER TONE FIVE
0220 LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
037F GREEK CAPITAL LETTER YOT
042A CYRILLIC CAPITAL LETTER HARD SIGN
042C CYRILLIC CAPITAL LETTER SOFT SIGN
1E9E LATIN CAPITAL LETTER SHARP S
2C1F GLAGOLITIC CAPITAL LETTER YERU
2C20 GLAGOLITIC CAPITAL LETTER YERI
It is possible that one or two have been used word-initially in languages that were not supported when they were added. On the other hand, it is also quite likely that more encoded capitals never occur at the beginning of a word.
Apart from that (and issues already addressed by others), everything looks fine so far.
Best regards!
/Patrik Sjöwall
Date/Time: Sun Apr 11 05:17:33 CDT 2021
Name: Wang Yifan
Report Type: Public Review Issue
Opt Subject: PRI #428: comments on U+1F7F0 and U+1F979
On U+1F7F0: It might be good to have a cross reference to U+3013 GETA MARK for purely graphic resemblance, and vice versa.
On U+1F979: The current glyph of FACE HOLDING BACK TEARS does not sufficiently distinguish it from U+1F97A FACE WITH PLEADING EYES. A quick suggestion that I think would be effective is to make the tears white (not hatched) and use a dumbbell-shaped mouth.
In light of the original proposal, this character is intended to include the Samsung emoji depicted on page 1 of this document: http://www.unicode.org/L2/L2020/20064-face-holding-back-tears.pdf Here, the dumbbell-shaped mouth is a key feature that characterizes the emoticon as a stylized depiction of the lip-biting expression in East Asian graphical convention. It is different from both the upward (pouting) and the downward (neutral-smiling) curled mouth. This type of expression is also seen in most of the actual examples cited on page 5 of the proposal, and thus should not be left out.
Meanwhile, U+1F97A is usually implemented with similarly watery eyes (see https://emojipedia.org/pleading-face/). Even though this is not reflected in the current code chart, such designs should be interpreted as reflecting the inherent semantics of the original proposal (as FACE WITH GLISTENING EYES; https://www.unicode.org/L2/L2017/17244r-emoji-faces-v11.pdf) rather than mere vendor discretion, and should be respected as such. The alpha glyph of U+1F979 has a rather intricate eye design that makes it hard to tell the tears apart from the eyeballs in black-and-white printing. The tears should be more distinctly separated from their background, to avoid the misinterpretation that the character has exactly the same kind of eyes as the existing glyphs of U+1F97A. (Optimally, U+1F97A should also be updated to have more upward-looking eyes and downward-sloping eyebrows in the code chart.) Last year, U+1F97A was "the third most used emoji on Twitter" according to Emojipedia, and was awarded "Neologism of the Year 2020" in Japan.
Special care should be taken to avoid possible confusion among existing users. https://blog.emojipedia.org/a-new-king-pleading-face/ https://ja.wikipedia.org/wiki/%E3%81%B4%E3%81%88%E3%82%93
Date/Time: Mon Apr 12 16:47:19 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Combining Diacritical Marks Extended
Move COMBINING DOUBLE PLUS SIGN ABOVE and COMBINING DOUBLE PLUS SIGN BELOW to immediately after COMBINING PLUS SIGN ABOVE.
Date/Time: Mon Apr 12 17:58:47 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Encode COMBINING OVERCURL at 1ACF
COMBINING OVERCURL was first proposed on 2017-09-27 in L2/17-342 (N4902). A revised proposal was published on 2017-10-17 in L2/17-358 (N4907). An attempt was made to ballot it not as a single combining character but as a number of atomic letters. The argument for atomic characters was basically "It might be hard to implement in Noto fonts", which I consider to be pretty ridiculous. Irish ballot comments on 2018-12-20 refuted this, saying that the OVERCURL is definitely an abbreviation mark, not a basic orthographic letter, so it would be both inappropriate and impractical to have to work with atomic characters when all of the other related marks used in medieval palaeography (COMBINING OVERLINE, COMBINING INVERTED BREVE, COMBINING FERMATA) are treated as the diacritical marks they are. The atomic characters were taken off the ballot, evidently because US ballot comments now said that the COMBINING OVERCURL was nothing more than a glyph variant of COMBINING INVERTED BREVE. No justification for this assertion was made. The COMBINING OVERCURL was balloted again at 1DFA. Irish ballot comments on 2019-05-06 reaffirmed Ireland's support for the combining character, but again it was taken off the ballot. In that document, the Irish ballot comments contained a draft UTN describing the rules for drawing glyphs with a combining overcurl. (The basic rule is "The OVERCURL simply has to attach at a convenient point, and swing over towards the left.") This is not something that any competent font designer would fail at doing. The proposal documents clearly described the use of these kinds of marks. COMBINING OVERLINE and COMBINING INVERTED BREVE are typically used to indicate an -m or an -n following the previous letter. COMBINING OVERCURL is used to indicate an -m or an -n, but often it is meaningless; in Middle Scots, when over an s, it means "shilling" (L2/20-267 (N5144)); and in Middle English, when following an r, it may mean -e or it may mean nothing.
The COMBINING INVERTED BREVE does not have this polyvalence. COMBINING OVERCURL is not a glyph variant of either COMBINING OVERLINE or COMBINING INVERTED BREVE. The UCS contains in Latin Extended-B twelve characters used in South Slavic poetics (Ȃ ȃ Ȇ ȇ Ȋ ȋ Ȏ ȏ Ȓ ȓ Ȗ ȗ) and certainly no one would ever consider an overcurl glyph variant on these letters to be acceptable. The suggestion that the US NB has made, that it must be proved that COMBINING OVERCURL isn't a glyph variant of COMBINING INVERTED BREVE, is based on nothing but a casual assumption. But the proposal documents show that the OVERCURL can mean -m, -n, -e, Ø or be a complete abbreviation (as in shilling), and the INVERTED BREVE can only mean (and always does mean in medieval British palaeography) -m and -n, or a tone contour in South Slavic. I prepared but have not yet published a document showing how the existing COMBINING ZIGZAG ABOVE is a free-floating diacritical mark in continental Europe, but in Britain grows a tail and attaches to the base letter. We do not need a "combining attaching zigzag", and with regard to the overcurl I showed that if we simply take a half-arc and rotate it 45 degrees over a dotted circle, and if a font were to implement that without fusing the OVERCURL to the base letter, it would remain legible; indeed, in my own work my monowidth font does not have fused forms while my publication fonts do. (The OVERCURL is still much bigger than the INVERTED BREVE.) The Script Ad Hoc has seen this draft and accepts that the fusion aspect of rendering is not a real problem. In 2020 Volume I of Corpus Textuum Cornicorum "The Charter Fragment and Pascon agan Arluth" was published, making use of the COMBINING OVERCURL both in transcriptions and as a combining character in descriptions of the abbreviations used. The datafiles cannot be published because they contain one Private-Use character, and that defeats the purpose of plain-text encoding.
The suggestion that OVERCURL is a stylistic glyph variant of INVERTED BREVE means that the data regarding which BREVES are to fuse and which are not would be left to a higher-level protocol, which is inappropriate because of the semantics (again m/n on the one hand and m/n/e/Ø/etc. on the other). To summarize: 1) INVERTED BREVE and OVERCURL do not have the same semantics. 2) INVERTED BREVE and OVERCURL do not have the same shapes. 3) A draft UTN outlining the rendering issues exists. It gives clear and simple advice to any typographer. 4) Unfused forms using a large tilted OVERCURL are legible and preserve in plain text the distinction required. This is analogous to the Continental and British variation of the COMBINING ZIGZAG. 5) Palaeographic readings of the two earliest Middle Cornish texts have been published, and more texts are being prepared which will also distinguish OVERCURL and INVERTED BREVE. 6) Palaeographic readings of the New Testament in Middle Scots are being prepared, and this text too distinguishes OVERCURL and INVERTED BREVE. 7) The OVERCURL form is not an acceptable glyph variant for South Slavic poetics, and cannot be applied to 0202, 0203, 0206, 0207, 020A, 020B, 020E, 020F, 0212, 0213, 0216, or 0217. In Middle Scots, however, the COMBINING OVERCURL occurs over vowels even word-internally, as in the word ȋto 'into' (I had to use the inverted breve because the overcurl isn't encoded), and it is definitely NOT an unattached breve. Please encode COMBINING OVERCURL at 1ACF. Further delay is of no benefit to anybody.
Date/Time: Mon Apr 12 18:08:02 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Currency Symbols
Like the EURO SIGN and other characters, the SOM SIGN U+20C0 should be shown in a Times-like font.
Date/Time: Mon Apr 12 18:09:37 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Supplemental Punctuation
The barred square brackets from 2E56..2E58 should be drawn on the same basis as other square brackets in the code charts.
Date/Time: Mon Apr 12 18:12:06 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Glagolitic
The glyphs for the two new characters must be improved.
Date/Time: Mon Apr 12 18:21:57 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Supplemental Symbols and Pictographs
Something is wrong with the glyphs for 1F979 and 1F97A. The face shown at 1F979 looks just like the glyph for 1F97A in the macOS and iOS Apple Color Emoji UI font. Thanks for keeping my TROLL glyph.
Date/Time: Mon Apr 12 18:32:17 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Symbols and Pictographs Extended-A
We needed a MIRROR BALL? Saints preserve us. Nests both with and without eggs. I'm sure I need a STEGOSAURUS and a TRICERATOPS much more. The glyph for 1FAE1 is pretty illegible. The glyph for 1FAE2 must have a direction error. I can't imagine what a DOTTED LINE FACE is intended to represent. Invisibility? It is a TERRIBLE name. I suppose the new Hand symbols are welcome, but I think we still have the problem of the thumbs-up and thumbs-down emojis not being what the viewer would actually see if he were looking at his own hand. Try it. It is still an appallingly US-centric oversight that 1F594 🖔 REVERSED VICTORY HAND has not been emojified. This is used everywhere in Britain and Ireland. It is a weaker form of 🖕 REVERSED HAND WITH MIDDLE FINGER EXTENDED. This has been mentioned (and ignored) before. But now we get a MIRROR BALL and two kinds of nest.
Date/Time: Wed Apr 14 12:31:43 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Alpha Review: CJK Unified Ideograph Extension B
In this proposal (https://www.unicode.org/L2/L2018/18063-remove-ucs2003-ext-b.pdf) the removal of the UCS2003 glyphs from the code chart was proposed (this proposal was accepted by the IRG). However, the current version 14 alpha charts still retain them. Removing these glyphs would allow four columns, each 2 wide, to fit on a page, rather than the current three columns that are each 3 wide. This in turn would substantially reduce the number of pages in the code chart, reducing the memory strain caused by trying to consult the charts. Remaking the code chart is far from a trivial task; however, I mention this to get some sort of update on the issue, given the time since it was introduced.
Date/Time: Wed Apr 14 17:11:14 CDT 2021
Name: Paul Masson
Report Type: Error Report
Opt Subject: kPhonetic for U+52E4
This character appears in group 574 on p.85 of Casey. The field is missing in the database and needs to be added.
Date/Time: Fri Apr 16 09:40:46 CDT 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Soft_Dotted property of U+1DF1A
Proposed character U+1DF1A LATIN SMALL LETTER I WITH STROKE AND RETROFLEX HOOK should have the Soft_Dotted property like other variants of the letter i.
Date/Time: Fri Apr 16 17:24:19 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Latin Extended-D
Please fill the empty spaces at A7D2 and A7D4 with the characters those spaces have been left for, LATIN CAPITAL LETTER DOUBLE THORN and LATIN CAPITAL LETTER DOUBLE WYNN respectively. These characters come from the Ormulum, an important and very long Middle English text for which the author, Orm, devised an orthography which marked short vowel length regularly by doubling letters after the vowel (as in "menn" 'men' and "wiþþ" 'with'). Orm's orthography also marked this by superscripting a letter (as in "menᷠ" 'men'), but where a short vowel preceded -þ or -w (-ƿ), the bowl of the thorn and the wynn were doubled. Orm knew very well what a capital letter was and he was scrupulous in using them. The addition of TIRONIAN SIGN CAPITAL ET to the UCS was in part based on the evidence from Orm's text. Double Thorn and Double Wynn would not begin a word or sentence because the orthography uses the double characters after vowels, but if Orm (or a modern editor, like me, who am preparing a palaeographic reading of The Ormulum) wanted to write a word in ALL CAPS or in ꜱᴍᴀʟʟ ᴄᴀᴘɪᴛᴀʟꜱ, he (and I) would certainly know to do so. This argument has been put forward many times for letters used in natural orthographies (indeed it was put forward for other characters in Latin Extended-D). The UTC has not explained to me why it has left the blanks. If they are waiting to find out if Orm ever wrote the word WIÞÞ 'WITH' in all caps, well, I do not have the answer, because the text is 30,000 lines long. But Orm is dead, and Orm is not trying to use the Unicode Standard. I and other scholars who work with the Ormulum have a reasonable expectation that its characters should behave as normal. An editor who wishes to write a vocabulary and use ALL CAPS in the headwords should be able to do so. An editor who submits an article title with "wiþþ" in it (with a double thorn) to a journal that puts the article titles in small caps in the header will expect normal casing behaviour.
Casing behaviour is a natural function of the Latin script. Please fill the empty spaces at A7D2 and A7D4 with the characters those spaces have been left for, LATIN CAPITAL LETTER DOUBLE THORN and LATIN CAPITAL LETTER DOUBLE WYNN respectively.
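A minimal Python sketch of the casing point above. It assumes that U+A7D3 is the proposed small DOUBLE THORN with A7D2 left empty, so that no uppercase mapping exists for it; the helper name all_caps is hypothetical. Python's str.upper() applies the case mappings from the interpreter's bundled Unicode data, and a character with no uppercase mapping is returned unchanged.

```python
# Sketch: how ALL CAPS behaves when a lowercase letter has no encoded capital.
# str.upper() applies the case mappings from the interpreter's bundled Unicode
# Character Database; a character with no uppercase mapping is returned as-is.

def all_caps(word: str) -> str:
    """Uppercase a word the way a typical text pipeline would (hypothetical helper)."""
    return word.upper()

# Ordinary thorn U+00FE has a capital (U+00DE), so "wiþþ" becomes "WIÞÞ".
print(all_caps("wi\u00fe\u00fe"))  # WIÞÞ

# Assumption: U+A7D3 is the proposed small DOUBLE THORN, with A7D2 left empty.
# With no capital encoded, the letter stays lowercase inside an all-caps word.
print(all_caps("wi\ua7d3") == "WI\ua7d3")  # True
```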
Date/Time: Fri Apr 16 17:37:02 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Symbols and Pictographs Extended-A
1FAF1 RIGHTWARD BACKHAND and 1FAF2 LEFTWARD HAND are misnamed. "Backhand" refers to a kind of tennis swing; it does not refer to the back of a hand. Handedness is something the UCS should have dealt with long ago. RIGHT-POINTING BACK OF HAND is what the first one is, and LEFT-POINTING FRONT OF HAND is what the other one is. All of the existing hands should be looked at with regard to this. Note that the THUMBS UP and THUMBS DOWN hands are not completely encoded. Users should be able to select whether they wish to show hands with thumbs up or down based on how they would see them if they were holding their hands out in front of them. When I look at my right hand thumbs up, I see the palm. When I look at my right hand thumbs down, I see the back. This is Alpha, so if there is a wish to make some of these hands make sense, now is the time to complete the set logically. I would help between now and beta if asked.
Date/Time: Fri Apr 16 18:20:45 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Latin Extended-D
Patrik Sjöwall has suggested that OLD POLISH O should be named O ROGATE. I do not find the word ROGATE /roʊɡeɪt/ in the Oxford English Dictionary. I am sorry to disagree with him, but in the absence of knowing what "rogate" means I can't recommend this, and its non-appearance in the OED—well, it means that even I don't know how to find out what a "rogate O" is. "Rogare" means 'to ask' in Latin. OLD POLISH O means it is a kind of O found in Old Polish, not that it is /o/ in Old Polish.
Date/Time: Fri Apr 16 18:27:01 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Symbols and Pictographs Extended-A
1FAF3 should be PALM FACING UPWARDS. 1FAF4 should be PALM FACING DOWNWARDS. 1FAF5 should be UNCLE SAM HAND (well, okay) or HAND WITH INDEX FINGER POINTING FORWARD. POINTING AT THE VIEWER should at least be POINTING TOWARDS VIEWER, if the viewer has to be taken into account.