This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Wed Oct 23 08:23:55 CDT 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Description of U+11D97 GUNJALA GONDI VIRAMA in TUS
The section on Gunjala Gondi in chapter 13 says “The script uses a virama to create conjuncts, but it does not suppress the inherent vowel”, i.e. that the virama only appears within conjuncts. Is that true? Figure 10 of L2/15-235R shows word-final half-forms, which are not part of conjuncts and correspond to Telugu consonants with viramas.
Date/Time: Mon Nov 4 13:50:00 CST 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Script_Extensions of U+0589 ARMENIAN FULL STOP
Chapter 7 says “Prior to Version 6.0 the Unicode Standard recommended the use of U+0589 ARMENIAN FULL STOP as the two dot version of the full stop for historic Georgian documents. This is no longer recommended”. However, U+0589 still has Script_Extensions={Armn Geor}. So should it be used in Georgian or not?
Date/Time: Mon Nov 4 14:06:05 CST 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Mathematical alphanumeric soft-dotted characters
To keep a dot on a soft-dotted character when it would normally be suppressed, chapter 7 recommends adding an “overdot”. (“Overdot” isn’t explicitly defined, but it probably means U+0307 COMBINING DOT ABOVE.) Mathematical alphanumeric symbols i and j are soft-dotted, but in some mathematical alphanumeric styles, the dot does not look like a default U+0307. Should the overdot change its style if it is the first ccc=230 mark on a mathematical alphanumeric base?
Date/Time: Tue Nov 19 12:32:05 CST 2019
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #412: Inclusion of Symbols for Legacy Computing in Extended_Pictographic
The Extended_Pictographic property currently includes the entirety of the Symbols for Legacy Computing block (U+1FB00–U+1FBFF). However, the vast majority of the characters in that block cannot be described as pictographic and are never going to be useful as emoji; they are mostly block elements and box drawings. I propose excluding that range from Extended_Pictographic.
Date/Time: Tue Nov 19 13:02:44 CST 2019
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #412: Line break behaviour of Khitan Small Script
The characters of the Khitan Small Script have been given the Line_Break property value Ideographic (ID). This allows line breaks to occur between any pair of KSS characters. However, several proposal documents expressed a preference for the opposite behaviour:
• L2/16‑245R (https://www.unicode.org/L2/L2016/16245r-n4738r2-khitan-small.pdf)
• L2/18‑121R (https://www.unicode.org/L2/L2018/18121r-n4943-khitan-cluster.pdf)
Any sequence of KSS characters without intervening spaces forms a special cluster that should stay together on one line, so a property value such as Alphabetic would be more appropriate, even if the script looks similar to other ideographic writing systems.
Date/Time: Sun Dec 8 12:36:41 CST 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Bad Javanese BNF
The Javanese syllable BNF is {C F} C {{R}Y} {V{A}} {Z}. That means that -ra may only occur if -ya occurs, and -aa may only occur if another vowel sign occurs. Both restrictions are wrong.
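To make the two restrictions concrete, the BNF can be transcribed into a regular expression and tested on a few syllable shapes. This is a minimal sketch under stated assumptions: each element class is stood in for by one placeholder letter, braces are read as "optional", and the helper name `matches_bnf` is hypothetical. Only R (-ra), Y (-ya), V (vowel sign) and A (-aa) are pinned down by the report; reading F as the virama and Z as a syllable-final sign is an assumption. The sketch checks placeholder strings, not real Javanese text.

```python
import re

# Placeholder-letter transcription of the Javanese syllable BNF
#   {C F} C {{R}Y} {V{A}} {Z}
# where {x} means "x is optional". One letter per class (assumption):
#   C consonant, F virama (assumed), R medial -ra, Y medial -ya,
#   V vowel sign, A vowel sign -aa, Z syllable-final sign (assumed).
JAVANESE_SYLLABLE_BNF = re.compile(r"(?:CF)?C(?:R?Y)?(?:VA?)?Z?")


def matches_bnf(syllable: str) -> bool:
    """Return True if the placeholder string is allowed by the BNF."""
    return JAVANESE_SYLLABLE_BNF.fullmatch(syllable) is not None


if __name__ == "__main__":
    # -ra is only reachable through {{R}Y}, so it needs a following -ya.
    print(matches_bnf("CRY"))  # True
    print(matches_bnf("CR"))   # False: -ra without -ya is rejected
    # -aa is only reachable through {V{A}}, so it needs a preceding vowel sign.
    print(matches_bnf("CVA"))  # True
    print(matches_bnf("CA"))   # False: -aa on its own is rejected
```

Both `False` results are the sequences the report argues the BNF should accept.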
Date/Time: Wed Dec 18 14:49:11 CST 2019
Name: Dr. Wang, Kai
Report Type: Error Report
Opt Subject: U+2F46 (from "Kangxi-Radicals")
Hello everyone, I've found an error in the Kangxi Radicals block, in the radical with the code point U+2F46. It should be 旡 (jì), which looks like a kneeling figure and means "hiccup"; it is used, for example, to build the character 既. However, the character described there is a different one, 无 (wú), which was not a radical before the simplification of the 1960s and means "nothing" (traditional form 無).
Date/Time: Sun Dec 22 15:35:25 CST 2019
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #412: Line break behaviour of Symbols for Legacy Computing
Currently, almost all characters in the new Symbols for Legacy Computing block have the Line_Break property value Ideographic (ID). This is inconsistent with existing characters of a comparable nature: the majority of characters in the Block Elements and Box Drawings blocks are Line_Break=Ambiguous, while some are Line_Break=Alphabetic, and many of the pictographic and geometric symbols in the BMP are likewise Alphabetic. Users would expect the new characters to have the same properties as these older sets, even if most blocks in that section of the SMP default to Line_Break=Ideographic. Because the behaviour of Ambiguous characters depends on the resolution of their East_Asian_Width value, and (to my knowledge) none of the characters in Symbols for Legacy Computing derive from East Asian character sets, I propose simply assigning the entire range U+1FB00..U+1FBCA to Line_Break=Alphabetic. This would also enable more refined line‐breaking behaviour for the following multi‐part glyphs:
U+1FBB2 🮲 LEFT HALF RUNNING MAN
U+1FBB3 🮳 RIGHT HALF RUNNING MAN
U+1FBB9 🮹 LEFT HALF FOLDER
U+1FBBA 🮺 RIGHT HALF FOLDER
U+1FBC1 🯁 LEFT THIRD WHITE RIGHT POINTING INDEX
U+1FBC2 🯂 MIDDLE THIRD WHITE RIGHT POINTING INDEX
U+1FBC3 🯃 RIGHT THIRD WHITE RIGHT POINTING INDEX
In regular usage, each of these three groups of characters would always occur in sequence to form a larger glyph. Line_Break=Alphabetic for these characters ensures that a line break normally cannot occur in the middle of such a large glyph, which is more intuitive to users, even if that behaviour may not have existed on their original platforms. Otherwise, users would always need to insert WORD JOINERs between these characters to keep the symbols from being broken up.
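The difference between ID and AL for these multi-part glyphs can be illustrated with a toy subset of the UAX #14 pair rules. This sketch is hypothetical and deliberately incomplete: it implements only LB11 (no break before or after WJ), LB28 (no break between AL characters) and LB31 (break everywhere else), and it takes Line_Break values as input rather than looking them up. `break_allowed` and `break_opportunities` are illustrative helpers, not part of any library.

```python
from typing import List


def break_allowed(before: str, after: str) -> bool:
    """Toy pair rule using only three UAX #14 rules (not a full implementation)."""
    if before == "WJ" or after == "WJ":  # LB11: × WJ ×
        return False
    if before == "AL" and after == "AL":  # LB28: AL × AL
        return False
    return True  # LB31: break everywhere else (this is what lets ID ÷ ID)


def break_opportunities(classes: List[str]) -> List[int]:
    """Return indices i where a break is allowed between classes[i-1] and classes[i]."""
    return [i for i in range(1, len(classes)) if break_allowed(classes[i - 1], classes[i])]


if __name__ == "__main__":
    # A three-part glyph such as LEFT/MIDDLE/RIGHT THIRD WHITE RIGHT POINTING INDEX
    print(break_opportunities(["ID", "ID", "ID"]))              # [1, 2]
    print(break_opportunities(["AL", "AL", "AL"]))              # []
    print(break_opportunities(["ID", "WJ", "ID", "WJ", "ID"]))  # []
```

With ID, every internal position is a break opportunity unless WORD JOINERs are inserted, whereas AL keeps the glyph together without them.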
Date/Time: Fri Jan 3 08:15:06 CST 2020
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #412: Bidi Class of Multi‐Part Symbols for Legacy Computing
The following new characters have been assigned the Bidi_Class property value Other_Neutral (ON):
U+1FBB2 🮲 LEFT HALF RUNNING MAN
U+1FBB3 🮳 RIGHT HALF RUNNING MAN
U+1FBB9 🮹 LEFT HALF FOLDER
U+1FBBA 🮺 RIGHT HALF FOLDER
U+1FBC1 🯁 LEFT THIRD WHITE RIGHT POINTING INDEX
U+1FBC2 🯂 MIDDLE THIRD WHITE RIGHT POINTING INDEX
U+1FBC3 🯃 RIGHT THIRD WHITE RIGHT POINTING INDEX
I propose changing that value to Left_To_Right (L). While these characters can of course be used in any order and combination, their real use lies in forming larger glyphs. In an LTR context (which these characters were originally designed for), writing them out in the intended order will produce proper glyphs (🮲🮳, 🮹🮺, 🯁🯂🯃). However, in an RTL context the logical order of these sequences would need to be reversed lest they turn into meaningless scribbles (🮳🮲, 🮺🮹, 🯃🯂🯁). This would cause problems when copy‐pasting these symbols between text runs of differing directionality, as the LTR and RTL versions of the exact same glyphs would have a completely different memory representation. Programs like screenreaders could also have trouble with this if they expect the sequences to always have a certain order (i.e. left half first, right half second) and would not recognise the reverse sequence as representing the same concept. A screenreader could announce the sequence <U+1FBB2, U+1FBB3> as just “Running Man” without enumerating its constituent parts; this would be much harder to implement if the screenreader had to take text direction into account to determine whether the sequence is (for lack of a better word) well‐formed. In some sense, these large glyphs are very similar to combining character sequences, whose memory representation remains the same regardless of text direction. Giving these characters strong directionality would ensure that they can always be consistently written in the same order.
The Regional Indicator Symbols were given the property value Left_To_Right for a similar reason, because their precise visual order must always stay the same regardless of context.
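The reordering problem described in this report can be reproduced with a toy subset of the Unicode Bidirectional Algorithm (UAX #9). This is a sketch under strong assumptions: the input contains only L, R and ON characters, with no explicit embeddings, isolates, numbers or weak types; neutrals are resolved with rules N1 and N2, levels with I1 and I2, and display order with rule L2. The function names and the placeholder items (HEB1 and HEB2 standing for surrounding Hebrew letters) are hypothetical.

```python
from typing import List


def resolve_levels(classes: List[str], para_level: int) -> List[int]:
    """Toy UAX #9 level resolution for text containing only L, R and ON."""
    embedding = "R" if para_level % 2 else "L"
    resolved = list(classes)
    n = len(classes)
    i = 0
    while i < n:
        if classes[i] != "ON":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "ON":
            j += 1
        before = classes[i - 1] if i > 0 else embedding  # sos
        after = classes[j] if j < n else embedding  # eos
        # N1: neutrals between strong types of the same direction take that direction;
        # N2: otherwise they take the embedding direction.
        resolved[i:j] = [before if before == after else embedding] * (j - i)
        i = j
    # I1/I2: L raises an odd level by one, R raises an even level by one.
    levels = []
    for c in resolved:
        if c == "L":
            levels.append(para_level + (para_level % 2))
        else:
            levels.append(para_level + 1 - (para_level % 2))
    return levels


def visual_order(items: List[str], levels: List[int]) -> List[str]:
    """Rule L2: from the highest level down to the lowest odd level, reverse runs at or above it."""
    if not items:
        return []
    items, levels = list(items), list(levels)
    lowest_odd = min(levels) | 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(items):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(items) and levels[j] >= level:
                j += 1
            items[i:j] = items[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return items


if __name__ == "__main__":
    # Logical order in an RTL paragraph: Hebrew letter, LEFT HALF, RIGHT HALF, Hebrew letter.
    logical = ["HEB1", "LEFT_HALF", "RIGHT_HALF", "HEB2"]
    as_on = resolve_levels(["R", "ON", "ON", "R"], para_level=1)
    as_l = resolve_levels(["R", "L", "L", "R"], para_level=1)
    # Display order, listed from left to right:
    print(visual_order(logical, as_on))  # ['HEB2', 'RIGHT_HALF', 'LEFT_HALF', 'HEB1']
    print(visual_order(logical, as_l))   # ['HEB2', 'LEFT_HALF', 'RIGHT_HALF', 'HEB1']
```

With Bidi_Class=ON the halves swap visually in RTL text, so the logical order would have to be reversed to get a proper glyph; with Bidi_Class=L the same logical order displays correctly in both directions.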