Public Review Issues

Accumulated Feedback on PRI #412

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Wed Oct 23 08:23:55 CDT 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Description of U+11D97 GUNJALA GONDI VIRAMA in TUS

The section on Gunjala Gondi in chapter 13 says “The script uses a virama to
create conjuncts, but it does not suppress the inherent vowel”, i.e. it only
appears within conjuncts. Is that true? Figure 10 of L2/15-235R shows
word-final half-forms, not part of conjuncts, corresponding to Telugu
consonants with viramas.

Date/Time: Mon Nov 4 13:50:00 CST 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Script_Extensions of U+0589 ARMENIAN FULL STOP

Chapter 7 says “Prior to Version 6.0 the Unicode Standard recommended the
use of U+0589 ARMENIAN FULL STOP as the two dot version of the full stop for
historic Georgian documents. This is no longer recommended”. However, U+0589
still has Script_Extensions={Armn Geor}. So should it be used in Georgian or
not?

Date/Time: Mon Nov 4 14:06:05 CST 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Mathematical alphanumeric soft-dotted characters

To keep a dot on a soft-dotted character when it would normally be
suppressed, chapter 7 recommends adding an “overdot”. (“Overdot” isn’t
explicitly defined, but it probably means U+0307 COMBINING DOT ABOVE.)
Mathematical alphanumeric symbols i and j are soft-dotted, but in some
mathematical alphanumeric styles, the dot does not look like a default
U+0307. Should the overdot change its style if it is the first ccc=230 mark
on a mathematical alphanumeric base?

Date/Time: Tue Nov 19 12:32:05 CST 2019
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #412: Inclusion of Symbols for Legacy Computing in Extended_Pictographic

The Extended_Pictographic property currently includes the entirety of the
Symbols for Legacy Computing block (U+1FB00–U+1FBFF). However, the vast
majority of the characters in that block cannot be described as pictographic
and are never going to be useful as emoji; they are mostly block elements
and box drawings. I propose excluding that range from Extended_Pictographic.

Date/Time: Tue Nov 19 13:02:44 CST 2019
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #412: Line break behaviour of Khitan Small Script

The characters of the Khitan Small Script (KSS) have been given the
Line_Break property value Ideographic (ID). This would allow line breaks to
occur between any pair of KSS characters. However, several proposal
documents expressed a preference for the opposite behaviour:

• L2/16‑245R (https://www.unicode.org/L2/L2016/16245r-n4738r2-khitan-small.pdf)
• L2/18‑121R (https://www.unicode.org/L2/L2018/18121r-n4943-khitan-cluster.pdf)

Any sequence of KSS characters without intervening spaces forms a special
cluster that should stay together on one line, so a property value like
Alphabetic would be more appropriate, even if the script appears similar to
other ideographic writing systems.
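
The difference between the two property values can be sketched with a
deliberately reduced model of the pair rules in UAX #14. It assumes that
only rule LB28 (no break between two Alphabetic characters) and the LB31
fallback (break everywhere else) apply. The function names are hypothetical,
and the real algorithm has many more rules.

```python
def break_allowed(before, after):
    """Toy pair rule: only LB28 and the LB31 fallback are modelled."""
    if before == "AL" and after == "AL":
        return False  # LB28: AL x AL, alphabetic runs stay together
    return True       # LB31: a break is allowed between everything else


def break_opportunities(classes):
    """Return the indices i where a break is allowed before classes[i]."""
    return [i for i in range(1, len(classes))
            if break_allowed(classes[i - 1], classes[i])]


# A cluster of four Khitan Small Script characters under each assignment.
print(break_opportunities(["ID"] * 4))  # [1, 2, 3]
print(break_opportunities(["AL"] * 4))  # []
```

Under this reduced model, an Ideographic cluster can be split at every
position, while an Alphabetic cluster offers no break opportunity at all.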

Date/Time: Sun Dec 8 12:36:41 CST 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Bad Javanese BNF

The Javanese syllable BNF is {C F} C {{R}Y} {V{A}} {Z}. That means that -ra 
may only occur if -ya occurs, and -aa may only occur if another vowel sign 
occurs. Both restrictions are wrong.
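
The two restrictions can be checked mechanically with a minimal sketch that
transcribes the BNF into a regular expression over its abstract symbols. It
assumes that every pair of braces marks an optional element, which is a
reading of the notation and not something stated in this report. The names
SYLLABLE and is_syllable are hypothetical.

```python
import re

# {C F} C {{R}Y} {V{A}} {Z}, with each {...} read as "optional".
SYLLABLE = re.compile(r"(?:CF)?C(?:R?Y)?(?:VA?)?Z?")


def is_syllable(symbols):
    """True if the whole symbol string is generated by the BNF."""
    return SYLLABLE.fullmatch(symbols) is not None


print(is_syllable("CRY"))  # True:  -ra together with -ya
print(is_syllable("CR"))   # False: -ra without -ya is rejected
print(is_syllable("CVA"))  # True:  -aa after another vowel sign
print(is_syllable("CA"))   # False: -aa on its own is rejected
```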

Date/Time: Wed Dec 18 14:49:11 CST 2019
Name: Dr. Wang, Kai
Report Type: Error Report
Opt Subject: U+2F46 (from "Kangxi-Radicals")

Hello everyone,

I've found an error in "Kangxi Radicals", at the radical with the code point
U+2F46. It should be 旡 (jì), which looks like a kneeling figure and means
"hiccup"; it is used to build the character 既, for example. The character
described there is instead 无 (wú), a different character that means
"nothing" (traditional = 無) and was not a radical before the simplification
of the 1960s.

Date/Time: Sun Dec 22 15:35:25 CST 2019
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #412: Line break behaviour of Symbols for Legacy Computing

Currently, almost all characters in the new Symbols for Legacy Computing
block belong to the Line_Break property value Ideographic (ID). This is
inconsistent with existing characters of a comparable nature.

The majority of characters in the Block Elements and Box Drawings blocks are
Line_Break=Ambiguous, while some are Line_Break=Alphabetic. Many of the
pictographic and geometric symbols in the BMP are similarly Alphabetic.
Users would expect the new characters to have the same properties as these
older sets, even if most blocks in that section of the SMP default to
Line_Break=Ideographic.

Because the behaviour of Ambiguous characters depends on the resolution of
their East_Asian_Width value and (to my knowledge) none of the characters in
Symbols for Legacy Computing derive from eastern character sets, I propose
simply assigning the entire range U+1FB00..U+1FBCA to Line_Break=Alphabetic.

This would also enable a more refined line‐breaking behaviour for the
following multi‐part glyphs:

	U+1FBB2 🮲 LEFT HALF RUNNING MAN
	U+1FBB3 🮳 RIGHT HALF RUNNING MAN

	U+1FBB9 🮹 LEFT HALF FOLDER
	U+1FBBA 🮺 RIGHT HALF FOLDER

	U+1FBC1 🯁 LEFT THIRD WHITE RIGHT POINTING INDEX
	U+1FBC2 🯂 MIDDLE THIRD WHITE RIGHT POINTING INDEX
	U+1FBC3 🯃 RIGHT THIRD WHITE RIGHT POINTING INDEX

In regular usage, these three groups of characters would always occur in
sequence to form a larger glyph. Line_Break=Alphabetic for these characters
ensures that a line break cannot normally occur in the middle of these
large glyphs, which is more intuitive to users, even if that behaviour may
not have existed on their original platforms. Otherwise, users would always
need to insert WORD JOINERs between these characters to ensure the symbols
do not get broken up.
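
The workaround mentioned in the last sentence can be sketched as follows.
The helper name keep_together is hypothetical; the only assumption about
the standard is that U+2060 WORD JOINER prohibits a line break on either
side of it.

```python
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER


def keep_together(parts):
    """Join the parts of a multi-part glyph with WORD JOINERs between them."""
    return WORD_JOINER.join(parts)


# LEFT / MIDDLE / RIGHT THIRD WHITE RIGHT POINTING INDEX
index = keep_together(["\U0001FBC1", "\U0001FBC2", "\U0001FBC3"])
print([f"U+{ord(ch):04X}" for ch in index])
# ['U+1FBC1', 'U+2060', 'U+1FBC2', 'U+2060', 'U+1FBC3']
```

The three-part glyph grows from three code points to five, and every user
or application producing it would have to remember to add the joiners.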

Date/Time: Fri Jan 3 08:15:06 CST 2020
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #412: Bidi Class of Multi‐Part Symbols for Legacy Computing

The following new characters have been assigned the Bidi_Class property
value Other_Neutral (ON):

U+1FBB2 🮲 LEFT HALF RUNNING MAN
U+1FBB3 🮳 RIGHT HALF RUNNING MAN

U+1FBB9 🮹 LEFT HALF FOLDER
U+1FBBA 🮺 RIGHT HALF FOLDER

U+1FBC1 🯁 LEFT THIRD WHITE RIGHT POINTING INDEX
U+1FBC2 🯂 MIDDLE THIRD WHITE RIGHT POINTING INDEX
U+1FBC3 🯃 RIGHT THIRD WHITE RIGHT POINTING INDEX

I propose changing that value to Left_To_Right (L). While these characters
can of course be used in any order and combination, their real use lies in
forming larger glyphs. In an LTR context (which these characters were
originally designed for), writing them out in the intended order will
produce proper glyphs (🮲🮳, 🮹🮺, 🯁🯂🯃). However, in an RTL context the logical
order of these sequences would need to be reversed lest they turn into
meaningless scribbles (🮳🮲, 🮺🮹, 🯃🯂🯁).

This would cause problems when copy‐pasting these symbols between text runs
of differing directionality, as the LTR and RTL versions of the exact same
glyphs would have a completely different memory representation.
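
The reversal can be illustrated with a deliberately simplified model, not
the full Unicode Bidirectional Algorithm. It assumes a run that consists
only of the characters in question, with no strong characters nearby and no
explicit directional controls, so that Other_Neutral characters take the
paragraph direction while Left_To_Right characters keep their own. The
function name visual_order is hypothetical.

```python
def visual_order(logical, bidi_class, paragraph_rtl):
    """Left-to-right display order of a run whose characters share one class."""
    if paragraph_rtl and bidi_class == "ON":
        # Neutrals resolve to the RTL paragraph direction: the run is reversed.
        return list(reversed(logical))
    # LTR paragraph, or strong L characters: logical order is kept.
    return list(logical)


running_man = ["LEFT HALF RUNNING MAN", "RIGHT HALF RUNNING MAN"]
print(visual_order(running_man, "ON", paragraph_rtl=True))
# ['RIGHT HALF RUNNING MAN', 'LEFT HALF RUNNING MAN']
print(visual_order(running_man, "L", paragraph_rtl=True))
# ['LEFT HALF RUNNING MAN', 'RIGHT HALF RUNNING MAN']
```

In this model, the same stored sequence displays correctly in both paragraph
directions only when the characters are strongly Left_To_Right.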

Programs like screenreaders could also have trouble with this if they expect
the sequences to always have a certain order (i.e. left half first, right
half second) and would not recognise the reverse sequence as representing
the same concept. A screenreader could announce the sequence <U+1FBB2,
U+1FBB3> as just “Running Man” without enumerating its constituent parts;
this would be much harder to implement if the screenreader had to take text
direction into account to determine whether the sequence is (for lack of a
better word) well‐formed.

In some sense, these large glyphs are very similar to combining character
sequences whose memory representation remains the same regardless of text
direction. Giving these characters strong directionality would ensure that
they can always be consistently written in the same order. The Regional
Indicator Symbols were given the property value Left_To_Right for a similar
reason, because their precise visual order must always stay the same
regardless of context.