Public Review Issues

Accumulated Feedback on PRI #318

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Tue Mar 1 09:45:40 CST 2016
Name: Ryusei Yamaguchi
Report Type: Public Review Issue
Opt Subject: Review for PRI #318 Proposed Update UAX #11, East Asian Width

1. The proposal notes that 774 neutral characters and 25 ambiguous characters
would be changed to wide characters. How is the classification rule being
applied?

Strictly speaking, under the current rule, characters unified with characters
from non-East Asian legacy character sets should be ambiguous. For example,
U+263A WHITE SMILING FACE is not on the list of characters changed to wide.
Should we change WHITE SMILING FACE to ambiguous? I think we shouldn't, but
the current rule may suggest that we should.

2. The ambiguous characters are problematic. Character width has the same kind
of ambiguity as emoji style. We introduced variation selectors for characters
that cannot be determined to be emoji or text. Heuristic solutions are often
unsatisfactory. Why don't we introduce variation selectors for character
width? Basically, terminal emulators should respect the character width of
the font they use, and a variation selector could tell the emulator the width
of a character. That would be the best strategy.

3. What about characters wider than 1 em, such as the 2-em dash and 3-em dash?
They may be out of the scope of the UAX, but they should be considered.
Ambiguity of character width is a general problem, not just an East Asian one.

Date/Time: Sat Mar 19 10:43:51 CDT 2016
Name: Ken Lunde (Editor, UAX #11)
Report Type: Public Review Issue
Opt Subject: Reply to Ryusei Yamaguchi's 2016-03-01 "Review for PRI #318 Proposed Update UAX #11, East Asian Width"

Yamaguchi-san,

Thank you for submitting feedback for PRI #318:

http://www.unicode.org/review/pri318/

Your comments will be discussed during UTC #147 in May, but as the editor of
this UAX, I felt that you deserved a response prior to the UTC's discussion.
Note that my comments below will be added to this PRI, and will also be
discussed at UTC #147.

> 1. The proposal notes that 774 neutral characters and 25 ambiguous
> characters would be changed to wide characters. How is the classification
> rule being applied? Strictly speaking, under the current rule, characters
> unified with characters from non-East Asian legacy character sets should
> be ambiguous. For example, U+263A WHITE SMILING FACE is not on the list of
> characters changed to wide.
>
> Should we change WHITE SMILING FACE to ambiguous? I think we shouldn't, but
> the current rule may suggest that we should.

U+263A is currently set to N (East Asian Neutral), and nothing in PRI #318
suggests that it be changed to A (East Asian Ambiguous).

The added recommendation to treat "emoji style" standardized variation
sequences as though they were set to W would thus treat <263A FE0F> as East
Asian Wide, which covers this and other similar cases.
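A minimal sketch of how a column-width routine might apply that
recommendation, assuming Python's standard `unicodedata.east_asian_width`.
For brevity it treats any character followed by U+FE0F VARIATION SELECTOR-16
as an emoji-style sequence (a real implementation would check the list of
standardized emoji variation sequences), resolves A to narrow, and ignores
other combining marks. The function names are hypothetical.

```python
import unicodedata

VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation


def char_width(ch: str) -> int:
    """Columns for one character from its East_Asian_Width value.

    Assumption for this sketch: A (Ambiguous) resolves to narrow.
    """
    eaw = unicodedata.east_asian_width(ch)
    return 2 if eaw in ("W", "F") else 1


def sequence_width(text: str) -> int:
    """Total columns, treating <base, FE0F> as if the base were W."""
    total = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in (VS15, VS16):
            # A selector not consumed with its base takes no column.
            i += 1
            continue
        if i + 1 < len(text) and text[i + 1] == VS16:
            total += 2  # emoji-style sequence: treat as East Asian Wide
            i += 2
        else:
            total += char_width(ch)
            i += 1
    return total
```

With this sketch, U+263A alone (N) occupies one column, while the sequence
<263A FE0F> occupies two.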

> 2. The ambiguous characters are problematic. Character width has the same
> kind of ambiguity as emoji style. We introduced variation selectors for
> characters that cannot be determined to be emoji or text. Heuristic
> solutions are often unsatisfactory. Why don't we introduce variation
> selectors for character width? Basically, terminal emulators should respect
> the character width of the font they use, and a variation selector could
> tell the emulator the width of a character. That would be the best strategy.

The characters that correspond to A (East Asian Ambiguous) are problematic in
other ways beyond the scope of UAX #11, and it is up to implementations to
resolve the ambiguity in their own way. I once proposed (in L2/14-006) the use
of Standardized Variants to distinguish between Western and CJK use for a
small set of characters, which seems somewhat related to what you are
proposing, but the UTC rejected it. See:

http://www.unicode.org/L2/L2014/14006-sv-western-vs-cjk.pdf

Keep in mind that the primary purpose of UAX #11 is to guide developers toward
a solution that eventually resolves a character's width into one of two
possibilities: half-width or full-width. When a character is A, there is a
good chance that it will resolve to W because it is generally treated as W in
an East Asian context, but some circumstances may suggest that it resolve to H
(some terminals, such as the Terminal app of OS X, have an option that lets
the user choose whether East Asian Ambiguous characters are treated as East
Asian Wide). The same is true of characters that are N, except that such
characters tend to resolve to H; some conditions, such as "emoji style," may
cause them to be treated as W.
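A minimal sketch of that two-way resolution in Python, assuming the standard
library's `unicodedata.east_asian_width` (which returns "F", "H", "W", "Na",
"A", or "N" from the Unicode data bundled with the interpreter). The function
`resolve_columns` and its `ambiguous_is_wide` flag are hypothetical names; the
flag stands in for a context-dependent choice such as a terminal preference.

```python
import unicodedata


def resolve_columns(ch: str, ambiguous_is_wide: bool = False) -> int:
    """Resolve one character to half-width (1 column) or full-width (2).

    ambiguous_is_wide is a hypothetical flag standing in for the kind of
    user option some terminals expose for East Asian Ambiguous characters.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        return 2  # Fullwidth and Wide always resolve to full-width
    if eaw == "A":
        # Ambiguous: the implementation (or user) decides from context
        return 2 if ambiguous_is_wide else 1
    # "H", "Na", and "N" resolve to half-width here; emoji-style
    # sequences that might push an N character to W are not handled.
    return 1
```

For example, U+00B1 PLUS-MINUS SIGN (A) resolves to one column by default and
to two with `ambiguous_is_wide=True`, while U+263A (N) stays at one column
either way.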

> 3. What about characters wider than 1 em, such as the 2-em dash and 3-em
> dash? They may be out of the scope of the UAX, but they should be
> considered. Ambiguity of character width is a general problem, not just an
> East Asian one.

Such characters are simply out of the scope of UAX #11, and also out of scope
of fixed-width column implementations, such as terminals.
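As a quick check, assuming Python's bundled `unicodedata` (its Unicode version
depends on the Python release), U+2E3A TWO-EM DASH and U+2E3B THREE-EM DASH
are both N. A resolver that only distinguishes narrow from wide would
therefore give each a single column, regardless of its visual extent. The
function name `dash_columns` is illustrative only.

```python
import unicodedata

# TWO-EM DASH and THREE-EM DASH (added in Unicode 6.1)
DASHES = ("\u2E3A", "\u2E3B")


def dash_columns() -> dict:
    """Map each dash's name to (East_Asian_Width, columns), where columns
    come from a simple narrow/wide-only resolution."""
    result = {}
    for ch in DASHES:
        eaw = unicodedata.east_asian_width(ch)
        cols = 2 if eaw in ("W", "F") else 1
        result[unicodedata.name(ch)] = (eaw, cols)
    return result
```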

For implementations that do not need to force characters into one of two
possible widths, UAX #11 serves no purpose, and instead the advance widths of
the glyphs for each character, as specified in the selected font, should be
used.

Best...

-- Ken