L2/15-277

Comments on Public Review Issues
(July 21, 2015 - Oct 31, 2015)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of October 31, 2015, since the previous cumulative document was issued prior to UTC #144 (July 2015). Grayed-out items in the Table of Contents do not have feedback here.

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of October 31, 2015. Gray rows have no feedback to date.

Issue  Name  Feedback Link
309  Proposed Update UTR #50, Unicode Vertical Text Layout  (feedback)
308  Property Change for U+202F NARROW NO-BREAK SPACE (NNBSP)  (feedback)
307  Proposed Update UAX #38, Unicode Han Database (Unihan)  (feedback)
306  Proposed Update UAX #29, Unicode Text Segmentation  (feedback)
305  Proposed Update UAX #44, Unicode Character Database  (feedback)
304  Proposed Update UAX #24, Unicode Script Property  (feedback)
303  Proposed Update UAX #31, Unicode Identifier and Pattern Syntax  (feedback)
302  Feedback on Draft additional repertoire for ISO/IEC 10646:2016 (5th edition)  (feedback)
301  Feedback on Additional repertoire for Amendment 2 (DAM2) to ISO/IEC 10646:2014 (4th edition)  (feedback)
300  Proposed Update UTR #51, Unicode Emoji  (feedback)

The links below go to locations in this document for feedback.

Feedback on Encoding Proposals
Error Reports
Other Reports

 


Feedback on Encoding Proposals

Date/Time: Thu Aug 20 12:14:25 CDT 2015
Name: Ken Lunde
Report Type: Feedback on an Encoding Proposal
Opt Subject: UTC feedback to Japan about L2/15-193 (aka WG2 N4670)

For the record, I submitted to Japan via their official feedback form the
following UTC feedback on their Hentaigana encoding proposal (L2/15-193, which
is also WG2 N4670):

I apologize that the following comments are written in English.

The UTC reviewed and discussed WG2 N4670 (aka L2/15-193), "Request for
Comments on HENTAIGANA proposal," during the UTC #144 meeting, which took place
in Redmond, WA, USA, from 2015-07-27 through 2015-07-31. The actual discussions
took place on 2015-07-31. The UTC would like to thank Japan for making the
effort to submit this proposal, which provided the UTC an opportunity to
review it, and to provide the constructive feedback below.

The UTC recommends that all 299 (or 298*) characters in this proposal be
encoded in the "Kana Supplement" block (U+1B000 through U+1B0FF**). The UTC
further recommends that a new block, tentatively named "Kana Supplement-A," be
opened to accommodate any of the characters that will not fit into the
existing "Kana Supplement" block.

The UTC would also like to point out that directly encoding these characters
completely solves the issue of using U+3099 or U+309A to support dakuten- or
handakuten-annotated versions of any of them.

The "phonetic value" (Hiragana) and "mother ideograph" (CJK Unified Ideograph)
information for these characters, which the UTC agrees is useful, can be
provided in a separate data file or as character annotations. Such a data file
can also be used to establish equivalences for the purposes of searching and
sorting via the "phonetic value" (Hiragana), which is already being done for
other scripts.

Finally, the UTC recommends character names following the pattern shown below:

U+1B002 HIRAGANA LETTER ARCHAIC A VARIANT 1
U+1B003 HIRAGANA LETTER ARCHAIC A VARIANT 2
U+1B004 HIRAGANA LETTER ARCHAIC A VARIANT 3
U+1B005 HIRAGANA LETTER ARCHAIC A VARIANT 4

* If MJ090014 is determined to be a duplicate of U+1B001 per Garth Wallace's 
2015-07-27 feedback in L2/15-189: http://www.unicode.org/L2/L2015/15189-pubrev.html

** Two characters, U+1B000 and U+1B001, are already encoded in this block, 
leaving 254 code points, U+1B002 through U+1B0FF, available.
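
The block-capacity arithmetic in the feedback above can be checked with a short sketch. The function name overflow_count is hypothetical; the block range, the two already-encoded characters, and the 299/298 proposal counts are taken from the text.

```python
# Sketch only: how many proposed characters would not fit in "Kana Supplement".
BLOCK_START = 0x1B000
BLOCK_END = 0x1B0FF
ALREADY_ENCODED = 2  # U+1B000 and U+1B001, per the second footnote


def overflow_count(proposed: int) -> int:
    """Return the number of proposed characters that exceed the free code points."""
    block_size = BLOCK_END - BLOCK_START + 1      # 256 code points in the block
    available = block_size - ALREADY_ENCODED      # 254 code points still free
    return max(0, proposed - available)


for proposed in (299, 298):
    print(proposed, "proposed ->", overflow_count(proposed), "need another block")
# 299 proposed -> 45 need another block
# 298 proposed -> 44 need another block
```

Under these figures, 45 characters (or 44, if MJ090014 turns out to be a duplicate) would fall into the tentatively named "Kana Supplement-A" block.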

Feedback on UTRs / UAXes

Date/Time: Mon Aug 31 14:42:18 CDT 2015
Name: Ken Lunde
Report Type: Other Question, Problem, or Feedback
Opt Subject: UAX #45 datafile suggestion & changes


I know of the following five duplicate character pairs in UAX #45, but the
datafile gives no indication that they are duplicates:

UTC-00309 & UTC-00864
UTC-00338 & UTC-01029
UTC-00327 & UTC-01126
UTC-00061 & UTC-01151
UTC-00448 & UTC-01157

In order to deprecate one of the characters in each pair, and to make the 
relationship to the non-deprecated character explicit, I suggest that Field 1 
of the deprecated characters in the UAX #45 datafile be changed to the U-Source 
reference of the corresponding non-deprecated character. This would also apply 
to any additional duplicate character pairs that are identified in UAX #45 
at a later date.

The following are the current UAX #45 datafile entries for these five pairs:

UTC-00309;U;U+2B782;72.5;0494.181;⿰日玉;LDS 52
UTC-00864;D;U+2B782;72.5;0494.181;⿰日玉;Adobe-Japan1 20141

UTC-00338;F;;86.1;0665.031;⿰火乚;kLau 1272b
UTC-01029;F;;86.1;0665.031;⿰火乚;UTCDoc L2-12/333 61

UTC-00327;F;;178.8;1394.081;⿰韦华;kXHC1983 1197.060
UTC-01126;F;;178.6;1394.081;⿰韦华;UTCDoc L2-12/333 158

UTC-00061;F;U+9DC3;196.10;1505.141;⿰晏鸟;ABC2
UTC-01151;F;;196.10;1498.201;⿰晏鸟;UTCDoc L2-12/333 183

UTC-00448;F;;66.8;0473.171;⿰育攵;kFennIndex 23.07
UTC-01157;F;;66.8;0473.171;⿰育攵;UTCDoc L2-12/333 189

The following would be the same fields with the suggested change applied:

UTC-00309;UTC-00864;U+2B782;72.5;0494.181;⿰日玉;LDS 52
UTC-00864;D;U+2B782;72.5;0494.181;⿰日玉;Adobe-Japan1 20141

UTC-00338;UTC-01029;;86.1;0665.031;⿰火乚;kLau 1272b
UTC-01029;F;;86.1;0665.031;⿰火乚;UTCDoc L2-12/333 61

UTC-00327;UTC-01126;;178.8;1394.081;⿰韦华;kXHC1983 1197.060
UTC-01126;F;;178.6;1394.081;⿰韦华;UTCDoc L2-12/333 158

UTC-00061;UTC-01151;U+9DC3;196.10;1505.141;⿰晏鸟;ABC2
UTC-01151;F;;196.10;1498.201;⿰晏鸟;UTCDoc L2-12/333 183

UTC-00448;UTC-01157;;66.8;0473.171;⿰育攵;kFennIndex 23.07
UTC-01157;F;;66.8;0473.171;⿰育攵;UTCDoc L2-12/333 189
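
The suggested Field 1 change can be sketched as a small text transformation. This is an illustration only, assuming semicolon-separated fields numbered from 0 as in the entries above; the function names mark_deprecated and replacement_of are hypothetical and are not part of any UAX #45 tooling.

```python
# Sketch only: rewrite Field 1 of a deprecated entry to point at its replacement.
from typing import Optional


def mark_deprecated(entry: str, replacement_id: str) -> str:
    """Return the entry with Field 1 replaced by the non-deprecated U-Source ID."""
    fields = entry.split(";")
    fields[1] = replacement_id
    return ";".join(fields)


def replacement_of(entry: str) -> Optional[str]:
    """Return the U-Source ID stored in Field 1, or None if Field 1 is a status code."""
    status = entry.split(";")[1]
    if status.startswith(("UTC-", "UCI-")):
        return status
    return None


current = "UTC-00309;U;U+2B782;72.5;0494.181;⿰日玉;LDS 52"
changed = mark_deprecated(current, "UTC-00864")
print(changed)                  # UTC-00309;UTC-00864;U+2B782;72.5;0494.181;⿰日玉;LDS 52
print(replacement_of(changed))  # UTC-00864
print(replacement_of(current))  # None
```

A consumer of the datafile could then follow Field 1 from a deprecated entry to its replacement without needing a separate list of duplicate pairs.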

The Field 1 description in the datafile header should include the following 
additional descriptions:

UTC-xxxxx or UCI-xxxxx: Non-deprecated version of this deprecated character
UNC-2015: Included in the UTC's 2015 "Urgently Needed Character" proposal

Related to the second additional Field 1 description above, the entry for
UTC-01201 should be changed to the following:

UTC-01201;UNC-2015;;112.5;0829.331;⿰⽯示;UTCDoc L2/15-114

Lastly, the IDS in Field 5 of UTC-01046 should be changed as follows 
(the full-width question mark should be changed to U+2BA51):

UTC-01046;F;;120.5;0921.291;⿰纟𫩑;UTCDoc L2-12/333 78

That is all.


Error Reports

Date/Time: Thu Jul 30 09:48:33 CDT 2015
Name: Simon Sapin
Report Type: Error Report
Opt Subject: xn-- prefix never added in UTS #46

In http://www.unicode.org/reports/tr46/tr46-15.html#ProcessingStepConvertValidate , 
the algorithm looks for a xn-- prefix and decodes the rest of the label per 
Punycode when it is present.

In http://www.unicode.org/reports/tr46/tr46-15.html#ToASCII however, 
the xn-- prefix is never added:

> > Convert each label with non-ASCII characters into Punycode [RFC3492]. 
This may record an error.

This should probably be replaced with something like:

> > For each label with non-ASCII characters, replace the label with “xn--” 
followed by the encoding of the label according to Punycode [RFC3492]. This may 
record an error.
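
The asymmetry described in this report can be illustrated with Python's built-in "punycode" codec, which implements only the RFC 3492 encoding and leaves the prefix to the caller. This is a minimal sketch, not the UTS #46 algorithm: it skips mapping, normalization, and validation, and the function names are hypothetical.

```python
# Sketch only: the "xn--" prefix must be added on encoding and removed on decoding.
ACE_PREFIX = "xn--"


def label_to_ascii(label: str) -> str:
    """Encode one label; non-ASCII labels get 'xn--' plus their Punycode form."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def label_to_unicode(label: str) -> str:
    """Decode one label; only labels starting with 'xn--' are Punycode-decoded."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


print(label_to_ascii("bücher"))            # xn--bcher-kva
print(label_to_unicode("xn--bcher-kva"))   # bücher
print(label_to_ascii("example"))           # example
```

Without the prefix step in label_to_ascii, the output "bcher-kva" would not be recognized as an encoded label by the decoding step.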

Date/Time: Fri Sep 4 14:11:11 CDT 2015
Name: Ken Lunde
Report Type: Error Report
Opt Subject: kTotalStrokes & kRSUnicode for U+2386E

Selena Wei didn't attend IRG #44 last month, but Bear Tseng was there
representing TCA. Dirk Meyer (Adobe), who attended IRG #44 as my proxy and as
the sole US/Unicode representative, asked him about the residual stroke count
issue of U+2386E, and Bear confirmed that it should be changed from 22 to 21.
This also means a change for this character's kTotalStrokes value. This
character has a single source, a T-source, and the CNS 11643 website confirms
that this change should be made:

http://www.cns11643.gov.tw/MAIDB/query_general_view.do?page=f&code=6c2d

The following are the current stroke-related values for this character:

U+2386E  kTotalStrokes  26
U+2386E  kRSUnicode  75.22

The suggested change is as follows:

U+2386E  kTotalStrokes  25
U+2386E  kRSUnicode  75.21

That is all.
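
The relationship between the two values in this report can be checked with a short sketch. It assumes that the total stroke count equals the radical's stroke count plus the residual stroke count from kRSUnicode, which matches the values quoted above but does not hold for every character. The lookup table RADICAL_STROKES is hypothetical and contains only radical 75 (木, 4 strokes).

```python
# Sketch only: derive a total stroke count from a kRSUnicode value.
RADICAL_STROKES = {75: 4}  # hypothetical minimal table: radical 75 (木) has 4 strokes


def total_strokes(rs_unicode: str) -> int:
    """Return radical strokes + residual strokes for a value such as '75.21'."""
    radical, residual = rs_unicode.split(".")
    radical_number = int(radical.rstrip("'"))  # drop the simplified-radical marker, if any
    return RADICAL_STROKES[radical_number] + int(residual)


print(total_strokes("75.22"))  # 26, the current kTotalStrokes value
print(total_strokes("75.21"))  # 25, the suggested kTotalStrokes value
```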

Date/Time: Wed Sep 16 18:37:29 CDT 2015
Name: Pedro Navarro
Report Type: Error Report
Opt Subject: Possible UAX #9 correction

In UAX #9, section 2.7 (http://unicode.org/reports/tr9/#Markup_And_Formatting)
there is a table that maps Unicode bidi formatting characters to their HTML 5
equivalents. Since the publication of that document, CSS Writing Modes Level 3
and HTML5 have moved forward and switched to the use of isolates whenever a tag
has a dir attribute (see the Note in Section 2.1.3 of the Additional
Requirements for Bidi in HTML & CSS document,
http://www.w3.org/TR/2015/NOTE-html-bidi-20150721/#bidi-isolation-solution).

In the last two rows of that table, where it says:

RLE	dir = "rtl"	attribute on block or inline element
LRE	dir = "ltr"	attribute on block or inline element

It should say:

RLI	dir = "rtl"	attribute on block or inline element
LRI	dir = "ltr"	attribute on block or inline element

The semantics for <bdo> have also changed to isolate-override. According to
http://www.w3.org/TR/css-writing-modes-3/#unicode-bidi: "Following clause HL3
[UAX9], for inline boxes this corresponds to inserting an FSI (U+2068)
followed by an LRO (U+202D), for direction: ltr, or RLO (U+202E), for
direction: rtl at the start of the box, and a PDF (U+202C) followed by PDI
(U+2069) at the end of the box. [UAX9] For other boxes, this value is exactly
equivalent to bidi-override."

Pedro Navarro
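
The two mappings described in this report can be sketched as plain-text wrappers. This is an illustration of the character sequences named above, not of how any browser implements the dir attribute or <bdo>; the function names are hypothetical.

```python
# Sketch only: plain-text equivalents of dir="..." (isolate) and <bdo> (isolate-override).
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"


def _check(direction: str) -> None:
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")


def isolate(text: str, direction: str) -> str:
    """Wrap text as the report suggests for dir: LRI or RLI ... PDI."""
    _check(direction)
    opener = RLI if direction == "rtl" else LRI
    return opener + text + PDI


def isolate_override(text: str, direction: str) -> str:
    """Wrap text as quoted for isolate-override: FSI + LRO/RLO ... PDF + PDI."""
    _check(direction)
    override = RLO if direction == "rtl" else LRO
    return FSI + override + text + PDF + PDI


print([f"U+{ord(c):04X}" for c in isolate("x", "rtl")])
# ['U+2067', 'U+0078', 'U+2069']
print([f"U+{ord(c):04X}" for c in isolate_override("x", "ltr")])
# ['U+2068', 'U+202D', 'U+0078', 'U+202C', 'U+2069']
```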

Date/Time: Wed Sep 23 10:04:38 CDT 2015
Name: Loïc Etienne
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: gc of MONGOLIAN TODO SOFT HYPHEN

Dear Unicode Consortium,

While adapting stringprep (RFC 3454) for Unicode 8.0.0, I noticed that
    RFC 3454 B.1 Commonly mapped to nothing
is a subset of
    Unicode 8.0.0 Default_Ignorable_Code_Point
except for
    1806 MONGOLIAN TODO SOFT HYPHEN;Pd
which is not default-ignorable.

RFC 3454 itself is obsoleted by RFC 7564, which relies upon default-ignorable code points.

For this reason, I suggest considering a change of the general category
of 1806 MONGOLIAN TODO SOFT HYPHEN from Pd to Cf.

Best regards,
Loïc Etienne
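
The observation in this report can be partly reproduced with Python's standard library. This is a sketch under two assumptions: the stringprep module's in_table_b1 reflects RFC 3454 table B.1, and unicodedata reports the general category for whatever Unicode version the running Python ships, which may differ from 8.0.0. The standard library does not expose Default_Ignorable_Code_Point, so that property is not checked here.

```python
# Sketch only: compare RFC 3454 table B.1 membership with the general category.
import stringprep
import unicodedata


def b1_and_category(code_point: int) -> tuple:
    """Return (is in table B.1, general category) for one code point."""
    ch = chr(code_point)
    return stringprep.in_table_b1(ch), unicodedata.category(ch)


for cp in (0x00AD, 0x1806, 0x200B):
    in_b1, gc = b1_and_category(cp)
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: B.1={in_b1}, gc={gc}")
```

U+00AD SOFT HYPHEN and U+1806 are both listed in table B.1, but only the former has general category Cf.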

Date/Time: Fri Oct 2 12:53:10 CDT 2015
Name: Phil Armstrong
Report Type: Error Report
Opt Subject: Emoji should be East Asian Full Width?

I note (from playing around with wcswidth()) that emoji characters are
single/half width.

Since emoji originate from Japanese text display formats, shouldn’t they be
marked as East-Asian Full Width so that they will take up two columns on
monospace terminal displays? As it stands, the glyphs overlap adjacent
characters on monospace terminals, because they are not marked as being
full-width, despite clearly being designed to take up more horizontal space
than a monospace character.

cheers, Phil
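
The column-width behavior discussed in this report can be approximated with the East_Asian_Width property. This is a simplified sketch, not a reimplementation of wcswidth(): it treats Wide (W) and Fullwidth (F) characters as two columns and everything else as one, ignoring combining marks and control characters. The function name display_columns is hypothetical, and the value reported for an emoji depends on the Unicode version of the running Python.

```python
# Sketch only: estimate terminal columns from East_Asian_Width.
import unicodedata


def display_columns(text: str) -> int:
    """Count 2 columns for Wide/Fullwidth characters and 1 for all others."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


print(display_columns("A"))      # 1
print(display_columns("漢字"))    # 4
# The property value for an emoji varies with the Unicode data in use:
print(unicodedata.unidata_version, unicodedata.east_asian_width("\U0001F596"))
```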

Other Reports

Date/Time: Thu Oct 22 21:30:48 CDT 2015
Name: Ken Lunde
Report Type: Error Report
Opt Subject: U+1F596 aliases/annotations

With regard to U+1F596 RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS,
how about adding the following two aliases or annotations?

Vulcan hand gesture
Live Long And Prosper

Having both will help users find this character more easily.