L2/16-123

Comments on Public Review Issues
(Jan 22, 2016 - May 12, 2016)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of May 5, 2016, since the previous cumulative document was issued prior to UTC #146 (February 2016). Grayed-out items in the Table of Contents do not have feedback here.

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of January 22, 2016. Gray rows have no feedback to date.

Issue Name Feedback Link
325 Proposed Update UTS #18, Unicode Regular Expressions (feedback)
324 Proposed Update UAX #42, Unicode Character Database in XML (feedback)
323 Unicode 9.0.0 Beta (feedback)
322 Proposed Update UAX #14, Unicode Line Breaking Algorithm (feedback)
321 Proposed Draft UTS #52, Unicode Emoji Mechanisms (feedback)
320 Proposed Update UAX #41, Common References for Unicode Standard Annexes (feedback)
319 Proposed Update UTR #51, Unicode Emoji (feedback)
318 Proposed Update UAX #11, East Asian Width (feedback)
317 Proposed Update UTS #46, Unicode IDNA Compatibility Processing (feedback)
316 Proposal to Remove Some Hira/Kata From Script_Extensions (feedback)
315 Proposed Update UAX #9, Unicode Bidirectional Algorithm (feedback) no new
314 Proposed Update UAX #45, U-Source Ideographs (feedback)
313 Proposed Update UTS #39, Unicode Security Mechanisms (feedback)
311 Proposed Update UTS #10, Unicode Collation Algorithm (feedback)
307 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback)
306 Proposed Update UAX #29, Unicode Text Segmentation (feedback)
305 Proposed Update UAX #44, Unicode Character Database (feedback)
304 Proposed Update UAX #24, Unicode Script Property (feedback)
303 Proposed Update UAX #31, Unicode Identifier and Pattern Syntax (feedback) no new

The links below go to locations in this document for feedback.

Feedback to UTC / Encoding Proposals
Error Reports

 


Feedback to UTC / Encoding Proposals

Date/Time: Thu Mar 10 16:51:39 CST 2016
Name: David Corbett
Report Type: Feedback on an Encoding Proposal (L2/15-083)
Opt Subject: U+2E4B TRIPLE DAGGER

L2/15-083 (“Proposal for addition of Group Mark symbol”), pp. 7–8, unified the
record mark with U+2021 DOUBLE DAGGER. Analogously, the triple dagger can be
unified with U+2BD2 GROUP MARK, so it does not need to be encoded separately at
U+2E4B.

Date/Time: Thu Mar 31 19:27:34 CDT 2016
Name: Dian Tresna Nugraha
Report Type: Error Report
Opt Subject: Glyph correction needed for Sundanese Letter JA (1B8F)

NOTE: This is on the Editorial Committee errata list.

Dear Unicode,

We've spotted a mistake in the glyph shown for Sundanese Letter JA (1B8F)
in this document:
http://unicode.org/charts/PDF/U1B80.pdf

The mistake is at the top part of the glyph. Currently it is displayed as Z-shaped.

In fact, the top part should be similar to the Sundanese Letter DA (1B93).

The font which implements this correction can be found at:
http://www.kairaga.com/2015/05/05/font-aksara-sunda-unicode-versi-2013-revisi.html

Sincerely,

Dian Tresna Nugraha

Date/Time: Mon May 2 11:34:37 CDT 2016
Name: Nozomu Katō
Report Type: Feedback on an Encoding Proposal (L2/16-085)
Opt Subject: Two comments on hentaigana proposal

Two comments on hentaigana proposal

--
1. Re L2/16-085 Status of hentaigana proposal

I, the original proposer of U+1B001, think that Archaic YE (U+1B001) and
Hentaigana E-1 are the same character. In my understanding, since the E/YE
distinction was lost and forgotten, kana letters that trace back to E and ones
that trace back to YE came to be regarded as interchangeable; thus, U+3048
(え), U+1B001 (𛀁), nos. 14-19 in L2/15-343, etc. became variants for the
phoneme which E and YE had merged into. (Incidentally, regarding the other
kana letters labeled as E in L2/15-343, nos. 17 and 18 trace back to E, as
they came from the same ideograph as U+3048. Nos. 14, 16 and 19 trace back to
YE. No. 15 is likely to have been YE, judging from the sound of its ideograph
in Middle Chinese.)

As for changing the representative glyph for U+1B001, the current glyph for
Hentaigana E-1 looks to me too cursive. I would like it to be modernized some
more, rather than to use the current one for U+1B001.

--
2. Phonetic values used in hentaigana proposal

According to L2/15-239 etc., each hentaigana character is identified by the
pair of its phonetic value and its mother ideograph. However, apparently, the
phonetic values used in the current charts for hentaigana proposal are not of
modern Japanese. Regarding the charts in L2/15-343, the modern phonetic value
of nos. 269-273 is not WI but I, as for nos. 5-8; that of nos. 274-277 is not
WE but E, as for nos. 14-19; and that of nos. 278-284 is not WO but O, as for
nos. 20-22.

I guess the classification system of syllables used in the current proposal is
tacitly based on the Iroha poem and these I/WI, E/WE and O/WO distinctions
came from it. In any case, it might be required to explain why this
classification has been chosen for hentaigana encoding instead of one based on
Modern Japanese.

Date/Time: Mon May 2 16:53:16 CDT 2016
Name: John Cowan
Report Type: Feedback on an Encoding Proposal (L2/16-063)
Opt Subject: 16063-pancjkv-ivd-collection

I propose that 16063-pancjkv-ivd-collection be amended to avoid the anomalous
VS256 by moving everything else down one VS.

Date/Time: Tue May 3 10:28:43 CDT 2016
Name: John Cowan
Report Type: Feedback on an Encoding Proposal (L2/16-088)
Opt Subject: 16088-chars-for-emoji-provisional.pdf

I am opposed to Provisional status.  The random decisions of random
implementors shouldn't be put into the Unicode Standard.  Instead, we should
simply embrace the fact that *any* symbol character may eventually be given an
emoji presentation, and selectively promote such characters from Emoji=No
(which should be understood as mutable) to Emoji=Yes (which should be
understood as immutable).

Date/Time: Tue May 3 10:39:04 CDT 2016
Name: John Cowan
Report Type: Feedback on an Encoding Proposal (L2/16-073)
Opt Subject: 16073-lampung.pdf

I think that CONSONANT SIGN in the character names should be changed to FINAL,
for clarity and uniformity with other scripts.

Date/Time: Tue May 3 11:34:27 CDT 2016
Name: William Overington
Report Type: Feedback on an Encoding Proposal (L2/16-103)
Opt Subject: Feedback on Jurassic Emoji proposal

I write to comment on the following document.

http://www.unicode.org/L2/L2016/16103-jurassic-fdbk.pdf 

I agree that it is better to have many more emoji for dinosaurs and to have
whole dinosaurs not just heads.

However, I opine that plesiosaur and pliosaur should be two separate emoji
characters.

The neck lengths are very different.

I realize that the distinction between the two types is not that simple, yet
encoding two different characters now rather than just the one would be
relatively easy.

If the two types are encoded as if the same item, then once encoded, if people
want to express the two different types of dinosaur then there would be big
problems over making changes, if indeed such changes were ever possible.

So could the Unicode Technical Committee please encode more dinosaur emoji
than Andrew West has listed, in particular both a plesiosaur with a long neck
and a pliosaur with a short neck.

It seems to me that encoding a block of, say, thirty-two dinosaur emoji now,
or more if needed, and doing a really thorough job, consulting expert
palaeontologists, would be the best solution for the future.

I am against dumbing-down, of not having precision lest some people make
accusations of pedantry.

So, please use one code point for each type of dinosaur that is recognised by
science and not group different types of dinosaur together as one character.

Dinosaurs are very popular and I ask that the Unicode Technical Committee
follow a sound educational approach.

William Overington

Tuesday 3 May 2016

Date/Time: Tue May 3 13:32:37 CDT 2016
Name: Peter Constable
Report Type: Feedback on an Encoding Proposal (L2/16-060)
Opt Subject: Comment on L2/16-060 Flag Tofus


In L2/16-060, a proposal is made that UTR#51 should recommend that
implementations display all two-character sequences of regional indicator
symbols (RISs) as a "flag tofu" in case the implementation does not have a
specific flag presentation for a given two-character sequence.

The rationale given is based on scenarios involving contiguous sequences of
_more than_ two RISs with a given pair not recognized, resulting in the
second character of a pair sequence being combined with the first character of
a following pair sequence.

The root issue behind this rationale is the lack of any syntactic delimitation
for pairs of RISs. This has been raised in the past. The original RIS proposal
did not allow for this scenario because, at that time, the only requirement
anticipated was a need for round-trip interchange of Japanese carrier data. In
particular, reliable display of sequences of flags was not anticipated since
the general mood among implementers involved in review of the RIS proposal was
not to display flags at all. Clearly, that has changed in the years since RISs
were first proposed; perhaps the need for delimitation should be revisited.

Having said that, it is possible in principle for an OpenType Layout font
implementation to avoid the problem of mis-matched ligature substitution given
in the rationale: A font developer can assign 26 glyph IDs to represent the
_second_ RIS in a pair, and then a substitution lookup table could be
processed over a string to substitute each second RIS. So a sequence of
default glyph IDs for RISs would change as follows:

<gRIS1><gRIS2><gRIS3>...  --> <gRIS1><gRIS2.alt><gRIS3>...

Then the ligature substitutions that result in specific flags would be
expressed in terms of RIS pairs in which the second glyph has been substituted
as above. In this case, a glyph sequence <gRIS2.alt><gRIS3> would
never get matched for a ligature substitution.

In this way, the problem described in the rationale can be avoided without the
implementation needing to display the unsupported RIS pair
<RIS1><RIS2> as a "flag tofu".
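
The glyph-substitution approach described above can be simulated outside a
font. The following Python sketch is only an illustration under stated
assumptions: the glyph names (gRIS_X, gRIS_X.alt, flag_XY), the helper
functions, and the two-flag "font" are hypothetical, and real OpenType lookup
processing is not modeled. It compares naive pair ligatures with ligatures
keyed on an alternate glyph for every second RIS in a run.

```python
RIS_BASE = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def ris(letters):
    """Map ASCII letters A-Z to regional indicator symbols."""
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in letters)


def default_glyphs(text):
    """Hypothetical default glyph names: gRIS_<letter> for RISs, g_<hex> otherwise."""
    names = []
    for ch in text:
        offset = ord(ch) - RIS_BASE
        if 0 <= offset < 26:
            names.append("gRIS_" + chr(ord("A") + offset))
        else:
            names.append(f"g_{ord(ch):04X}")
    return names


def mark_second(glyphs):
    """Replace every second RIS glyph in a contiguous run with its .alt form."""
    out, position = [], 0
    for g in glyphs:
        if g.startswith("gRIS_"):
            position += 1
            out.append(g + ".alt" if position % 2 == 0 else g)
        else:
            position = 0  # any non-RIS glyph ends the run
            out.append(g)
    return out


def ligature_table(flags, second_suffix):
    """Build pair -> flag glyph entries; second_suffix is '' or '.alt'."""
    return {
        (f"gRIS_{a}", f"gRIS_{b}{second_suffix}"): f"flag_{a}{b}"
        for a, b in flags
    }


def apply_ligatures(glyphs, table):
    """Greedy left-to-right pair substitution."""
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


flags = ["US", "SE"]   # flags this hypothetical font supports
text = ris("XUSE")     # intended pairs: XU (unsupported), SE (supported)

naive = apply_ligatures(default_glyphs(text), ligature_table(flags, ""))
paired = apply_ligatures(mark_second(default_glyphs(text)),
                         ligature_table(flags, ".alt"))
print(naive)
print(paired)
```

In this sketch the naive lookup mis-pairs the second and third symbols and
yields flag_US, while the .alt-based lookup leaves the unsupported pair as two
separate glyphs and yields flag_SE for the following pair.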

Date/Time: Tue May 3 15:54:42 CDT 2016
Name: David Corbett
Report Type: Feedback on an Encoding Proposal (L2/16-110)
Opt Subject: Feedback on Ogham variation sequences

L2/16-110 (Proposal to define 21 variation sequences for Ogham letters) says
that hammerhead A and S-shaped A might have distinct phonetic values from the
standard ailm, that rabbit-eared D contrasts with standard dair in one
inscription, and that it is not clear that semicircular U is an uilleann.
Should they therefore get their own code points? How do other Oghamists
analyze these variants? What is the plan if these variation sequences are
encoded, but later research reveals that they are distinct characters after
all?

Date/Time: Tue May 3 23:28:20 CDT 2016
Name: John Cowan
Report Type: Feedback on an Encoding Proposal (L2/16-108)
Opt Subject: 16108-n4719-go-game-encoding.pdf

I suggest that within solution D to the problem of encoding enclosed numbers,
the use of the ZWJ character be replaced by U+034F COMBINING GRAPHEME JOINER.
In this way, a sequence like 1 CGJ 2 CGJ 3 U+20DD, although technically three
default grapheme clusters, can be understood as a single 123 enclosed by a
circle, without implying that 1, 2, and 3 are ligatured together in any way,
as ZWJ would do.
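
For concreteness, the suggested sequence can be written out code point by code
point. This is a minimal Python sketch; the helper name enclosed_number is
hypothetical, and the code only builds and lists the sequence without making
any claim about how a renderer would display it.

```python
import unicodedata

CGJ = "\u034F"               # COMBINING GRAPHEME JOINER
ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE


def enclosed_number(digits, joiner=CGJ, enclosure=ENCLOSING_CIRCLE):
    """Join the digits with the joiner and append the enclosing mark."""
    return joiner.join(digits) + enclosure


seq = enclosed_number("123")
for ch in seq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```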

Date/Time: Wed May 4 08:41:00 CDT 2016
Name: David Corbett
Report Type: Feedback on an Encoding Proposal (L2/16-107)
Opt Subject: Feedback on erhua variation sequences

L2/16-107 (Proposal to define Standardized Variation Sequences for two Chinese
ideographs) says that the small versions of the ideographs are used
contrastively from the normal versions. This sounds similar to the use of
superscript letters for secondary articulation in Latin-script phonetic
alphabets, which are encoded distinctly, so why is it not “appropriate or
helpful to end-users to encode separate small versions of these two
characters”?

Date/Time: Wed May 4 10:25:19 CDT 2016
Name: John Cowan
Report Type: Feedback on an Encoding Proposal (L2/16-071)
Opt Subject: 16071-three-fingers.pdf

I object to the overly generic name THREE FINGERS SIGN.  There are various
different signs that involve holding up three fingers, but they are different
fingers.  The well-known ASL sign for "I love you" involves holding up fingers
1 (thumb), 2, and 5; W in the American and French manual alphabets involves
holding up 3, 4, and 5; the Boy/Girl Scouts hold up fingers 2, 3, and 4.  A
name like HAND WITH FINGERS-123 would I think be better.

Date/Time: Wed May 4 11:49:11 CDT 2016
Name: David Corbett
Report Type: Feedback on an Encoding Proposal (L2/16-101)
Opt Subject: Feedback on Medefaidrin

Figure 1 of L2/16-101 (Proposal for encoding the Medefaidrin (Oberi Okaime)
script in the SMP of the UCS) contains 2 more letters than are proposed for
encoding: “ɛk” and “k” (left column, 5th row, 1st and 6th letters). They look
similar to “h” and “si”, which the proposal calls HEG and SII. Are they
distinct letters?

Date/Time: Fri May 6 07:33:00 CDT 2016
Name: William Overington
Report Type: Feedback on an Encoding Proposal (L2/16-105)
Opt Subject: Feedback on Coded Hashes of Arbitrary Images proposal

I write to comment on the following document.

http://www.unicode.org/L2/L2016/16105-unicode-image-hash.pdf 

I support the proposal.

I write to make two suggestions please.

1. In section 2.5 of the pdf document, at the end of the section is the following.

> > Depending on the protocol, as the input arrives, the receiver may have
> > some ambiguity about when the sequence of CHAI characters ends. A receiver
> > may choose to wait until the next non-combining character (signaling the
> > end of the combining character sequence), or a protocol-defined end-of-
> > message signal, before retrieving the emoji description.

I opine that it would be better to encode

U+EFFFB IMAGE HASH CODE SUBSET COMPLETED

and to use that character at the end of a sequence of IMAGE HASH characters so
that when that character is reached, a definite indication that the hash code
subset has been completed is received from within the Unicode plain text
message.

2. In section 2.5 of the pdf document there is the following.

> > Otherwise, the receiver displays the base character while it attempts to
> > retrieve an emoji description whose hash matches the encoded hash prefix.

As the method is so that arbitrary images can be referenced from a plain text
sequence, it seems that no existing base character may be suitable for every
arbitrary image.

So, could a

BASE CHARACTER FOR AN ARBITRARY IMAGE

be encoded please,

either as U+EFFFA or, as it is a displayed character, with a code point in plane 1.

It is possible that U+1F5BC FRAME WITH PICTURE could be used, yet I opine that
a specific BASE CHARACTER FOR AN ARBITRARY IMAGE character becoming encoded
would be a better solution as that would clearly indicate that an arbitrary
image not necessarily based upon any character in regular Unicode is being
referenced.

William Overington

Friday 6 May 2016

Date/Time: Fri May 6 09:50:05 CDT 2016
Name: David Corbett
Report Type: Feedback on an Encoding Proposal (L2/16-125)
Opt Subject: Feedback on medieval punctuation

The names list should clarify that TILDE WITH DOT ABOVE AND DOT BELOW is
really tilde/dash with dot/comma above and dot/comma below.

The triple dagger should be unified with GROUP MARK as DOUBLE DAGGER is with
the record mark. (Or both pairs should be disunified. Just be consistent.)

SIGNE DE RENVOI should not be accepted. According to Parkes 1993 quoted on
p.17 of L2/16-125, a signe de renvoi is “Any sign used to associate matter in
the text with material added in the margin”, not specifically this three-dot
sign. See https://globalcurrents.stanford.edu/visual-element/signes-de-renvoi 
for other signes de renvoi. It does not make sense to encode something called
“SIGNE DE RENVOI” without considering the full spectrum of signes.

If the proposed signe de renvoi cannot be unified with U+10B3C LARGE TWO DOTS
OVER ONE DOT PUNCTUATION (section 4.1 of L2/07-004 says it can’t), then it
should be encoded with a generic name like TWO DOTS OVER ONE DOT MARK.

Date/Time: Tue May 10 16:46:48 CDT 2016
Name: Markus Scherer
Report Type: Error Report
Opt Subject: Provisional value for Emoji property (L2/16-087)

http://www.unicode.org/L2/L2016/16087-provisional-value-for-emoji.pdf 

This proposal constitutes changing several properties from Binary to 
Enumerated and therefore needs to be settled before implementation 
practice is too ingrained.

If this is necessary, then I suggest "Maybe" as the new value rather 
than "Provisional".

The document includes a question about constraints on property value 
assignments. I suggest that it is premature to enforce stability on 
"Emoji" and related property value assignments.

Date: May 12, 2016
Name: Dominik Schwarz
Report Type: Feedback on an Encoding Proposal (L2/16-072)

Dear Unicode People,

Based on Ken's message via Twitter
(https://twitter.com/ken_lunde/status/730885881448386560) I'd like to give
feedback on L2/16-072 as well as on the dinosaur emoji topic in general.

a) It seems (https://twitter.com/ken_lunde/status/730882287445774336) there
are at least 2 different proposals for dinosaur Emojis: L2/16-072 and my own
(can't find it here: http://www.unicode.org/L2/L2016/)

b) I highly appreciate the idea of having more than one type of dinosaur as
emojis. The selection made in L2/16-072 is a reasonable selection I support.

c) However, I am concerned that the provided mocks in L2/16-072 are not
sufficient. Showing only the face of a dinosaur misses a very important point:
the iconic shape of dinosaurs can't be compared to any other living or dead
animal. It's a unique body form even small children can draw.

Therefore - unrelated to the number of dinosaur emojis - I recommend showing
the full bodies of the dinosaurs and not only faces. The draft attached to my
proposal shows that this works even with the limited space that is available.

Error Reports

Date/Time: Fri Feb 5 18:31:37 CST 2016
Name: Harkeerat Toor
Report Type: Error Report
Opt Subject: Error with rendering ੴ (U+0A74)

There is an inconsistency with this character, in that it sometimes renders
incorrectly. This code point is part of South Asian Scripts: Gurmukhi.

The current 8.0 chart with incorrect rendering looks like this:
http://unicode.org/charts/PDF/U0A00.pdf 

The correct version looks like this:
http://www.fileformat.info/info/unicode/font/anmoluni/u0A74.png 

The incorrect version looks like this:
http://www.fileformat.info/info/unicode/font/arial_unicode_ms/u0A74.png 

Date/Time: Mon Feb 22 16:05:13 CST 2016
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Problem with L2/16-011R3 breaking of emoji variation sequences

The rules suggested for modifier sequences in L2/16-011R3 (that the UTC
approved in its last meeting) don't seem to consider sequences such as <emoji,
VS, emoji modifier>. This causes grapheme (and other) breaks to be incorrectly
assumed after the VS in such sequences.
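
The kind of sequence at issue can be illustrated with a small Python sketch.
Everything here is an assumption made for illustration: the function
modifier_attaches and the one-entry base set are hypothetical, and the
adjacency check is a simplified stand-in, not the rule text of L2/16-011R3. It
shows how a check that only looks at the immediately preceding code point
misses a modifier that follows a variation selector.

```python
VS16 = 0xFE0F                                  # VARIATION SELECTOR-16
E_BASE = {0x261D}                              # illustrative subset: WHITE UP POINTING INDEX
E_MODIFIER = set(range(0x1F3FB, 0x1F3FF + 1))  # EMOJI MODIFIER FITZPATRICK TYPE-1-2 .. TYPE-6


def modifier_attaches(cps, i, skip_vs):
    """Return True if cps[i] is a modifier that joins a preceding emoji base.

    With skip_vs=False only the immediately preceding code point is checked;
    with skip_vs=True one intervening VS16 is looked past.
    """
    if i == 0 or cps[i] not in E_MODIFIER:
        return False
    j = i - 1
    if skip_vs and cps[j] == VS16 and j > 0:
        j -= 1
    return cps[j] in E_BASE


plain = [0x261D, 0x1F3FB]          # <emoji, emoji modifier>
with_vs = [0x261D, VS16, 0x1F3FB]  # <emoji, VS, emoji modifier>

print(modifier_attaches(plain, 1, skip_vs=False))
print(modifier_attaches(with_vs, 2, skip_vs=False))
print(modifier_attaches(with_vs, 2, skip_vs=True))
```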

Date/Time: Thu Mar 10 19:02:08 CST 2016
Name: Ray Larabie
Report Type: Error Report
Opt Subject: UCAS Cree errors

Hello Unicode folks,

I recently created the official typeface for Canada's sesquicentennial which
includes UCAS.

http://www.wired.com/2015/12/canadas-new-typeface-unifies-the-countrys-many-languages/ 

http://canada.pch.gc.ca/eng/1445028439342 

When Canada 150 launched, it got some press attention. Someone familiar
with the Cree language let me know on Twitter that there was a problem with
the "sh" series. From 1510-1525 is a range of "sh" characters that are only
used in the Cree language. Like many UCAS characters, it pretty much involves
a shape that's flipped and rotated, with added dots. The vertical characters
that look like a "sine wave" are fine. The problem is with the rotated form.
The sine wave shape should be rotated until it looks like it's lying flat. It
should look a little bit like a tilde. But in the current Unicode chart, it
looks more like an upright S. I've been told that fonts using this shape are
unusable. Since a lot of OS fonts are using this incorrect form, it's causing
quite a problem for Cree people. Here's a sample image showing my recommended
changes. https://dl.dropboxusercontent.com/u/19433025/cree-changes.png 

Kevin Brousseau, the person who contacted me on Twitter, created this PDF
explaining the problem:
https://dl.dropboxusercontent.com/u/19433025/sh-series.pdf

If you click the "View All" button, you can see the inconsistency.
http://www.fileformat.info/info/unicode/char/1515/fontsupport.htm 

I'm certainly no Cree language expert but Kevin certainly is. He told me he'd
be happy to be contacted about this problem. brousseau_kevin_at_yahoo.ca

Font trivia: I'm the guy who made most of the proposal samples for Apple's emoji submissions.

All the best,

Ray Larabie
Typodermic Fonts Inc.