L2/16-006

Comments on Public Review Issues
(Oct 31, 2015 - Jan 22, 2016)

The sections below contain links to permanent feedback documents for the open Public Review Issues, as well as other public feedback received since October 31, 2015, when the previous cumulative document was issued prior to UTC #145 (October 2015). Grayed-out items in the Table of Contents do not have feedback here.

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of January 22, 2016. Gray rows have no feedback to date.

Issue Name Feedback Link
316 Proposal to Remove Some Hira/Kata From Script_Extensions (feedback)
315 Proposed Update UAX #9, Unicode Bidirectional Algorithm (feedback)
314 Proposed Update UAX #45, U-Source Ideographs (feedback)
313 Proposed Update UTS #39, Unicode Security Mechanisms (feedback)
312 Feedback on Draft additional repertoire for ISO/IEC 10646:2016 (5th edition) CD2 (feedback)
311 Proposed Update UTS #10, Unicode Collation Algorithm (feedback)
310 New Character Property for Prepended Concatenation Marks (feedback)
308 Property Change for U+202F NARROW NO-BREAK SPACE (NNBSP) (feedback) no new
307 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback)
306 Proposed Update UAX #29, Unicode Text Segmentation (feedback) no new
305 Proposed Update UAX #44, Unicode Character Database (feedback)
304 Proposed Update UAX #24, Unicode Script Property (feedback)
303 Proposed Update UAX #31, Unicode Identifier and Pattern Syntax (feedback) no new

The links below go to locations in this document for feedback.

Feedback to UTC / Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports

 


Feedback to UTC / Encoding Proposals

Date/Time: Mon Dec 14 18:03:43 CST 2015
Name: Raph Levien
Report Type: Error Report
Opt Subject: Not all emoji ZWJ sequences supported on OSX 10.11

The emoji sequences that include U+2764 but no U+FE0F variation selector do
not render correctly in Mac OS X 10.11 (El Capitan). Repro steps:
download http://www.unicode.org/Public/emoji/2.0//emoji-zwj-sequences.txt,
open in TextView. The top set all render correctly (a single compound emoji
representing the sequence). The bottom set (which all have U+2764 but not a
following U+FE0F) split into individual emoji.

I recommend that the emoji-zwj-sequences.txt data file indicate that the
bottom set is not reliably rendered on all platforms, even those that in
general aggressively implement Unicode 8 emoji and ZWJ sequences.
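
The distinction drawn in this report (sequences in which U+2764 is not followed by U+FE0F) can be checked mechanically. The following is a minimal sketch, not part of the original report. It assumes each data line starts with space-separated hexadecimal code points, optionally followed by ";" fields or a "#" comment; the actual layout of emoji-zwj-sequences.txt may differ. The function names are hypothetical.

```python
HEART = 0x2764   # U+2764 HEAVY BLACK HEART
VS16 = 0xFE0F    # U+FE0F VARIATION SELECTOR-16


def parse_sequence(line):
    """Return the code points on one data line, or None for blank/comment lines.

    Assumed layout (not verified against the real file): hex code points
    separated by spaces, optionally followed by ';' fields or a '#' comment.
    """
    data = line.split("#", 1)[0].split(";", 1)[0].strip()
    if not data:
        return None
    return [int(token, 16) for token in data.split()]


def heart_lacks_vs16(code_points):
    """True if some U+2764 in the sequence is not immediately followed by U+FE0F."""
    for i, cp in enumerate(code_points):
        if cp == HEART:
            if i + 1 == len(code_points) or code_points[i + 1] != VS16:
                return True
    return False


with_vs = parse_sequence("1F469 200D 2764 FE0F 200D 1F468 # with U+FE0F")
without_vs = parse_sequence("1F469 200D 2764 200D 1F468 # without U+FE0F")
print(heart_lacks_vs16(with_vs), heart_lacks_vs16(without_vs))
```

Running this filter over the data file would separate the two sets described above, which is one way a note on unreliable rendering could be generated rather than maintained by hand.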

Date/Time: Thu Jan 21 12:15:00 CST 2016
Name: Doug Ewell
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/16-008, "Unicode-Specified Emoji Customizations"

With regard to the choice of U+E007E TAG TILDE as a terminator of emoji tag
sequences, L2/16-008 states, "NOTE: if we un-deprecated U+E007F CANCEL TAG in
Unicode v9.0, we could use that for the terminator, which would be slightly
more natural."

The current working draft on Google Docs strengthens this to "... which would
be a more natural choice."

U+E007F CANCEL TAG was originally intended to mark the end of a language-
tagged block of text. As such, the usage suggested in L2/16-008 to mark the
end of a tag sequence is very similar, although not identical. The character
is already encoded and un-deprecating it would be a comparatively inexpensive
operation for UTC, and like the earlier un-deprecation of U+E0020 through
U+E007E, it would not imply any manner of support for the older language-
tagging concept.

The current choice of TAG TILDE is arbitrary and could potentially be a source
of confusion, given that the CLDR validity files for region and subdivision
(intended to be used in validating flag tag sequences) use an ASCII tilde for
a completely different purpose, to indicate ranges.

I support removing the deprecated status of U+E007F CANCEL TAG and assigning
it as the terminator character described in L2/16-008.

As a side note, there are discrepancies in the terminology used in L2/16-008
to define tag sequences. The chart of "special terms" includes 'tag-term' and
'tag-nterm', but in the following ABNF and subsequent examples, these are
changed to 'Tag-STOP' and 'tag-nt' respectively. Disregarding the differences
in capitalization, the actual labels need to be made consistent.
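
To make the terminator question concrete, here is a minimal sketch of how a tag sequence could be assembled with either candidate terminator. It is not taken from L2/16-008 and does not reproduce its ABNF. The base character, the payload "gbsct", and the function names are hypothetical; the only facts relied on are that tag characters U+E0020 through U+E007E mirror ASCII U+0020 through U+007E at an offset of 0xE0000, and that U+E007F is CANCEL TAG.

```python
TAG_OFFSET = 0xE0000
TAG_TILDE = "\U000E007E"    # terminator currently chosen in the proposal
CANCEL_TAG = "\U000E007F"   # terminator supported in this feedback


def to_tag_chars(ascii_text):
    """Map printable ASCII (U+0020..U+007E) to the corresponding tag characters."""
    out = []
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError("not representable as a tag character: %r" % ch)
        out.append(chr(ord(ch) + TAG_OFFSET))
    return "".join(out)


def tag_sequence(base, spec, terminator=CANCEL_TAG):
    """Hypothetical shape: base character, non-terminating tags, one terminator."""
    tags = to_tag_chars(spec)
    if terminator in tags:
        # With TAG TILDE as terminator, a '~' in the payload would be ambiguous.
        raise ValueError("payload contains the terminator")
    return base + tags + terminator


# Hypothetical example: a flag base with an illustrative subdivision payload.
seq = tag_sequence("\U0001F3F4", "gbsct")
print(["U+%04X" % ord(c) for c in seq])
```

The check inside tag_sequence shows one practical difference between the two choices: CANCEL TAG lies outside the range that mirrors printable ASCII, so it can never collide with a payload character, whereas TAG TILDE can.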

Date/Time: Sun Jan 24 10:13:51 CST 2016
Name: A.R.Amaithi Anantham
Report Type: Error Report
Opt Subject: L2/15-256 and L2/16-030

Sir,

It is proposed to use Tamil Nutka to represent Sonants, in Tribal languages
(Vide Unicode Document Number L2/15-256 and L2/16-030). In Tamil Script,
Diacritics are not to be allowed. Therefore Diacritics are not to be used. The
only way is to make use of concerned code points, in Tamil Block, which are
not made use of so far, for Sonants. Therefore I am proposing the above
Sonants, at the code points, as noted below, in Tamil Block:

(1) Code point 0B87 for ௯ (G), 
(2) Code Point 0BA1 for ௰ (DD),
(3) Code Point 0BA6 for ௲ (D),
(4) Code Point 0BAC for ௱ (B).

The proposals of either Tamil Nutka or Diacritics are not, at all, needed.

With Regards

A.R.Amaithi Anantham

Date/Time: Mon Jan 25 23:55:42 CST 2016
Name: Agustin Fonts
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/16-022 Condom Emoji Submission

We understand that Unicode would be considering safe sex as a part of emoji
communications. However, we believe that limiting safe sex emojis to the
condom is too restrictive. There are many other ways to practice safe sex for
both men and women. Limiting such emojis to a condom emoji may indicate to
users that safe sex is the sole responsibility of men and/or fully ensured by
the condom. Giving users such an impression is not safe or inclusive.

We would like to strongly recommend that Unicode not restrict the expression
of safe sex to the condom, which would be just a marketing platform for condom
manufacturers, but rather to create a safe sex category designed to promote
safe sex for all genders.

Feedback on UTRs / UAXes

Date/Time: Tue Jan 12 15:34:25 CST 2016
Name: Andy Heninger
Report Type: Error Report
Opt Subject: UAX 14 break rules for numbers

The following originated as an ICU bug report from Bernhard Fey, but the
problem actually stems from the UAX 14 line break rules.

http://bugs.icu-project.org/trac/ticket/12017

The break positions found in the text "start .789 end" are not so good.

With the default UAX rules the breaks would be
|start .789 |end|
(LB 13 prevents a break before the '.'; LB 25 prevents a break after it.)

With the suggested regular expression tailoring for numbers, used by ICU, they are
|start .|789 |end|

The correct breaking would be
|start |.789 |end|

How best to fix the problem will take some thought. 
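
The default result above can be reproduced with a toy model. The sketch below is not the UAX 14 algorithm and not ICU's implementation: it classifies only the four kinds of characters in this one string (SP, IS, NU, and everything else as AL) and applies a handful of pair rules in order. LB 13 and LB 25 are the rules named in the report; the inclusion of LB 7, LB 18, LB 23, LB 28 and LB 31 is an assumption about which other rules matter here. The function names are hypothetical.

```python
def line_class(ch):
    """Very rough line-break class; covers only what this example needs."""
    if ch == " ":
        return "SP"
    if ch == ".":
        return "IS"
    if ch.isdigit():
        return "NU"
    return "AL"


def break_offsets(text):
    """Break offsets under a tiny subset of the default rules, applied in order."""
    classes = [line_class(c) for c in text]
    offsets = [0]  # the report's notation marks the start of text as a boundary
    for i in range(1, len(text)):
        before, after = classes[i - 1], classes[i]
        if after == "SP":                              # LB 7: no break before space
            continue
        if after == "IS":                              # LB 13: no break before IS, even after space
            continue
        if before == "SP":                             # LB 18: break after spaces
            offsets.append(i)
            continue
        if {before, after} == {"AL", "NU"}:            # LB 23: letters and digits stay together
            continue
        if before in ("IS", "NU") and after == "NU":   # LB 25 (part): IS x NU, NU x NU
            continue
        if before == "AL" and after == "AL":           # LB 28: no break between letters
            continue
        offsets.append(i)                              # LB 31: break everywhere else
    offsets.append(len(text))
    return offsets


def mark(text, offsets):
    """Render break offsets with '|' in the style used in the report."""
    cuts = set(offsets)
    out = []
    for i, ch in enumerate(text):
        if i in cuts:
            out.append("|")
        out.append(ch)
    if len(text) in cuts:
        out.append("|")
    return "".join(out)


text = "start .789 end"
print(mark(text, break_offsets(text)))   # default-rule result in this toy model
print(mark(text, [0, 6, 11, 14]))        # the breaking the report calls correct
```

In this model the missing break at offset 6 comes entirely from LB 13 being applied before LB 18, which is why the space before ".789" offers no break opportunity.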

Other Reports

Date/Time: Sun Jan 10 16:08:47 CST 2016
Name: Shai Berger
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: FAQ about the UBA and Higher Level Protocols

Dear Unicode editorial committee,

Here is a Q&A pair for your consideration:

Q: When can a Higher-Level Protocol be used to override the default rules of the UBA?

A: Higher-Level Protocols apply in specialized contexts such as marked-up
text, specific fields in forms, or specific fields in messages complying with
pre-set formats. Generally, you can say some Higher-Level Protocol applies to
a piece of text if all users of that piece in that context agree on the rules
and semantics dictated by that protocol. As soon as some text's interpretation
is governed by a Higher-Level Protocol, that text is no longer plain text. In
particular, a program is not a protocol -- if a program claims to be a plain-
text viewer, but presents all paragraphs with base direction LTR, it is not
compliant with the Unicode standard.

Explanation and rationale:

The Unicode Bidi Algorithm, as specified in
http://www.unicode.org/reports/tr9/, specifies a default algorithm for setting
the base direction of a paragraph, but allows Higher Level Protocols to
override this (http://www.unicode.org/reports/tr9/#HL1). This has been
interpreted by some software developers as permission to pick the base
direction using their own rules when dealing with plain text, claiming,
essentially, that their program is a higher-level protocol. Probably the most
common example of such a program is Microsoft Outlook, which (for sure in
versions up to and including Outlook 2010, but AFAIK to this day) allows its
user to specify what base direction to give to all plain-text messages it
reads or writes; this direction can be "auto", but if it is "RTL" or "LTR",
UBA rules P2 and P3 are ignored. As you may imagine, this creates
interoperability problems, to the point that many Hebrew users feel that
plain-text is not an appropriate format for writing Hebrew mails.
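
For reference, the default behavior that rules P2 and P3 describe can be sketched in a few lines. This is a simplified illustration, not the full UBA: it finds the first strong character (bidi class L, R, or AL) while skipping text inside isolates, and falls back to left-to-right when there is none. It relies on Python's standard unicodedata module; the function name is hypothetical.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def default_base_direction(paragraph):
    """Return "LTR" or "RTL" for one paragraph, following P2 and P3 in outline."""
    depth = 0  # number of currently open isolates
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_class == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0:
            # P2: first strong character outside any isolate decides.
            if bidi_class == "L":
                return "LTR"
            if bidi_class in ("R", "AL"):
                return "RTL"
    # P3: no strong character found, so the paragraph level defaults to 0 (LTR).
    return "LTR"


print(default_base_direction("\u05e9\u05dc\u05d5\u05dd hello"))  # starts with Hebrew
print(default_base_direction("hello \u05e9\u05dc\u05d5\u05dd"))  # starts with Latin
```

A viewer that forces "LTR" or "RTL" for every plain-text paragraph bypasses exactly this computation, which is the behavior at issue in the proposed FAQ.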

My own view is that you cannot apply Higher-Level Protocols to plain text and
still call it plain text; I think this follows from the dictionary definitions
of the word "protocol" and the term "plain text". I also think plain text is
required to "forbid" higher-level protocols by the emphasized remark on page
19 of the Unicode standard: "Plain text must contain enough information to
permit the text to be rendered legibly, and nothing more."

As a Free Software enthusiast, I spent years thinking this was just another
example of Microsoft's disrespect for standards, but recently I've encountered
free-software developers, members of the relevant Israeli standards committee,
who espouse the idea that a program can be a higher-level protocol; that
according to the Unicode standard, bidirectional plain-text is, in general,
not enough to determine the correct presentation.

So, I mentally apologize to Microsoft for ascribing to them either malice or
incompetence in this matter; but I'd like to have the issue resolved. I am
suggesting that my understanding be published as a FAQ, assuming that, indeed,
this is what the designers of the standard intended. If I am wrong, a
clarification going the other way would be very welcome as well.

Thanks in advance,
Shai.