L2/19-010

Comments on Public Review Issues
(Sept 14, 2018 - January 11, 2019)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of January 11, 2019, since the previous cumulative document was issued prior to UTC #157 (September 2018). Some items in the Table of Contents do not have feedback here.

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of January 8, 2019.

Issue Name Feedback Link
394 Proposed Update UAX #9, Unicode Bidirectional Algorithm (feedback) No feedback at this time
393 Proposed Update UAX #42, Unicode Character Database in XML (feedback) No feedback at this time
392 Multi-person Emoji (feedback)
391 Proposed Update UTS #39 Unicode Security Mechanisms (feedback) No feedback at this time
390 Proposed Update UAX #29 Unicode Text Segmentation (feedback)
389 Unicode 12.0.0 Beta (feedback)
388 Proposed Update UAX #34, Unicode Named Character Sequences (feedback) No feedback at this time
387 Unicode Emoji 12.0 Beta (feedback)
386 Proposed Update UAX #31 Unicode Identifier and Pattern Syntax (feedback)
385 Proposed Update UTS #10, Unicode Collation Algorithm (feedback) No feedback at this time
383 Proposed Update UAX #14, Unicode Line Breaking Algorithm (feedback)
382 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback)
381 Proposed Update UAX #45, U-source Ideographs (feedback)
380 Proposed Update UTS #51, Unicode Emoji (feedback)
379 Proposed Update UAX #44, Unicode Character Database (feedback) No feedback at this time

The links below go to locations in this document for feedback.

Feedback to UTC / Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports

Note: The section of Feedback on Encoding Proposals this time includes:
L2/18-301, L2/19-006

 


Feedback to UTC / Encoding Proposals

Date/Time: Mon Oct 15 06:30:35 CDT 2018
Name: Charlotte Buff
Report Type: Feedback on an Encoding Proposal
Opt Subject: Response concerning L2/18-301

I have received the UTC’s response to document L2/18-301 (“Deprecation
Inconsistencies in Code Chart Annotation”) and would like to quickly reply
to some of the points raised.

I understand the UTC’s desire to keep character properties and descriptions
as stable as possible. However, my main concern that prompted me to write
the document in question is that the Deprecated property in its current form
does not appear to serve any practical purpose for implementations because
it does not actually enumerate all the characters that are deprecated. If I
wanted to programmatically determine which characters are not recommended
for usage, relying solely on the UCD would lead me to wrong results. I would
have no other choice but to manually maintain a list of discouraged
characters by reading through the annotations in the 290+ code charts, which
– in my eyes – defeats the purpose of defining such a property in the first
place.

The Deprecated property seems to have been assigned not according to any
defined criteria or guidelines, but mostly based on historical coincidences.
Just as an example:

• The preferred representation for an Arabic letter alef with wavy hamza 
below is the sequence <U+0627, U+065F> (اٟ) and not the character U+0673 
ARABIC LETTER ALEF WITH WAVY HAMZA BELOW (ٳ). The two are not equivalent 
under any form of normalisation.

• The preferred representation for a Sharada Om is the sequence <U+1118F, 
U+11180> (𑆏𑆀) and not the character U+111C4 SHARADA OM (𑇄). The two are 
not equivalent under any form of normalisation.

U+0673 is formally deprecated, but U+111C4 is not. Following the wording of
UAX #44, it must therefore be the case that U+0673 has “serious
architectural defects” and has “been determined to cause significant
implementation problems”, whereas U+111C4 is merely “uncommon, obsolete,
disliked, or not preferred”, but I honestly cannot tell the difference
between the two cases. The serious architectural defect of both characters
is that they are precomposed codepoints that don’t actually have a
decomposition mapping. Does the lack of formal deprecation imply that
SHARADA OM does *not* cause problems for Sharada implementations because of
this? Why then does the code chart state that it is not okay to use this
character if it does not cause problems? I do not know.

It was mentioned that changing properties like these “often can create more
confusion”, but I strongly feel like the current state of affairs is far
more confusing than the alternative, i.e. formally deprecating all
characters that are, in fact, deprecated.
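
Ed Note: The claim above that neither precomposed character is equivalent to its preferred sequence under any normalization form can be checked with a short sketch. This assumes Python's standard unicodedata module, whose results depend on the Unicode version bundled with the interpreter. The helper name equivalent_under_any_form and the PAIRS table are illustrative only. The module does not expose the Deprecated property, so the sketch covers only normalization and decomposition mappings.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def equivalent_under_any_form(a: str, b: str) -> bool:
    """Return True if a and b normalize to the same string under any form."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


# (precomposed character, preferred sequence) pairs taken from the report
PAIRS = {
    "U+0673": ("\u0673", "\u0627\u065F"),
    "U+111C4": ("\U000111C4", "\U0001118F\U00011180"),
}

for label, (precomposed, sequence) in PAIRS.items():
    print(
        label,
        "decomposition:", repr(unicodedata.decomposition(precomposed)),
        "equivalent:", equivalent_under_any_form(precomposed, sequence),
    )
```

Both characters report an empty decomposition mapping and no equivalence, which is consistent with the report's point that the UCD data alone does not distinguish the two cases.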

Date/Time: Thu Dec 6 05:42:24 CST 2018
Name: Curtis Young
Report Type: Other Question, Problem, or Feedback
Opt Subject: The Meatball emoji

Dear Committee,

I’m writing to you today to show my support for a meatball emoji. I feel the 
meatball represents the Italian community, and although other similar 
meatballs represent other nationalities, the Italian meatball is likely 
the most commonly enjoyed by people of all nationalities, especially in the United States. 

Please add the meatball to the emoji family and please, add a few to the 
spaghetti emoji. If you’re afraid to add meatballs to the spaghetti emoji 
due to people who don’t eat meat, then this is another reason to add Italian 
meatballs to the emoji family, so they can be used together as needed. 

Sincere Thanks,
Curtis Young 

Date/Time: Fri Jan 11 17:53:53 CST 2019
Name: Doug Ewell
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/19-006 (Koalib letter @)

• L2/04-365 did not argue against encoding the @ letter simply because
of the size of the user community. The issue has nothing to do with
encoding ancient and minority scripts, and certainly should not be taken
as "disdain for the small user community," as claimed in L2/19-006. The
issue is the discrepancy in volume between the usage of @ as a Koalib
letter for Arabic loanwords and its usage as a symbol.

• The Internet World Stats site cited in L2/04-365, which reported 812
million Internet users in 2004, reports more than 4.2 billion (a 400%
increase) in 2018. The installed base of Internet users that would be
affected by any change to the @ sign, or encoding of an uppercase
variant which could be used for visual spoofing, would be at least 5
times as great as originally estimated.

• The usage of @ in Internet applications and protocols is far greater
than ever. Twitter and Instagram, both introduced long after 2004, use @
for handles and tagging, respectively. Facebook tagging with @ has
increased greatly in the past 14 years; according to an AP article, the
number of active Facebook users grew from 1 million in 2004 to 1.11
billion in early 2013.

• Meanwhile, no additional evidence of sustained demand for the Koalib
character has been presented beyond the original 2004 proposal.

• Encoding a single new letter and annotating U+0040, or changing its
properties (options "EM3" through "EM5" in L2/19-006), would not
alleviate the confusion of having, in effect, "two at signs." Developers
of certain stylish fonts may continue to render U+0040 with a capital A
or in other ways that could confuse users. The experience with emoji
shows clearly that users will promptly and purposefully find uses for
new characters that are at odds with their intended use.

• The "enfermer@s" example, although it appears in many places on the
Web (e.g. search for it on Bing), argues for the encoding of an
additional character to support an orthographic change for languages
that have official governing bodies to make these decisions. No such
change should be considered without consulting these bodies.

• Realistically speaking, spell checkers and other text processing
tools for this Koalib letter are not going to be widely implemented and
distributed, based on the size of the user community. Again, this is not
meant to denigrate the user community based on its size or any other
factor, but to put the problem into perspective against the backdrop of
the use of @ as a symbol.

• The best course of action, in my opinion, is for the user community
to continue using U+24B6 and U+24D0. L2/19-006 admits that this is
already recommended by SIL, which made the original 2004 proposal.

Feedback on UTRs / UAXes

Date/Time: Sat Jan 12 15:46:54 CST 2019
Name: tex texin
Report Type: Error Report
Opt Subject: TR36 pep-383 error

In https://www.unicode.org/reports/tr36/#TOC-PEP-383-Approach it states 
"For example, suppose that the byte 81 is illegal in charset n. When 
converted to Unicode, PEP 383 represents this as U+D881."

This should be U+DC81.
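
Ed Note: The correction can be reproduced with Python's own implementation of PEP 383, the surrogateescape error handler. This is a minimal sketch: ASCII stands in for the unnamed "charset n" (an assumption), and pep383_escape is a hypothetical helper that only restates the mapping of an undecodable byte to U+DC00 plus the byte value.

```python
def pep383_escape(byte: int) -> str:
    """Map an undecodable byte (0x80..0xFF) to its PEP 383 lone surrogate."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("PEP 383 only escapes bytes 0x80..0xFF")
    return chr(0xDC00 + byte)


# Byte 0x81 is not valid ASCII, so the handler escapes it instead of failing.
decoded = b"\x81".decode("ascii", errors="surrogateescape")
print(f"U+{ord(decoded):04X}")

# Encoding with the same handler restores the original byte.
round_trip = decoded.encode("ascii", errors="surrogateescape")
print(round_trip)
```

The decoded code point is U+DC81, in the low-surrogate range U+DC80..U+DCFF that PEP 383 uses for escaped bytes.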

Error Reports

Date/Time: Mon Sep 17 12:52:48 CDT 2018
Name: David Corbett
Report Type: Error Report
Opt Subject: ASCII fallbacks in the names list

Because the names list’s character repertoire has been expanded, the notes
for the Old Hungarian letters U+10C9F, U+10CAC, U+10CAD, U+10CDF, U+10CEC,
and U+10CED should use “Ő”, “Ű”, “ő”, and “ű”.

Date/Time: Thu Sep 27 22:56:50 CDT 2018
Name: Melissa Newman
Report Type: Error Report
Opt Subject: Hebrew no vowel placeholder character

This was already sent to the editorial committee, and acknowledged to sender.

There is no Hebrew vowel placeholder character in the Hebrew page set.  If
you look at the Hebrew page, https://unicode.org/charts/PDF/U0590.pdf , each
of the vowel characters has a dotted circle above it.  Although the dotted
circle is a valid Unicode character, when used inside a Hebrew word, it is
not recognized as a valid Hebrew character.  The direction of the text gets
messed up.  A box character is also needed that is recognized as
a valid Hebrew character, so that when vowels are used with it,
the vowels are placed correctly and the text direction is properly
maintained.  Both of these characters are needed for teaching Hebrew.  Both
of these characters already exist in Unicode.  A version of them just needs
to be added to the Hebrew character page, so that software programs will
process the RTL/LTR correctly.  Adding them right after 05EA would be the
best place for them, because that is the location of the last regular Hebrew
character.  Thank you.  Because these two characters are not recognized as
valid Hebrew characters, something that should be very simple (writing a
demonstration of Hebrew verb conjugation), becomes very complicated.
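
Ed Note: One plausible reason for the direction problems described above is that the existing placeholder characters carry a neutral bidirectional class rather than the strong right-to-left class of Hebrew letters. The sketch below only inspects Bidi_Class values via Python's unicodedata module. U+25A1 WHITE SQUARE is an assumed stand-in for the "box character", which the report does not identify.

```python
import unicodedata

# Sample characters: a Hebrew letter, a Hebrew vowel point, and two placeholders.
SAMPLES = ["\u05D0", "\u05B4", "\u25CC", "\u25A1"]

# Map each character name to its Bidi_Class value.
bidi_classes = {
    unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in SAMPLES
}

for name, bidi_class in bidi_classes.items():
    print(f"{name}: {bidi_class}")
```

The letter is class R and the vowel point is NSM, while both placeholders are ON (other neutral), so their resolved direction depends on the surrounding text and the paragraph direction.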

Date/Time: Mon Oct 1 20:10:34 CDT 2018
Name: David Corbett
Report Type: Error Report
Opt Subject: Psalter Pahlavi number shaping rules

The Psalter Pahlavi numbers 1 through 4 have different shaping behavior
depending on their position in the numeric word, as explained in R1 and R2
of L2/11-147. These rules should be included in the core specification, as
is done for the rules for the Syriac alaph.

Date/Time: Sat Oct 20 10:46:43 CDT 2018
Name: David Corbett
Report Type: Error Report
Opt Subject: Wrong subheader for U+0971 DEVANAGARI SIGN HIGH SPACING DOT

The Devanagari code chart puts U+0971 DEVANAGARI SIGN HIGH SPACING DOT under
the subheader “Abbreviation sign”, but it is not an abbreviation sign.

Date/Time: Sun Oct 21 04:36:39 CDT 2018
Name: Liisa Chi
Report Type: Error Report
Opt Subject: heading error in the Ethiopic Extended

(Note: this was also sent to the editors already.)

Currently U+2DA0..U+2DDE are listed under the heading “Syllables for
Sebatbeit,” both in the code chart (U2D80.pdf) and in NamesList.txt.

However U+2DA0..U+2DBF are not for Sebatbeit, but for Bench.

This is clearly written in the original proposal n1846.  Also in the
Appendix B/E of n2747, one may see that these glyphs were used in Bench, and
not in Sebatbeit.

Other Reports

Date/Time: Thu Dec 6 11:15:30 CST 2018
Name: Ken Lunde
Report Type: Other Question, Problem, or Feedback
Opt Subject: About the known Firefox issue

Ed Note: Firefox Version 64, which was released on December 11, 2018, fixed this issue.

Just FYI, I filed a bug against Firefox with regard to the many G-Source
representative glyphs not displaying correctly in the URO code charts, which
is a known (to Unicode) issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=1512461

Below is a table that indicates the affected ranges, which I think is
complete for the cited versions:

Versions 7, 8, 9 & 10  Version 11             Version 12 Beta
U+8324 through U+83FC
U+8500 through U+85FC  U+8536 through U+85FC  U+8596 through U+85FC
U+8700 through U+87FC  U+8700 through U+87FC  U+8700 through U+87FC
U+8900 through U+89FC  U+8900 through U+89FC  U+8900 through U+89FC
U+8B00 through U+8BFC  U+8B00 through U+8BFC  U+8B00 through U+8BFC
U+8D00 through U+8DFC  U+8D00 through U+8DFC  U+8D00 through U+8DFC
U+8F00 through U+8FFC  U+8F00 through U+8FFC  U+8F00 through U+8FFC
U+9100 through U+91FC  U+9100 through U+91FC  U+9100 through U+91FC
U+9300 through U+93FC  U+9300 through U+93FC  U+9300 through U+93FC
U+9500 through U+95FC  U+9500 through U+95FC  U+9500 through U+95FC
U+9700 through U+97FC  U+9700 through U+97FC  U+9700 through U+97FC
U+9900 through U+99FC  U+9900 through U+99FC  U+9900 through U+99FC
U+9B00 through U+9BFC  U+9B00 through U+9BFC  U+9B00 through U+9BFC
U+9D00 through U+9DFC  U+9D00 through U+9DFC  U+9D00 through U+9DFC

Date/Time: Thu Dec 13 23:07:02 CST 2018
Name: Norbert Lindenberg
Report Type: Problems / Feedback about website
Opt Subject: Glossary defines "logical order" as order of keyboard input

The Unicode glossary defines "Logical order" as "The order in which text is
typed on a keyboard. For the most part, logical order corresponds to
phonetic order." It also references Section 2.2, Unicode Design Principles.

Section 2.2, Unicode Design Principles, on the other hand, defines logical
order as "The order in which Unicode text is stored in the memory
representation", and then continues "this order roughly corresponds to the
order in which text is typed in via the keyboard; it also roughly
corresponds to phonetic order."

These definitions are contradictory because in modern keyboard
implementations it is quite possible to enable keyboard input in a different
order than the desired memory representation. For Brahmic scripts, for
example, the in-memory order of the marks within a cluster that Unicode
normalization or OpenType shaping engines expect is often not obvious to
users. Keyboards therefore may accept input in various orders and reorder
into the expected normalized form. This is also reflected in the "reorder"
feature of the LDML keyboard specification.
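
Ed Note: A small sketch of how stored order can differ from typed order, using Python's unicodedata module. The Devanagari sequence is an illustrative choice, not one taken from the report, and the sketch shows only canonical reordering during normalization. It does not model shaping-engine expectations or the LDML "reorder" feature.

```python
import unicodedata

# One possible typing order: KA, then VIRAMA, then NUKTA.
typed = "\u0915\u094D\u093C"

# Normalization sorts adjacent combining marks by canonical combining class.
stored = unicodedata.normalize("NFD", typed)

for label, text in (("typed", typed), ("stored", stored)):
    marks = [f"U+{ord(ch):04X} (ccc={unicodedata.combining(ch)})" for ch in text]
    print(label, "->", ", ".join(marks))
```

Because the nukta has a lower combining class than the virama, the normalized string places it first, regardless of the order in which the two marks were typed.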

I suggest updating the glossary definition to match the definition in
section 2.2.

Date/Time: Thu Jan 3 11:36:17 CST 2019
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #389: FAQ/Collation (UCA and ISO 14651)


Please change the answer to the following question on the Collation 
FAQ page (http://unicode.org/faq/collation.html#13):

	Q. What are the differences between the UCA and ISO 14651?

Since the last version of ISO 14651 [ISO/IEC 14651:2018 (5th ed.)], several points listed 
in the answer are no longer accurate. Please correct as follows: remove items #4, #6, 
and #7; and reword item #5.


New answer
------------------

A. Very broadly, the UCA includes the following features that are not part of ISO 14651. 
This is only a sketch; for details see http://www.unicode.org/reports/tr10/.
• a much more thorough introduction to multilingual sorting issues
• much more information about performance and implementation practices
• how to apply collation to searching and matching
• a variable weighting option allowing punctuation to make a difference at the first three levels (“Non-ignorable” option)


Current answer
------------------

A. Very broadly, the UCA includes the following features that are not part of ISO 14651. 
This is only a sketch; for details see http://www.unicode.org/reports/tr10/.

• a much more thorough introduction to multilingual sorting issues
• much more information about performance and implementation practices
• how to apply collation to searching and matching
• uniform handling of canonical equivalents
• variable weighting (allowing punctuation to be ignored or not)
• irrelevant combining characters don't interfere with contractions
• well-formedness criteria for tables (disallowing tables that would produce peculiar results, e.g. where X and Y don't contract, X < Y and yet XY == YX)


Thank you.

Date/Time: Wed Jan 9 07:20:22 CST 2019
Name: Ken Lunde
Report Type: Error Report
Opt Subject: Representative glyph for U+20A9 ₩ WON SIGN

I recommend that the representative glyph for U+20A9 ₩ WON SIGN in the
“Currency Symbols” block be changed to have only one crossbar. Virtually all
mainstream Korean fonts include a glyph for this character that has only one
crossbar. Microsoft’s Malgun Gothic, which is a Korean font, is one of a
very small number of outliers (and Microsoft should correct this). Note that
the code chart for the “Halfwidth and Fullwidth Forms” block includes the
full-width form, U+FFE6 ₩ FULLWIDTH WON SIGN, whose representative glyph has
only one crossbar. I also recommend adding an annotation similar to that of
U+00A5 ¥ YEN SIGN:

glyph may have one or two crossbars, but the most common form in Korea has only one
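
Ed Note: The relationship between the two won signs mentioned above can be inspected with Python's unicodedata module. This sketch shows only that the full-width form is a compatibility variant of U+20A9. It says nothing about glyph shapes, which are a font matter.

```python
import unicodedata

won, fullwidth_won = "\u20A9", "\uFFE6"

print(unicodedata.name(won), "/", unicodedata.name(fullwidth_won))

# The full-width form carries a <wide> compatibility decomposition.
print(unicodedata.decomposition(fullwidth_won))

# Compatibility normalization folds the full-width form to U+20A9.
folded = unicodedata.normalize("NFKC", fullwidth_won)
print(f"U+{ord(folded):04X}")
```

Since U+FFE6 folds to U+20A9 under compatibility normalization, differing crossbar counts in the two representative glyphs are a visible inconsistency between the charts.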