The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of July 25, 2017, since the previous cumulative document was issued prior to UTC #151 (May 2017). Some items in the Table of Contents do not have feedback here.
The links below go directly to open PRIs and to feedback documents for them, as of July 25, 2017.
The links below go to locations in this document for feedback.
Feedback to UTC / Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports
Note: The section of Feedback on Encoding Proposals this time includes:
L2/06-272
L2/14-066
L2/17-098
L2/17-197
L2/17-207
L2/17-229
Date/Time: Thu Jul 20 03:53:43 CDT 2017
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/17-229 Coin emoji
The proposal for a Coin emoji by Katie McLaughlin states in 3.A that there are no known instances of such a pictograph in a messaging system. This is not really the case. https://github.com/Crissov/unicode-proposals/issues/137

1. The less important and now defunct Japanese carrier E-Mobile / eAccess had extended the original NTT Docomo emoji set with some custom ones, including a Coin emoji numbered #266. https://github.com/Crissov/unicode-proposals/issues/331

2. Microsoft's instant messaging service, known under several names over the years (Windows/Microsoft/MSN/Live Messenger), had a Money pictograph for the `(mo)` short code that showed stacks of coins. https://github.com/Crissov/unicode-proposals/issues/256

3. A proposed extension to the XMPP/Jabber protocol, XEP-0038 (which at least Cisco implemented), described the short code `:money:`, which was intended to show a gold coin. All other pictographs of the core set can be represented by standard emoji. https://github.com/Crissov/unicode-proposals/issues/320

4. The Facebook internal pictogram code `[[329136157130818]]` shows (showed?) two coins. I'm not quite sure how and where that's used, though. https://github.com/Crissov/unicode-proposals/issues/254

5. In L2/06-272, Andreas Stötzner proposed several characters found in Public Signage. Many of them were unified with emoji that were proposed at about the same time. At U+xx49, he proposed a symbol for Money Exchange that shows a banknote and coins. It probably became U+1F4B1 CURRENCY EXCHANGE, which shows only two specific currency symbols. https://github.com/Crissov/unicode-proposals/issues/299

In addition, a Coin emoji would be welcomed by card players who do not use the modern international standard suits (clubs, spades, hearts, diamonds), but more traditional ones in which ♦ is represented by a gold coin. Some other symbols are still missing for this, though. https://github.com/Crissov/unicode-proposals/issues/289

In conclusion, there are many compatibility reasons to add a Coin emoji to Unicode.
Date/Time: Thu Jul 20 16:59:10 CDT 2017
Name: Cibu Johny
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback on L2/L2017/17207-malayalam-candrakkala.pdf
Here are my thoughts on the document:

1. The sources and ages of the palm-leaf manuscript scans are not indicated. Dating such manuscripts can be harder than dating published books - they can be 500 years old or just 50 years old. For example, it is well known that EE VOWEL SIGN is a very recent phenomenon in the Malayalam script; I see its presence in the second line of the third palm leaf. Also, palm leaves can reflect just a small group's, a locality's, or even a single person's writing conventions, as opposed to those of a wider community in the case of printing. Essentially, we should be extra careful about evidence from palm leaves.

2. None of the scans provided use the Malayalam script - they are from neighbouring scripts like Grantha, Tigalari, etc. Without evidence from Malayalam documents, I don't know how we would establish anything conclusive about Malayalam. At most, we can make an argument that Chandrakkala was used in neighbouring scripts.

3. The findings in the document do not seem to be from a peer-reviewed journal or widely accepted books. The text we included in the original proposals for the Viramas is (translated) from a peer-reviewed journal. (We had published our findings in that journal to make sure our claims were validated by historians. Only after that did we propose the characters.)

Anyway, I don't have any objection to removing historical references to the characters. I believe those references are only supplemental or contextual information. Regards, Cibu
Date/Time: Mon Jul 24 18:52:27 CDT 2017
Name: Eduardo Marin
Report Type: Feedback on an Encoding Proposal
Opt Subject: Name of the paleohispanic script
The term "Hispanic" today no longer means the entire Iberian Peninsula, as it did in the days of the Romans; now it may refer either to the nationality of people from Spain or to an ethnic group distinguished by speaking Spanish. Naming the script "Hispanic" therefore runs the risk of excluding Portugal by accident. Because of that, it should be called Iberian instead. Furthermore, if we follow the naming conventions of other similar scripts, like Old Italic, a much more fitting name would be "Old Iberian".
Date/Time: Wed Aug 2 12:05:18 CDT 2017
Name: Karl Williamson
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/17-197 (REPLACEMENT char quantity)
There is something wrong with the methodology the author used to determine the behaviors of various software products. I know this because he gives incorrect results for Perl 5. Perl 5 uses the original definition of UTF-8, absorbing the maximal number of continuation bytes based on the number of leading 1-bits in the start byte. I just now made sure I am correct in my assertion by testing it on the sequence C0 80: the UTF-8 decoding routine treated this as a single, flawed, unit. There must be some combination of software pieces that led to the author's results, but whatever they are, they are masking Perl 5's internal behavior. I apologize for the lateness of this response, but I thought it important enough not to shelve even if late.
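For readers who want to reproduce this kind of probe, a minimal sketch in Python (not the submitter's Perl setup): Python's built-in decoder happens to follow the one-replacement-per-bogus-byte convention, which makes the contrast with the single-unit behavior described above directly visible.

```python
# Probe a UTF-8 decoder with the invalid (overlong) sequence C0 80.
# Python emits one U+FFFD per bogus byte here; a decoder using the
# original UTF-8 definition, as the report describes for Perl 5,
# would instead absorb both bytes as a single flawed unit.
data = bytes([0xC0, 0x80])
decoded = data.decode("utf-8", errors="replace")
print(len(decoded), [f"U+{ord(c):04X}" for c in decoded])
# -> 2 ['U+FFFD', 'U+FFFD']
```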
Date/Time: Sat Jan 28 14:49:18 CST 2017
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Misunderstandings of 'Logical Order'
(This feedback was left over from a previous period due to a clerical error.)
... 3) The explanatory comment in UCD 9.0.0 IndicPositionalCategory.txt for value Visual_Order_Left says “... instead of the logical order model...”. It should say, “... instead of the phonetic order model...”.
Date/Time: Fri May 12 15:49:11 CDT 2017
Name: Behnam Esfahbod
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback for current and proposed versions of UTS #46 Conformance Testing
Hi there, We have faced a couple of issues with implementing the UTS #46 Conformance Testing for the rust-url library:

1) Neither Section 8, Conformance Testing, nor the header of the data file (IdnaTest.txt) states which flags need to be set for the Section 4 Processing algorithms, especially VerifyDnsLength, or any of the proposed flags: CheckHyphens, CheckBidi, and CheckJoiners.

2) When VerifyDnsLength is not set, many cases fail that refer to Processing Step 4.2 of Section 4.2, ToASCII, meaning that VerifyDnsLength is expected to be set. For example, line 169:

```
B; 。; [A4_2]; [A4_2]
```

(The current implementation of rust-url sets the VerifyDnsLength flag because it results in a smaller failure rate for the test data.)

3) When VerifyDnsLength is set, there are unexpected failures in the test data for those cases whose source field starts with a FULL STOP or a replacement character. For example, line 4956:

```
B; 。\u0635\u0649\u05B0\u0644\u0627。岓\u0F84𝩋ᡂ; \u0635\u0649\u05B0\u0644\u0627.岓\u0F84𝩋ᡂ; xn--7cb2vlb7cxa.xn--3ed095b9x3dbd8t # صىְلا.岓྄𝩋ᡂ
```

Starting with U+3002 IDEOGRAPHIC FULL STOP, during the Section 4.2 ToASCII algorithm, it should fail at step Processing 4.2, because the first label has length zero. But no failure is anticipated in the data file. The test data appears to expect empty labels (or leading FULL STOPs) to be dropped from the domain name (which would allow the test cases to pass), but there is no step under Section 4, Processing, or Section 4.2, ToASCII, regarding this behavior.

Please see these for the original discussion and more info:
* https://github.com/servo/rust-url/issues/166
* https://github.com/servo/rust-url/issues/171
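As context for the report above, a hedged sketch of reading one IdnaTest.txt record; the semicolon-delimited layout and the empty-field defaults follow the data file's own header, and the function name here is illustrative, not part of any API.

```python
# Split one IdnaTest.txt record into its fields. Per the file header,
# the fields are: test type; source; toUnicode; toASCII. '#' starts a
# comment; an empty toUnicode/toASCII field defaults to the prior field.
def parse_idna_test_line(line):
    body = line.split("#", 1)[0].strip()
    if not body:
        return None  # blank or comment-only line
    fields = [f.strip() for f in body.split(";")]
    test_type, source, to_unicode, to_ascii = (fields + [""] * 4)[:4]
    to_unicode = to_unicode or source
    to_ascii = to_ascii or to_unicode
    return {"type": test_type, "source": source,
            "toUnicode": to_unicode, "toASCII": to_ascii}

# The line-169 case quoted above: bracketed values mark expected errors.
print(parse_idna_test_line("B; \u3002; [A4_2]; [A4_2]"))
```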
Date/Time: Wed May 24 05:38:06 CDT 2017
Name: Anne van Kesteren
Report Type: Public Review Issue
Opt Subject: Change UTS #46 IDNA processing_option to a boolean
Most other arguments to ToASCII and ToUnicode are booleans. Since processing_option has only two values, it would make sense to make it consistent. Now would also be a good time to do it, since you're changing the calling convention anyway by introducing CheckHyphens et al.
Date/Time: Thu Jun 22 10:08:24 CDT 2017
Name: Ken Whistler
Report Type: Error Report
Opt Subject: UTS #10
There is a small error in Appendix B of UTS #10 with regard to the CTT for ISO/IEC 14651. The statement:

"The CTT for ISO/IEC 14651 is constructed using only symbols, rather than explicit integral weights, and with the Shift-Trimmed option for variable weighting."

should be amended to say "the Shifted option" to be correct. (Submitted to track for the 11.0 proposed update of the document, on behalf of Marc Lodewijck, who noticed the problem.)
Date/Time: Sat Jun 17 17:04:47 CDT 2017
Name: Domenic Denicola
Report Type: Public Review Issue
Opt Subject: UTS #46: why is processing_option an enum, not a boolean?
(Note: A reply was already sent to the submitter of this question; it is included here for UTC discussion.)
I help maintain a JavaScript library implementing UTS #46. In the process of revising our public API for the upcoming proposed revision (http://www.unicode.org/reports/tr46/proposed.html#ToASCII), we noticed how strange it is that all inputs to ToASCII other than the input string are booleans, whereas processing_option is an enumeration with two values. For editorial consistency, would it make sense to switch the processing option to a boolean flag, e.g. UseTransitionalProcessing?
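A sketch of the suggested change, with illustrative names only (nothing here is normative UTS #46 API): the current two-valued enumeration versus the boolean flag the submitter proposes.

```python
from enum import Enum

class ProcessingOption(Enum):
    # Current style: an enumeration that can only ever take two values.
    TRANSITIONAL = "transitional"
    NONTRANSITIONAL = "nontransitional"

# Current shape: one enum argument among otherwise-boolean flags.
def to_ascii_current(domain: str, processing_option: ProcessingOption,
                     check_hyphens: bool, verify_dns_length: bool): ...

# Proposed shape: every option is a boolean, consistently.
def to_ascii_proposed(domain: str, use_transitional_processing: bool,
                      check_hyphens: bool, verify_dns_length: bool): ...
```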
Date/Time: Thu Jun 29 06:44:45 CDT 2017
Name: Andrew West
Report Type: Error Report
Opt Subject: Incorrect Sort Keys in CollationTest_SHIFTED.txt
CollationTest_SHIFTED.txt for Unicode 10.0 has many incorrect sort keys, with "FFFF" missing at level 4 (this affects lines with [among others] ideographic telegraph symbols, CJK unified ideographs, CJK compatibility ideographs, Tangut ideographs, private-use characters, and noncharacters). To give a single example, the Unicode 9.0 file has:

3358 0021; # (㍘) IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ZERO [1C3D FB40 F0B9 | 0020 0020 | 0004 0004 | FFFF FFFF FFFF 0260 |]

But the Unicode 10.0 file has:

3358 0021; # (㍘) IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ZERO [1CA3 FB40 F0B9 | 0020 0020 | 0004 0004 | FFFF FFFF 0261 |]

9.0 and earlier versions of the file all have "FFFF FFFF FFFF 02XX", but the 10.0 file has "FFFF FFFF 0261", which I believe is incorrect, as there has been no change in the UCA to account for this difference (my implementation of the UCA produces "FFFF FFFF FFFF 0261" but still passes the CollationTest_SHIFTED test). I understand that the sort keys in the comments are provided for information only, but it is confusing for implementers when they are incorrect, so it would be helpful if someone could check and correct them.
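To make the discrepancy easy to check mechanically, a small sketch that pulls the bracketed sort key out of a test-file comment and inspects its fourth level (the line below abbreviates the 10.0 line quoted above):

```python
import re

# A CollationTest comment carries the sort key between '[' and ']',
# with levels separated by '|'. The fourth level of the 10.0 line has
# only two FFFFs where the 9.0 line had three.
line = "3358 0021; # (...) [1CA3 FB40 F0B9 | 0020 0020 | 0004 0004 | FFFF FFFF 0261 |]"
key = re.search(r"\[([^\]]*)\]", line).group(1)
levels = [lvl.split() for lvl in key.split("|") if lvl.strip()]
print(levels[3])  # -> ['FFFF', 'FFFF', '0261']
```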
Date/Time: Fri Jul 7 15:46:31 CDT 2017
Name: Anonymous Four
Report Type: Error Report
Opt Subject: Suggested Resolution of error report
Regarding:

> Date/Time: Thu Jun 29 06:44:45 CDT 2017
> Name: Andrew West
> Report Type: Error Report
> Opt Subject: Incorrect Sort Keys in CollationTest_SHIFTED.txt

This change should be documented in the "migration" section of http://www.unicode.org/Public/UCA/10.0.0/CollationTest.html

Even though it applies only to a bug fix in the sort keys provided as comments, for a test file one can expect implementers to verify their implementations against such information. Also, I don't understand (and am troubled by) the statement that an implementation that generates the old-style keys still passes the test. If that is possible, then explaining the bug and the change in the "migration" section becomes more urgent. (I assume that the .html file is mentioned in some header of the test file, so that implementers or testers who are simply handed the test file have a chance to locate the needed information.)
Date/Time: Tue Jul 18 18:15:43 CDT 2017
Name: Leroy Vargas-Ortiz
Report Type: Error Report (UAX #45)
Opt Subject: UAX #45
UAX #45 (v10.0.0) and its data file USourceData.txt (v10.0.0) have the following label for Field 1's value F: "Included in Extension F". However, there are 96 ideographs with Field 1="F" that do not have an associated code point; manually searching for these characters in the Radical-Stroke Index yields no results - in other words, they are not encoded in Extension F or anywhere else in Unicode. I contacted John H. Jenkins and this was his reply:

"These are characters which were included in the UTC's original submission of Extension F candidates back in 2012, but which had to be withdrawn before Extension F was finalized. The main reason for the withdrawal was a lack of sufficient evidence; the IRG has made its rules for evidence stricter over the years, and these characters did not meet the new standards. The status of "F" should not have been changed to mean "Included in Extension F." It should have been left as "Submitted for Extension F" with a note that characters actually part of Extension F have their code point indicated."

Leroy Vargas
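A hedged sketch of the check behind the count above, assuming the field layout documented in UAX #45 (Field 0 = U-source ID, Field 1 = status, Field 2 = code point, if any):

```python
# Count entries whose status (Field 1) is "F" but whose code point
# field (Field 2) is empty; comment lines start with '#'.
count = 0
with open("USourceData.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split(";")
        if len(fields) > 2 and fields[1].strip() == "F" and not fields[2].strip():
            count += 1
print(count)  # the report finds 96 such ideographs
```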
Date/Time: Sun Jul 2 09:15:24 CDT 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Shorthand format controls should not be default ignorable
The control characters in the Shorthand Format Controls block are required for the proper spelling of words in Duployan, so they should not be default ignorable. Ignoring them can make very different glyph arrangements look identical. This is especially problematic because, as far as I know, no font supports Duployan. Those fonts that cover the Duployan block just have non-interacting glyphs copied from the code chart, including dotted boxed glyphs for the shorthand format controls. Renderers that heed Default_Ignorable_Code_Point ignore the controls, but seeing the dotted boxed glyphs is necessary for a human reader to understand what the rendering should have been. But even a system that can render Duployan properly shouldn’t ignore these controls. There should always be some visible fallback.
Date/Time: Tue Jul 18 08:07:49 CDT 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Typo in UTS #51
In section 7, “black right-pointing double triangle with vertical bar” is not the name of U+23EF. The word “double” is in the wrong place. It should be “black right-pointing triangle with double vertical bar”.
UTC has reviewed feedback above this point as of August 4, 2017.
Date/Time: Fri Jul 21 08:16:30 CDT 2017
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: BidiMirroring.txt Contains Wrong Data and Encourages Lazy Implementations
Hello again, Iʼve just found out that BidiMirroring.txt is unreliable, as it does not list a number of paired characters such as U+2220 and U+29A3, U+2221 and U+299B, and so on. Many browsers (at least Chrome, Firefox and Opera) rely only on glyph-exchange based bidi-mirroring, so many mathematical symbols are not mirrored even though they could be, while some unpaired glyphs remain that cannot be mirrored this way. Hence I wonder whether implementers shouldnʼt have been urged to properly implement bidi-mirroring from the beginning, rather than being encouraged to hack the Unicode repertoire for a shallow emulation of the feature that fails as soon as it comes to higher mathematics, and that fails sooner than would have been necessary at this stage.

On the other hand, Iʼve found a vertically symmetric symbol, U+29A1 SPHERICAL ANGLE OPENING UP, that has the Bidi_Mirrored property value Yes. I suspect that this is a mistake, although in practice this one is harmless.

BTW it seems that U+299B MEASURED ANGLE OPENING LEFT should have been called REVERSED MEASURED ANGLE, as shown in other instances and in accordance with its precedent U+2221 MEASURED ANGLE. The same applies to U+29A0 SPHERICAL ANGLE OPENING LEFT, which is actually a REVERSED SPHERICAL ANGLE. Would it be possible to add aliases to those symbols? That would also fix their names wrt bidi-mirroring.

Thinking of plain text editors (given that in browsers, properly implementing bidi-mirroring is as simple as implementing CSS transformations), there must be a lack of knowledge about Unicode, or some specifications are missing, since some programs stack a maximum of two diacritics while others stack unlimited piles of them. Also, some of them cast unattached combining marks forward, where they end up on top of the following letter, while others cast them back as specified.

In the wake of these findings, Iʼm now trying to understand why many implementers are not respectful enough of Unicode to do proper work. That leads me back to the image damage that inevitably results from flaws and—sometimes deliberate—misnomers and other mistakes. That is really a pity, and I think that if the Standard had been designed in a more straightforward manner, without compromises, it could have triggered an even stronger dynamic in which it would have been implemented sooner, faster, and more thoroughly. I believe that correcting some policies in this 25-years-of-Unicode commemoration year would do much good. Good luck, Marcel
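The absence the submitter describes can be checked directly against the data file; a minimal sketch, assuming the file's documented "source; mirror # comment" layout:

```python
# Load the code point pairs from BidiMirroring.txt and see whether the
# characters named in the report have a listed mirror.
def load_mirror_pairs(path="BidiMirroring.txt"):
    pairs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            body = line.split("#", 1)[0].strip()
            if not body:
                continue
            src, dst = (int(x, 16) for x in body.split(";"))
            pairs[src] = dst
    return pairs

pairs = load_mirror_pairs()
# Per the report, the parenthesis is paired but the angles are not.
for cp in (0x0028, 0x2220, 0x29A3):
    mirror = pairs.get(cp)
    print(f"U+{cp:04X}:", f"U+{mirror:04X}" if mirror else "no pairing listed")
```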
Date/Time: Mon Jul 24 20:48:29 CDT 2017
Name: Andrew M
Report Type: Error Report
Opt Subject: U+20E3: Emoji_Component
U+20E3 (COMBINING ENCLOSING KEYCAP) should be listed in emoji-data.txt as an Emoji_Component since it is, in fact, an emoji component, as demonstrated by the fact that it occurs in emoji-sequences.txt. Thanks, Andrew
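A hedged sketch of how one might verify this claim against the data file, using the "code point(s) ; property # comment" layout that emoji-data.txt documents (single code points or XXXX..YYYY ranges):

```python
# Return True if the given code point carries the given property in
# emoji-data.txt; ranges use the "XXXX..YYYY" notation.
def has_property(path, cp, prop):
    with open(path, encoding="utf-8") as f:
        for line in f:
            body = line.split("#", 1)[0].strip()
            if not body:
                continue
            rng, p = (x.strip() for x in body.split(";"))
            if p != prop:
                continue
            lo, _, hi = rng.partition("..")
            if int(lo, 16) <= cp <= int(hi or lo, 16):
                return True
    return False

print(has_property("emoji-data.txt", 0x20E3, "Emoji_Component"))
```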
Date/Time: Wed Jul 26 18:34:59 CDT 2017
Name: Adam Borowski
Report Type: Error Report
Opt Subject: apparent errors in EastAsianWidth (for char-cell terminals)
These days, EastAsianWidth has lost most of its purpose, prior uses such as handling of legacy encodings being no longer relevant -- except for use in terminals, usually wrapped as wcwidth(). Terminals provide a character-cell grid display and require an unambiguous mapping between a character sequence and an integer number of cells occupied by that sequence. Usually, this mapping is done statelessly based on the given character's properties (a sketch of this mapping appears at the end of this report):

* a control character is not represented on the grid
* a combining or non-spacing character has width 0
* EastAsianWide (F and W) characters have width 2
* everything else, including EastAsianAmbiguous, has width 1

Recommendations from the Unicode Standard that suggest different handling of text presumed to come from a CJK locale, or unpredictable handling of variants such as emoji-or-text presentation, are generally impossible to implement. Unlike a web browser or a document processor, where formatting text and placing glyphs on the "paper" (be it a display, a PDF, etc.) is done by a single program, for a terminal such communication is impossible, as formatting and display are done by separate programs, often on physically separate machines under the control of different operating systems. In some cases, such as over a serial console, there is even no out-of-band link at all.

With EastAsianWidth's diminished importance, I see that most characters added in recent years have quite puzzling values, as if not much heed has been paid to this database anymore. The most striking example is U+1F0CF PLAYING CARD BLACK JOKER: it has width 2, while every other character in that range, including U+1F0DF PLAYING CARD WHITE JOKER, has width 1. This comes from that glyph having an alternate emoji presentation. Playing cards seem to work better with narrow glyphs. Likewise, U+1F004 MAHJONG TILE RED DRAGON has width 2, unlike all the rest of that block, which has width 1. In this case, I'd argue that marking the whole block as wide would be better.

Blocks commonly understood to be all emoji (U+1F300..U+1F5FF Miscellaneous Symbols and Pictographs, U+1F680..U+1F6FF Transport and Map Symbols) are a mess, with a jumble of widths 1 and 2. Because of reference glyph shapes, font makers have universally ignored the distinction, drawing all glyphs with width 2 (at least, I have yet to see a font with narrow proportions). This leads to _some_ characters taking up most of the next character cell on text terminals. The obvious solution would be to decree that the entirety of that range has EastAsianWidth:W, just as was done with the block U+1F600..U+1F64F Emoticons.

Thus I wonder what would be the best way to fix the above problems:

* edit the EastAsianWidth database, probably adding a category "has an emoji presentation but should be text on fixed-width displays"?
* declare such use to be abuse, and provide a whole new table meant to be used by char-cell terminals?

If the former is the way to go, I'd recommend marking, as discussed above, the Mahjong Tiles, Miscellaneous Symbols and Pictographs, and Transport and Map Symbols blocks as all wide. Some other cases might be more tricky. Would it be helpful if I went through all of the characters and proposed my (naive) recommendations?
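A naive sketch of the stateless mapping listed at the top of this report, using Python's unicodedata; real wcwidth() implementations differ in detail, so this is an illustration rather than a reference implementation.

```python
import unicodedata

def cell_width(ch: str) -> int:
    cat = unicodedata.category(ch)
    # Control/format characters are not represented on the grid.
    if cat in ("Cc", "Cf"):
        return 0
    # Combining / non-spacing characters occupy no cell of their own.
    if cat in ("Mn", "Me") or unicodedata.combining(ch):
        return 0
    # East Asian Fullwidth and Wide characters take two cells.
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return 2
    # Everything else, including Ambiguous, takes one cell.
    return 1

for ch in ("A", "漢", "\U0001F004"):  # U+1F004 MAHJONG TILE RED DRAGON
    print(f"U+{ord(ch):04X}", cell_width(ch))
```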
Date/Time: Sat Jan 28 22:08:15 CST 2017
Name: Richard Wordingham
Report Type: Other Question, Problem, or Feedback
Opt Subject: Indic_Positional_Category: More Matras with Variable Placement
(This feedback was left over from a previous period due to a clerical error.)
The Tamil, Malayalam and Grantha vowel signs U and UU are not the only ones with variable placement. There are another two pairs with contextually variable placement, but in their cases with a usual placement of Bottom (as documented) and an exceptional placement of Right, in which case they have spacing glyphs. (This is different to the three Indian pairs already documented in the comments of IndicPositionalCategory.txt – U+0BC1-2, U+0D41-2 and U+11341-2.) They are: U+102F MYANMAR VOWEL SIGN U, U+1030 MYANMAR VOWEL SIGN UU, U+1A69 TAI THAM VOWEL SIGN U, and U+1A6A TAI THAM VOWEL SIGN UU. The latter pair have variable placement in modern Tai Khün, but not, so far as I am aware, in Northern Thai. These two pairs should be similarly commented on, either explicitly or implicitly.
Date/Time: Wed Jun 14 12:56:33 CDT 2017
Name: Eduardo Marin Silva
Report Type: Public Review Issue
Opt Subject: Bitcoin informative note
It wouldn't hurt to add an informative note for the bitcoin sign to the code chart saying "cryptocurrency", since all the other signs specify their use.
Date/Time: Mon May 8 23:30:27 CDT 2017
Name: Shriramana Sharma
Report Type: Other Question, Problem, or Feedback
Opt Subject: Dotted box for Brahmi/Kannada fricative characters
The Sharada chart currently shows 111C2 and 111C3 with a dotted box to indicate the requirement of special rendering. The same should be applied to Kannada 0CF1 and 0CF2, and also to Brahmi 11003 and 11004. This was already requested in L2/14-066, and this is just a "gentle" reminder three years later! ☺️ The same should be applied to the TUS 9.0 description of these characters in the Kannada chapter 12.8, p 498 (534 of PDF). Note that this change does not conflict with L2/17-098 p 6, which asks for a different correction here. The corresponding Brahmi chapter 14.1, p 553 (589 of PDF), does not give written representations under Vowel Modifiers, so this is not an issue there.
Date/Time: Thu Jun 29 21:43:08 CDT 2017
Name: Henry Chan
Report Type: Other Question, Problem, or Feedback
Opt Subject: Clarification on multi-column CJK Charts
The accuracy of the representative glyphs for representing the regional preferences in the CJK multi-column code charts has been called into question multiple times, even though it is specified in Unicode Standard Chapter 24:

> Each character in these code charts is shown with a representative glyph. A
> representative glyph is not a prescriptive form of the character, but rather
> one that enables recognition of the intended character to a knowledgeable
> user and facilitates lookup of the character in the code charts. In many
> cases, there are more or less well-established alternative glyphic
> representations for the same character.
> Designers of high-quality fonts will do their own research into the
> preferred glyphic appearance of Unicode characters. In addition, many scripts
> require context-dependent glyph shaping, glyph positioning, or ligatures, none
> of which is shown in the code charts. The Unicode Standard contains many
> characters that are used in writing minority languages or that are historical
> characters, often used primarily in manuscripts or inscriptions. Where there
> is no strong tradition of printed materials, the typography of a character may
> not be settled. Because of these factors, the glyph image chosen as the
> representative glyph in these code charts should not be considered a
> definitive guide to best practice for typographical design.

It may be better to do the following in the East Asian chapter:

(1) repeat that "A representative glyph is not a prescriptive form of the character, but rather one that enables recognition of the intended character to a knowledgeable user and facilitates lookup of the character in the code charts";
(2) specify that "The representative glyphs for each locale/column in the CJK Unified Ideographs are not necessarily the preferred or normative glyphs for each region";
(3) reiterate that "Other more or less well-established alternative glyphic representations may exist".

This will more accurately reflect Korea's and Hong Kong's current situation, where the fonts supplied are not necessarily in conformance with national standards, and where users' conventions are not necessarily well defined or in line with the national standards. It will also cover the case where China's two glyphs in Extension B don't match the Tongyong Guifan Hanzi Biao (TGH), which China NB tends not to correct, as well as the unknown normativity of the 8 or 9 variant characters in TGH that have a unifiable-but-markedly-different glyph in GE sources. It will also be consistent with Taiwan's practice that the characters in T3 and above are not normalized, and agrees with the fact that only supporting the base character at its current Unicode J-column sources is definitely not enough to cater to the modern preferences of Japanese users. (Submitted to the reporting form on the advice of Ken Lunde)
Date/Time: Tue Jul 4 16:08:16 CDT 2017
Name: Johannes Athmer
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Updated German orthography concerning ß/ẞ
In the Unicode FAQ (and case mapping information), it is stated that ß (U+00DF LATIN SMALL LETTER SHARP S) should not be mapped to the upper case ẞ (U+1E9E LATIN CAPITAL LETTER SHARP S) by default. It should be noted that the official German orthography has been changed[1] to include U+1E9E as the capital version of U+00DF. I would suggest moving away from mapping lower case "ß" to upper case "SS" by default and introducing a mapping of lower case "ß" to upper case "ẞ" instead. Not only is this now the official spelling, but it is also lossless. It is impossible (without knowing the vocabulary - and sometimes the context!) to transform upper case "SS" to lower case "ß" or lower case "ss". [1] http://www.rechtschreibrat.com/DOX/rfdr_PM_2017-06-29_Aktualisierung_Regelwerk.pdf
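The losslessness point is easy to demonstrate with Python's built-in case mappings, which currently implement the SS default the submitter wants changed:

```python
# "Maße" (measures) and "Masse" (mass) are distinct words, but the
# default ß -> SS uppercase mapping collapses them.
print("Maße".upper())    # -> MASSE
print("Masse".upper())   # -> MASSE (now indistinguishable)
# U+1E9E, by contrast, round-trips losslessly.
print("ẞ".lower())       # -> ß
```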
Date/Time: Sat Jul 15 15:55:53 CDT 2017
Report Type: Error Report
Opt Subject: BidiCharacterTest.txt
Hello, I'm trying to pass all the tests in BidiCharacterTest.txt, and I'm having problems understanding a few of the tests that, to me, appear to contradict the specification. The problematic lines in BidiCharacterTest-10.0.0.txt are the tests on lines 262, 263, and 264. Let's consider the test from line 262:

Dir: RTL
Input: a ( b <RLE> c <PDF> ) _ 1
Level: 2 2 2 x 4 x 1 1 2

The problem I'm having is that the first opening bracket is assigned level 2 and the closing bracket level 1. This seems to contradict the three rules N0.b, N0.c.1, and N0.c.2 in the specification, which all describe overriding the type of both brackets with either the matching or the opposite direction. The only case in which we can possibly get different levels (correct me if I'm wrong!) is if rule N0.d is applied and the brackets retain their neutral status until it is resolved in subsequent rules. I would very much appreciate it if you would either acknowledge a bug or correct a misunderstanding on my part. Thank you in advance! Dov
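For reference, a hedged sketch of decoding the fields quoted above. BidiCharacterTest.txt records carry the code points, paragraph direction, paragraph level, resolved levels ('x' for characters removed by rule X9), and a visual-order field not reproduced here; the code point sequence below is a reconstruction of the quoted input (with <RLE> as U+202B and <PDF> as U+202C), not a verbatim copy of line 262.

```python
# Pair each input character with its expected resolved level ('x' means
# the character is removed by rule X9 and has no level).
cps = "0061 0028 0062 202B 0063 202C 0029 005F 0031"
levels = "2 2 2 x 4 x 1 1 2"
pairs = [
    (f"U+{int(cp, 16):04X}", None if lvl == "x" else int(lvl))
    for cp, lvl in zip(cps.split(), levels.split())
]
print(pairs)  # '(' gets level 2 while ')' gets level 1 -- the report's point
```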
Date/Time: Fri Jul 21 04:03:11 CDT 2017
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: Code Charts Are Lacking Hints About General Category
One of the main impediments compromising Unicode education (beside the already discussed lack of a descriptor property—or the hijacking of what should have been the descriptor, as an alphanumeric identifier for the convenience of those users of the Standard who hate hex digits—and the insufficient truthfulness of many parts of the Standard) seems to be the lack of General_Category and Bidi_Mirrored property value hints in the Code Charts. Although the disclaimer clearly states the necessity of looking up the Unicode Standard and TRs, Unicode *education* isnʼt meant to compile all the information needed "for a successful implementation" to work out content. The Gc and bidi mirroring happen to be indispensable for an average understanding of characters. Believe it or not: even Unicode experts have failed to know about the bidi-mirroring of angle quotation marks. Such shortcomings wouldnʼt happen if the Gc and BM were added to the Code Charts. That can be done without overloading the layout, by simply appending an 'M' to the two-letter Gc code if applicable, and then adding this code after the code point. Thank you in advance. Regards, Marcel
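A tiny sketch of the annotation being proposed (the format is the submitter's suggestion, not an existing chart convention): the two-letter General_Category code plus an 'M' suffix when Bidi_Mirrored is Yes, printed after the code point.

```python
import unicodedata

def chart_hint(ch: str) -> str:
    gc = unicodedata.category(ch)                 # two-letter General_Category
    m = "M" if unicodedata.mirrored(ch) else ""   # Bidi_Mirrored flag
    return f"U+{ord(ch):04X} {gc}{m}"

print(chart_hint("«"))  # -> U+00AB PiM (angle quotation marks do mirror)
print(chart_hint("A"))  # -> U+0041 Lu
```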
Date/Time: Fri Jul 21 04:16:12 CDT 2017
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: Code Charts Are Lacking Hints About General Category
When talking about the drawbacks that hinder Unicode education, Iʼd also recall the already reported singular instead of a plural in the title of TUS: Core Specification[s]. See: https://forum.wordreference.com/threads/specifications-or-specification.2611680/ This adds to a number of other flaws, as already discussed elsewhere, lowering the enthusiasm of scholars. Not correcting it likely overstates the stability policies. The only argument I can see for keeping the singular is that turning it into a plural means admitting a mistake in a prominent place where many readers will notice it when opening the new version for the first time. If there is another one, please let me know. Regards, Marcel