The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of July 25, 2017, since the previous cumulative document was issued prior to UTC #151 (May 2017). Some items in the Table of Contents do not have feedback here.
The links below go directly to open PRIs and to feedback documents for them, as of July 25, 2017.
The links below go to locations in this document for feedback.
Feedback to UTC / Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports
Note: The section of Feedback on Encoding Proposals this time includes:
L2/06-272
L2/14-066
L2/17-098
L2/17-197
L2/17-207
L2/17-229
Date/Time: Thu Jul 20 03:53:43 CDT 2017
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/17-229 Coin emoji
The proposal for a Coin emoji by Katie McLaughlin states in 3.A that there are no known instances of such a pictograph in a messaging system. This is not really the case. https://github.com/Crissov/unicode-proposals/issues/137

1. The less important and now defunct Japanese carrier E-Mobile / eAccess had extended the original NTT Docomo emoji set with some custom ones, including a Coin emoji numbered #266. https://github.com/Crissov/unicode-proposals/issues/331

2. Microsoft's instant messaging service, known under several names over the years (Windows/Microsoft/MSN/Live Messenger), had a Money pictograph for the `(mo)` short code that showed stacks of coins. https://github.com/Crissov/unicode-proposals/issues/256

3. A proposed extension to the XMPP/Jabber protocol, XEP-0038 (which at least Cisco implemented), described the short code `:money:`, which was intended to show a gold coin. All other pictographs of the core set can be represented by standard emoji. https://github.com/Crissov/unicode-proposals/issues/320

4. The Facebook internal pictogram code `[[329136157130818]]` shows (showed?) two coins. I'm not quite sure how and where that's used, though. https://github.com/Crissov/unicode-proposals/issues/254

5. In L2/06-272, Andreas Stötzner proposed several characters found in Public Signage. Many of them were unified with emoji that were proposed at about the same time. At U+xx49, he proposed a symbol for Money Exchange that shows a banknote and coins. It probably became U+1F4B1 CURRENCY EXCHANGE, which shows only two specific currency symbols. https://github.com/Crissov/unicode-proposals/issues/299

In addition, a Coin emoji would be welcomed by card players who do not use the modern international standard suits (clubs, spades, hearts, diamonds), but more traditional ones in which ♦ is represented by a gold coin. Some other symbols are still missing for this, though. https://github.com/Crissov/unicode-proposals/issues/289

In conclusion, there are many compatibility reasons to add a Coin emoji to Unicode.
Date/Time: Thu Jul 20 16:59:10 CDT 2017
Name: Cibu Johny
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback on L2/L2017/17207-malayalam-candrakkala.pdf
Here are my thoughts on the document:

1. The sources and ages of the palm-leaf manuscript scans are not indicated. Dating such manuscripts can be harder than dating published books - they can be 500 years old or just 50 years old. For example, it is well known that EE VOWEL SIGN is a very recent phenomenon in the Malayalam script; I see its presence in the second line of the third palm leaf. Also, palm leaves can reflect just a small group's, a locality's, or even a single person's writing conventions, as opposed to those of a wider community in the case of printing. Essentially, we should be extra careful about evidence from palm leaves.

2. None of the scans provided use the Malayalam script - they are from neighbouring scripts like Grantha, Tigalari, etc. Without evidence from Malayalam documents, I don't know how we would establish anything conclusive about Malayalam. At most, we can make an argument that Chandrakkala was used in neighbouring scripts.

3. The findings in the document do not seem to be from a peer-reviewed journal or widely accepted books. The text we included in the original proposals for the Viramas is (translated) from a peer-reviewed journal. (We had published our findings in that journal to make sure our claims were validated by historians. Only after that did we propose the characters.)

Anyway, I don't have any objection to removing historical references to the characters. I believe those references are only supplemental or contextual information. Regards, Cibu
Date/Time: Mon Jul 24 18:52:27 CDT 2017
Name: Eduardo Marin
Report Type: Feedback on an Encoding Proposal
Opt Subject: Name of the paleohispanic script
The term "Hispanic" today no longer means the entire Iberian Peninsula, as it did in the days of the Romans; now it may refer either to the nationality of people from Spain or to an ethnic group distinguished by speaking Spanish. Naming the script "Hispanic" therefore runs the risk of excluding Portugal by accident. Because of that, it should be called Iberian instead. Furthermore, if we follow the naming conventions of other similar scripts, like Old Italic, a much more fitting name would be "Old Iberian".
Date/Time: Wed Aug 2 12:05:18 CDT 2017
Name: Karl Williamson
Report Type: Feedback on an Encoding Proposal
Opt Subject: Comments on L2/17-197 (REPLACEMENT char quantity)
There is something wrong with the methodology the author used to determine the behaviors of various software products. I know this because he gives incorrect results for Perl 5. Perl 5 uses the original definition of UTF-8, absorbing the maximal number of continuation bytes based on the number of leading 1-bits in the start byte. I just now made sure I am correct in my assertion by testing it on the sequence C0 80: the UTF-8 decoding routine treated this as a single, flawed, unit. There must be some combination of software pieces that led to the author's results, but whatever they are, they are masking Perl 5's internal behavior. I apologize for the lateness of this response, but I thought it important enough not to shelve even if late.
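For readers who want to reproduce this kind of probe, a minimal sketch in Python (not the submitter's Perl setup): Python's built-in decoder happens to follow the one-replacement-per-bogus-byte convention, which makes the contrast with the single-unit behavior described above directly visible.

```python
# Probe a UTF-8 decoder with the invalid (overlong) sequence C0 80.
# Python emits one U+FFFD per bogus byte here; a decoder using the
# original UTF-8 definition, as the report describes for Perl 5,
# would instead absorb both bytes as a single flawed unit.
data = bytes([0xC0, 0x80])
decoded = data.decode("utf-8", errors="replace")
print(len(decoded), [f"U+{ord(c):04X}" for c in decoded])
# -> 2 ['U+FFFD', 'U+FFFD']
```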
Date/Time: Sat Jan 28 14:49:18 CST 2017
Name: Richard Wordingham
Report Type: Error Report
Opt Subject: Misunderstandings of 'Logical Order'
(This feedback was left over from a previous period due to a clerical error.)
... 3) The explanatory comment in UCD 9.0.0 IndicPositionalCategory.txt for value Visual_Order_Left says “... instead of the logical order model...”. It should say, “... instead of the phonetic order model...”.
Date/Time: Fri May 12 15:49:11 CDT 2017
Name: Behnam Esfahbod
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback for current and proposed versions of UTS #46 Conformance Testing
Hi there, We have faced a couple of issues with implementing the UTS #46 Conformance Testing for the rust-url library:

1) Neither Section 8, Conformance Testing, nor the header of the data file (IdnaTest.txt) states which flags need to be set for the Section 4 Processing algorithms, especially VerifyDnsLength, or any of the proposed flags: CheckHyphens, CheckBidi, and CheckJoiners.

2) When VerifyDnsLength is not set, many cases fail that refer to Processing Step 4.2 of Section 4.2, ToASCII, meaning that VerifyDnsLength is expected to be set. For example, line 169:

```
B; 。; [A4_2]; [A4_2]
```

(The current implementation of rust-url sets the VerifyDnsLength flag because it results in a smaller failure rate for the test data.)

3) When VerifyDnsLength is set, there are unexpected failures in the test data for those cases whose source field starts with a FULL STOP or a replacement character. For example, line 4956:

```
B; 。\u0635\u0649\u05B0\u0644\u0627。岓\u0F84𝩋ᡂ; \u0635\u0649\u05B0\u0644\u0627.岓\u0F84𝩋ᡂ; xn--7cb2vlb7cxa.xn--3ed095b9x3dbd8t # صىְلا.岓྄𝩋ᡂ
```

Starting with U+3002 IDEOGRAPHIC FULL STOP, during the Section 4.2 ToASCII algorithm, it should fail at step Processing 4.2, because the first label has length zero. But no failure is anticipated in the data file. The test data appears to expect empty labels (or leading FULL STOPs) to be dropped from the domain name (which would allow the test cases to pass), but there is no step under Section 4, Processing, or Section 4.2, ToASCII, regarding this behavior.

Please see these for the original discussion and more info:
* https://github.com/servo/rust-url/issues/166
* https://github.com/servo/rust-url/issues/171
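As context for the report above, a hedged sketch of reading one IdnaTest.txt record; the semicolon-delimited layout and the empty-field defaults follow the data file's own header, and the function name here is illustrative, not part of any API.

```python
# Split one IdnaTest.txt record into its fields. Per the file header,
# the fields are: test type; source; toUnicode; toASCII. '#' starts a
# comment; an empty toUnicode/toASCII field defaults to the prior field.
def parse_idna_test_line(line):
    body = line.split("#", 1)[0].strip()
    if not body:
        return None  # blank or comment-only line
    fields = [f.strip() for f in body.split(";")]
    test_type, source, to_unicode, to_ascii = (fields + [""] * 4)[:4]
    to_unicode = to_unicode or source
    to_ascii = to_ascii or to_unicode
    return {"type": test_type, "source": source,
            "toUnicode": to_unicode, "toASCII": to_ascii}

# The line-169 case quoted above: bracketed values mark expected errors.
print(parse_idna_test_line("B; \u3002; [A4_2]; [A4_2]"))
```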
Date/Time: Wed May 24 05:38:06 CDT 2017
Name: Anne van Kesteren
Report Type: Public Review Issue
Opt Subject: Change UTS #46 IDNA processing_option to a boolean
Most other arguments to ToASCII and ToUnicode are booleans. Since processing_option has only two values, it would make sense to make it consistent. Now would also be a good time to do it, since you're changing the calling convention anyway by introducing CheckHyphens et al.
Date/Time: Thu Jun 22 10:08:24 CDT 2017
Name: Ken Whistler
Report Type: Error Report
Opt Subject: UTS #10
There is a small error in Appendix B of UTS #10 with regard to the CTT for ISO/IEC 14651. The statement:

"The CTT for ISO/IEC 14651 is constructed using only symbols, rather than explicit integral weights, and with the Shift-Trimmed option for variable weighting."

should be amended to say "the Shifted option" to be correct. (Submitted to track for the 11.0 proposed update of the document, on behalf of Marc Lodewijck, who noticed the problem.)
Date/Time: Sat Jun 17 17:04:47 CDT 2017
Name: Domenic Denicola
Report Type: Public Review Issue
Opt Subject: UTS #46: why is processing_option an enum, not a boolean?
(Note: A reply was already sent to the submitter of this question; it is included here for UTC discussion.)
I help maintain a JavaScript library implementing UTS #46. In the process of revising our public API for the upcoming proposed revision (http://www.unicode.org/reports/tr46/proposed.html#ToASCII), we noticed how strange it is that all inputs to ToASCII other than the input string are booleans, whereas processing_option is an enumeration with two values. For editorial consistency, would it make sense to switch the processing option to a boolean flag, e.g. UseTransitionalProcessing?
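A sketch of the suggested change, with illustrative names only (nothing here is normative UTS #46 API): the current two-valued enumeration versus the boolean flag the submitter proposes.

```python
from enum import Enum

class ProcessingOption(Enum):
    # Current style: an enumeration that can only ever take two values.
    TRANSITIONAL = "transitional"
    NONTRANSITIONAL = "nontransitional"

# Current shape: one enum argument among otherwise-boolean flags.
def to_ascii_current(domain: str, processing_option: ProcessingOption,
                     check_hyphens: bool, verify_dns_length: bool): ...

# Proposed shape: every option is a boolean, consistently.
def to_ascii_proposed(domain: str, use_transitional_processing: bool,
                      check_hyphens: bool, verify_dns_length: bool): ...
```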
Date/Time: Thu Jun 29 06:44:45 CDT 2017
Name: Andrew West
Report Type: Error Report
Opt Subject: Incorrect Sort Keys in CollationTest_SHIFTED.txt
CollationTest_SHIFTED.txt for Unicode 10.0 has many incorrect sort keys, with "FFFF" missing at level 4 (this affects lines with [among others] ideographic telegraph symbols, CJK unified ideographs, CJK compatibility ideographs, Tangut ideographs, private-use characters, and noncharacters). To give a single example, the Unicode 9.0 file has:

3358 0021; # (㍘) IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ZERO [1C3D FB40 F0B9 | 0020 0020 | 0004 0004 | FFFF FFFF FFFF 0260 |]

But the Unicode 10.0 file has:

3358 0021; # (㍘) IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ZERO [1CA3 FB40 F0B9 | 0020 0020 | 0004 0004 | FFFF FFFF 0261 |]

9.0 and earlier versions of the file all have "FFFF FFFF FFFF 02XX", but the 10.0 file has "FFFF FFFF 0261", which I believe is incorrect, as there has been no change in the UCA to account for this difference (my implementation of the UCA produces "FFFF FFFF FFFF 0261" but still passes the CollationTest_SHIFTED test). I understand that the sort keys in the comments are provided for information only, but it is confusing for implementers when they are incorrect, so it would be helpful if someone could check and correct them.
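To make the discrepancy easy to check mechanically, a small sketch that pulls the bracketed sort key out of a test-file comment and inspects its fourth level (the line below abbreviates the 10.0 line quoted above):

```python
import re

# A CollationTest comment carries the sort key between '[' and ']',
# with levels separated by '|'. The fourth level of the 10.0 line has
# only two FFFFs where the 9.0 line had three.
line = "3358 0021; # (...) [1CA3 FB40 F0B9 | 0020 0020 | 0004 0004 | FFFF FFFF 0261 |]"
key = re.search(r"\[([^\]]*)\]", line).group(1)
levels = [lvl.split() for lvl in key.split("|") if lvl.strip()]
print(levels[3])  # -> ['FFFF', 'FFFF', '0261']
```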
Date/Time: Fri Jul 7 15:46:31 CDT 2017
Name: Anonymous Four
Report Type: Error Report
Opt Subject: Suggested Resolution of error report
Regarding:

> Date/Time: Thu Jun 29 06:44:45 CDT 2017
> Name: Andrew West
> Report Type: Error Report
> Opt Subject: Incorrect Sort Keys in CollationTest_SHIFTED.txt

This change should be documented in the "migration" section of http://www.unicode.org/Public/UCA/10.0.0/CollationTest.html

Even though it applies only to a bug fix in the sort keys provided as comments, for a test file one can expect implementers to verify their implementations against such information. Also, I don't understand (and am troubled by) the statement that an implementation that generates the old-style keys still passes the test. If that is possible, then explaining the bug and the change in the "migration" section becomes more urgent. (I assume that the .html file is mentioned in some header of the test file, so that implementers or testers who are simply handed the test file have a chance to locate the needed information.)
Date/Time: Tue Jul 18 18:15:43 CDT 2017
Name: Leroy Vargas-Ortiz
Report Type: Error Report (UAX #45)
Opt Subject: UAX #45
UAX #45 (v10.0.0) and its data file USourceData.txt (v10.0.0) have the following label for Field 1's value F: "Included in Extension F". However, there are 96 ideographs with Field 1="F" that do not have an associated code point; manually searching for these characters in the Radical-Stroke Index yields no results - in other words, they are not encoded in Extension F or anywhere else in Unicode. I contacted John H. Jenkins and this was his reply:

"These are characters which were included in the UTC's original submission of Extension F candidates back in 2012, but which had to be withdrawn before Extension F was finalized. The main reason for the withdrawal was a lack of sufficient evidence; the IRG has made its rules for evidence stricter over the years, and these characters did not meet the new standards. The status of "F" should not have been changed to mean "Included in Extension F." It should have been left as "Submitted for Extension F" with a note that characters actually part of Extension F have their code point indicated."

Leroy Vargas
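A hedged sketch of the check behind the count above, assuming the field layout documented in UAX #45 (Field 0 = U-source ID, Field 1 = status, Field 2 = code point, if any):

```python
# Count entries whose status (Field 1) is "F" but whose code point
# field (Field 2) is empty; comment lines start with '#'.
count = 0
with open("USourceData.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split(";")
        if len(fields) > 2 and fields[1].strip() == "F" and not fields[2].strip():
            count += 1
print(count)  # the report finds 96 such ideographs
```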
Date/Time: Sun Jul 2 09:15:24 CDT 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Shorthand format controls should not be default ignorable
The control characters in the Shorthand Format Controls block are required for the proper spelling of words in Duployan, so they should not be default ignorable. Ignoring them can make very different glyph arrangements look identical. This is especially problematic because, as far as I know, no font supports Duployan. Those fonts that cover the Duployan block just have non-interacting glyphs copied from the code chart, including dotted boxed glyphs for the shorthand format controls. Renderers that heed Default_Ignorable_Code_Point ignore the controls, but seeing the dotted boxed glyphs is necessary for a human reader to understand what the rendering should have been. But even a system that can render Duployan properly shouldn’t ignore these controls. There should always be some visible fallback.
Date/Time: Tue Jul 18 08:07:49 CDT 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Typo in UTS #51
In section 7, “black right-pointing double triangle with vertical bar” is not the name of U+23EF. The word “double” is in the wrong place. It should be “black right-pointing triangle with double vertical bar”.
UTC has reviewed feedback above this point as of August 4, 2017.
Date/Time: Fri Jul 21 08:16:30 CDT 2017
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: BidiMirroring.txt Contains Wrong Data and Encourages Lazy Implementations
Hello again, Iʼve just found out that BidiMirroring.txt is unreliable, as it does not list a number of paired characters such as U+2220 and U+29A3, U+2221 and U+299B, and so on. Many browsers (at least Chrome, Firefox and Opera) rely only on glyph-exchange based bidi-mirroring, so many mathematical symbols are not mirrored even though they could be, while some unpaired glyphs remain that cannot be mirrored this way. Hence I wonder whether implementers shouldnʼt have been urged to properly implement bidi-mirroring from the beginning, rather than being encouraged to hack the Unicode repertoire for a shallow emulation of the feature that fails as soon as it comes to higher mathematics, and that fails sooner than would have been necessary at this stage.

On the other hand, Iʼve found a vertically symmetric symbol, U+29A1 SPHERICAL ANGLE OPENING UP, that has the Bidi_Mirrored property value Yes. I suspect that this is a mistake, although in practice this one is harmless.

BTW it seems that U+299B MEASURED ANGLE OPENING LEFT should have been called REVERSED MEASURED ANGLE, as shown in other instances and in accordance with its precedent U+2221 MEASURED ANGLE. The same applies to U+29A0 SPHERICAL ANGLE OPENING LEFT, which is actually a REVERSED SPHERICAL ANGLE. Would it be possible to add aliases to those symbols? That would also fix their names wrt bidi-mirroring.

Thinking of plain text editors (given that in browsers, properly implementing bidi-mirroring is as simple as implementing CSS transformations), there must be a lack of knowledge about Unicode, or some specifications are missing, since some programs stack a maximum of two diacritics while others stack unlimited piles of them. Also, some of them cast unattached combining marks forward, where they end up on top of the following letter, while others cast them back as specified.

In the wake of these findings, Iʼm now trying to understand why many implementers are not respectful enough of Unicode to do proper work. That leads me back to the image damage that inevitably results from flaws and—sometimes deliberate—misnomers and other mistakes. That is really a pity, and I think that if the Standard had been designed in a more straightforward manner, without compromises, it could have triggered an even stronger dynamic in which it would have been implemented sooner, faster, and more thoroughly. I believe that correcting some policies in this 25-years-of-Unicode commemoration year would do much good. Good luck, Marcel
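The absence the submitter describes can be checked directly against the data file; a minimal sketch, assuming the file's documented "source; mirror # comment" layout:

```python
# Load the code point pairs from BidiMirroring.txt and see whether the
# characters named in the report have a listed mirror.
def load_mirror_pairs(path="BidiMirroring.txt"):
    pairs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            body = line.split("#", 1)[0].strip()
            if not body:
                continue
            src, dst = (int(x, 16) for x in body.split(";"))
            pairs[src] = dst
    return pairs

pairs = load_mirror_pairs()
# Per the report, the parenthesis is paired but the angles are not.
for cp in (0x0028, 0x2220, 0x29A3):
    mirror = pairs.get(cp)
    print(f"U+{cp:04X}:", f"U+{mirror:04X}" if mirror else "no pairing listed")
```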
Date/Time: Mon Jul 24 20:48:29 CDT 2017
Name: Andrew M
Report Type: Error Report
Opt Subject: U+20E3: Emoji_Component
U+20E3 (COMBINING ENCLOSING KEYCAP) should be listed in emoji-data.txt as an Emoji_Component since it is, in fact, an emoji component, as demonstrated by the fact that it occurs in emoji-sequences.txt. Thanks, Andrew
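A hedged sketch of how one might verify this claim against the data file, using the "code point(s) ; property # comment" layout that emoji-data.txt documents (single code points or XXXX..YYYY ranges):

```python
# Return True if the given code point carries the given property in
# emoji-data.txt; ranges use the "XXXX..YYYY" notation.
def has_property(path, cp, prop):
    with open(path, encoding="utf-8") as f:
        for line in f:
            body = line.split("#", 1)[0].strip()
            if not body:
                continue
            rng, p = (x.strip() for x in body.split(";"))
            if p != prop:
                continue
            lo, _, hi = rng.partition("..")
            if int(lo, 16) <= cp <= int(hi or lo, 16):
                return True
    return False

print(has_property("emoji-data.txt", 0x20E3, "Emoji_Component"))
```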
Date/Time: Wed Jul 26 18:34:59 CDT 2017
Name: Adam Borowski
Report Type: Error Report
Opt Subject: apparent errors in EastAsianWidth (for char-cell terminals)
These days, EastAsianWidth has lost most of its purpose, prior uses such as handling of legacy encodings being no longer relevant -- except for use in terminals, usually wrapped as wcwidth(). Terminals provide a character-cell grid display and require an unambiguous mapping between a character sequence and an integer number of cells occupied by that sequence. Usually, this mapping is done statelessly based on the given character's properties (a sketch of this mapping appears at the end of this report):

* a control character is not represented on the grid
* a combining or non-spacing character has width 0
* EastAsianWide (F and W) characters have width 2
* everything else, including EastAsianAmbiguous, has width 1

Recommendations from the Unicode Standard that suggest different handling of text presumed to come from a CJK locale, or unpredictable handling of variants such as emoji-or-text presentation, are generally impossible to implement. Unlike a web browser or a document processor, where formatting text and placing glyphs on the "paper" (be it a display, a PDF, etc.) is done by a single program, for a terminal such communication is impossible, as formatting and display are done by separate programs, often on physically separate machines under the control of different operating systems. In some cases, such as over a serial console, there is even no out-of-band link at all.

With EastAsianWidth's diminished importance, I see that most characters added in recent years have quite puzzling values, as if not much heed has been paid to this database anymore. The most striking example is U+1F0CF PLAYING CARD BLACK JOKER: it has width 2, while every other character in that range, including U+1F0DF PLAYING CARD WHITE JOKER, has width 1. This comes from that glyph having an alternate emoji presentation. Playing cards seem to work better with narrow glyphs. Likewise, U+1F004 MAHJONG TILE RED DRAGON has width 2, unlike all the rest of that block, which has width 1. In this case, I'd argue that marking the whole block as wide would be better.

Blocks commonly understood to be all emoji (U+1F300..U+1F5FF Miscellaneous Symbols and Pictographs, U+1F680..U+1F6FF Transport and Map Symbols) are a mess, with a jumble of widths 1 and 2. Because of reference glyph shapes, font makers have universally ignored the distinction, drawing all glyphs with width 2 (at least, I have yet to see a font with narrow proportions). This leads to _some_ characters taking up most of the next character cell on text terminals. The obvious solution would be to decree that the entirety of that range has EastAsianWidth:W, just as was done with the block U+1F600..U+1F64F Emoticons.

Thus I wonder what would be the best way to fix the above problems:

* edit the EastAsianWidth database, probably adding a category "has an emoji presentation but should be text on fixed-width displays"?
* declare such use to be abuse, and provide a whole new table meant to be used by char-cell terminals?

If the former is the way to go, I'd recommend marking, as discussed above, the Mahjong Tiles, Miscellaneous Symbols and Pictographs, and Transport and Map Symbols blocks as all wide. Some other cases might be more tricky. Would it be helpful if I went through all of the characters and proposed my (naive) recommendations?
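A naive sketch of the stateless mapping listed at the top of this report, using Python's unicodedata; real wcwidth() implementations differ in detail, so this is an illustration rather than a reference implementation.

```python
import unicodedata

def cell_width(ch: str) -> int:
    cat = unicodedata.category(ch)
    # Control/format characters are not represented on the grid.
    if cat in ("Cc", "Cf"):
        return 0
    # Combining / non-spacing characters occupy no cell of their own.
    if cat in ("Mn", "Me") or unicodedata.combining(ch):
        return 0
    # East Asian Fullwidth and Wide characters take two cells.
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return 2
    # Everything else, including Ambiguous, takes one cell.
    return 1

for ch in ("A", "漢", "\U0001F004"):  # U+1F004 MAHJONG TILE RED DRAGON
    print(f"U+{ord(ch):04X}", cell_width(ch))
```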
Date/Time: Sat Jan 28 22:08:15 CST 2017
Name: Richard Wordingham
Report Type: Other Question, Problem, or Feedback
Opt Subject: Indic_Positional_Category: More Matras with Variable Placement
(This feedback was left over from a previous period due to a clerical error.)
The Tamil, Malayalam and Grantha vowel signs U and UU are not the only ones with variable placement. There are another two pairs with contextually variable placement, but in their cases with a usual placement of Bottom (as documented) and an exceptional placement of Right, in which case they have spacing glyphs. (This is different to the three Indian pairs already documented in the comments of IndicPositionalCategory.txt – U+0BC1-2, U+0D41-2 and U+11341-2.) They are: U+102F MYANMAR VOWEL SIGN U, U+1030 MYANMAR VOWEL SIGN UU, U+1A69 TAI THAM VOWEL SIGN U, and U+1A6A TAI THAM VOWEL SIGN UU. The latter pair have variable placement in modern Tai Khün, but not, so far as I am aware, in Northern Thai. These two pairs should be similarly commented on, either explicitly or implicitly.
Date/Time: Wed Jun 14 12:56:33 CDT 2017
Name: Eduardo Marin Silva
Report Type: Public Review Issue
Opt Subject: Bitcoin informative note
It wouldn't hurt to add an informative note for the bitcoin sign to the code chart saying "cryptocurrency", since all the other signs specify their use.
Date/Time: Mon May 8 23:30:27 CDT 2017
Name: Shriramana Sharma
Report Type: Other Question, Problem, or Feedback
Opt Subject: Dotted box for Brahmi/Kannada fricative characters
The Sharada chart currently shows 111C2 and 111C3 with a dotted box to indicate the requirement of special rendering. The same should be applied to Kannada 0CF1 and 0CF2, and also to Brahmi 11003 and 11004. This was already requested in L2/14-066, and this is just a "gentle" reminder three years later! ☺️ The same should be applied to the TUS 9.0 description of these characters in the Kannada chapter 12.8, p 498 (534 of PDF). Note that this change does not conflict with L2/17-098 p 6, which asks for a different correction here. The corresponding Brahmi chapter 14.1, p 553 (589 of PDF), does not give written representations under Vowel Modifiers, so this is not an issue there.
Date/Time: Thu Jun 29 21:43:08 CDT 2017
Name: Henry Chan
Report Type: Other Question, Problem, or Feedback
Opt Subject: Clarification on multi-column CJK Charts
The accuracy of the representative glyphs for representing the regional preferences in the CJK multi-column code charts has been called into question multiple times, even though it is specified in Unicode Standard Chapter 24:

> Each character in these code charts is shown with a representative glyph. A
> representative glyph is not a prescriptive form of the character, but rather
> one that enables recognition of the intended character to a knowledgeable
> user and facilitates lookup of the character in the code charts. In many
> cases, there are more or less well-established alternative glyphic
> representations for the same character.
> Designers of high-quality fonts will do their own research into the
> preferred glyphic appearance of Unicode characters. In addition, many scripts
> require context-dependent glyph shaping, glyph positioning, or ligatures, none
> of which is shown in the code charts. The Unicode Standard contains many
> characters that are used in writing minority languages or that are historical
> characters, often used primarily in manuscripts or inscriptions. Where there
> is no strong tradition of printed materials, the typography of a character may
> not be settled. Because of these factors, the glyph image chosen as the
> representative glyph in these code charts should not be considered a
> definitive guide to best practice for typographical design.

It may be better to do the following in the East Asian chapter:

(1) repeat that "A representative glyph is not a prescriptive form of the character, but rather one that enables recognition of the intended character to a knowledgeable user and facilitates lookup of the character in the code charts";
(2) specify that "The representative glyphs for each locale/column in the CJK Unified Ideographs are not necessarily the preferred or normative glyphs for each region";
(3) reiterate that "Other more or less well-established alternative glyphic representations may exist".

This will more accurately reflect Korea's and Hong Kong's current situation, where the fonts supplied are not necessarily in conformance with national standards, and where users' conventions are not necessarily well defined or in line with the national standards. It will also cover the case where China's two glyphs in Extension B don't match the Tongyong Guifan Hanzi Biao (TGH), which China NB tends not to correct, as well as the unknown normativity of the 8 or 9 variant characters in TGH that have a unifiable-but-markedly-different glyph in GE sources. It will also be consistent with Taiwan's practice that the characters in T3 and above are not normalized, and agrees with the fact that only supporting the base character at its current Unicode J-column sources is definitely not enough to cater to the modern preferences of Japanese users. (Submitted to the reporting form on the advice of Ken Lunde)
Date/Time: Tue Jul 4 16:08:16 CDT 2017
Name: Johannes Athmer
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Updated German orthography concerning ß/ẞ
In the Unicode FAQ (and case mapping information), it is stated that ß (U+00DF LATIN SMALL LETTER SHARP S) should not be mapped to the upper case ẞ (U+1E9E LATIN CAPITAL LETTER SHARP S) by default. It should be noted that the official German orthography has been changed[1] to include U+1E9E as the capital version of U+00DF. I would suggest moving away from mapping lower case "ß" to upper case "SS" by default and introducing a mapping of lower case "ß" to upper case "ẞ" instead. Not only is this now the official spelling, but it is also lossless. It is impossible (without knowing the vocabulary - and sometimes the context!) to transform upper case "SS" to lower case "ß" or lower case "ss". [1] http://www.rechtschreibrat.com/DOX/rfdr_PM_2017-06-29_Aktualisierung_Regelwerk.pdf
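The losslessness point is easy to demonstrate with Python's built-in case mappings, which currently implement the SS default the submitter wants changed:

```python
# "Maße" (measures) and "Masse" (mass) are distinct words, but the
# default ß -> SS uppercase mapping collapses them.
print("Maße".upper())    # -> MASSE
print("Masse".upper())   # -> MASSE (now indistinguishable)
# U+1E9E, by contrast, round-trips losslessly.
print("ẞ".lower())       # -> ß
```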
Date/Time: Sat Jul 15 15:55:53 CDT 2017
Report Type: Error Report
Opt Subject: BidiCharacterTest.txt
Hello, I'm trying to pass all the tests in BidiCharacterTest.txt, and I'm having problems understanding a few of the tests that, to me, appear to contradict the specification. The problematic lines in BidiCharacterTest-10.0.0.txt are the tests on lines 262, 263, and 264. Let's consider the test from line 262:

Dir: RTL
Input: a ( b <RLE> c <PDF> ) _ 1
Level: 2 2 2 x 4 x 1 1 2

The problem I'm having is that the first opening bracket is assigned level 2 and the closing bracket level 1. This seems to contradict the three rules N0.b, N0.c.1, and N0.c.2 in the specification, which all describe overriding the type of both brackets with either the matching or the opposite direction. The only case in which we can possibly get different levels (correct me if I'm wrong!) is if rule N0.d is applied and the brackets retain their neutral status until it is resolved in subsequent rules. I would very much appreciate it if you would either acknowledge a bug or correct a misunderstanding on my part. Thank you in advance! Dov
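For reference, a hedged sketch of decoding the fields quoted above. BidiCharacterTest.txt records carry the code points, paragraph direction, paragraph level, resolved levels ('x' for characters removed by rule X9), and a visual-order field not reproduced here; the code point sequence below is a reconstruction of the quoted input (with <RLE> as U+202B and <PDF> as U+202C), not a verbatim copy of line 262.

```python
# Pair each input character with its expected resolved level ('x' means
# the character is removed by rule X9 and has no level).
cps = "0061 0028 0062 202B 0063 202C 0029 005F 0031"
levels = "2 2 2 x 4 x 1 1 2"
pairs = [
    (f"U+{int(cp, 16):04X}", None if lvl == "x" else int(lvl))
    for cp, lvl in zip(cps.split(), levels.split())
]
print(pairs)  # '(' gets level 2 while ')' gets level 1 -- the report's point
```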
Date/Time: Fri Jul 21 04:03:11 CDT 2017
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: Code Charts Are Lacking Hints About General Category
One of the main impediments compromising Unicode education (beside the already discussed lack of a descriptor property—or the hijacking of what should have been the descriptor, as an alphanumeric identifier for the convenience of those users of the Standard who hate hex digits—and the insufficient truthfulness of many parts of the Standard) seems to be the lack of General_Category and Bidi_Mirrored property value hints in the Code Charts. Although the disclaimer clearly states the necessity of looking up the Unicode Standard and TRs, Unicode *education* isnʼt meant to compile all the information needed "for a successful implementation" to work out content. The Gc and bidi mirroring happen to be indispensable for an average understanding of characters. Believe it or not: even Unicode experts have failed to know about the bidi-mirroring of angle quotation marks. Such shortcomings wouldnʼt happen if the Gc and BM were added to the Code Charts. That can be done without overloading the layout, by simply appending an 'M' to the two-letter Gc code if applicable, and then adding this code after the code point. Thank you in advance. Regards, Marcel
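A tiny sketch of the annotation being proposed (the format is the submitter's suggestion, not an existing chart convention): the two-letter General_Category code plus an 'M' suffix when Bidi_Mirrored is Yes, printed after the code point.

```python
import unicodedata

def chart_hint(ch: str) -> str:
    gc = unicodedata.category(ch)                 # two-letter General_Category
    m = "M" if unicodedata.mirrored(ch) else ""   # Bidi_Mirrored flag
    return f"U+{ord(ch):04X} {gc}{m}"

print(chart_hint("«"))  # -> U+00AB PiM (angle quotation marks do mirror)
print(chart_hint("A"))  # -> U+0041 Lu
```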
Date/Time: Fri Jul 21 04:16:12 CDT 2017
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: Code Charts Are Lacking Hints About General Category
When talking about the drawbacks that hinder Unicode education, Iʼd also recall the already reported singular instead of a plural in the title of TUS: Core Specification[s]. See: https://forum.wordreference.com/threads/specifications-or-specification.2611680/ This adds to a number of other flaws, as already discussed elsewhere, lowering the enthusiasm of scholars. Not correcting it likely overstates the stability policies. The only argument I can see for keeping the singular is that turning it into a plural means admitting a mistake in a prominent place where many readers will notice it when opening the new version for the first time. If there is another one, please let me know. Regards, Marcel