The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of August 3, 2017, since the previous cumulative document was issued prior to UTC #152 (August 2017). Some items in the Table of Contents do not have feedback here.
The links below go directly to open PRIs and to feedback documents for them, as of October 9, 2017.
Issue | Name | Feedback Link
359 | Proposed Draft UTR #53, Unicode Arabic Mark Ordering Algorithm | (feedback)
358 | Proposed Update UTS #10, Unicode Collation Algorithm | (feedback) No feedback to date
357 | Proposed Update UAX #44, Unicode Character Database | (feedback) No feedback to date
356 | Proposed Update UTS #51, Unicode Emoji | (feedback)
355 | Proposed Update UAX #29 Unicode Text Segmentation | (feedback)
The links below go to locations in this document for feedback.
Feedback to UTC / Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports
Note: The section of Feedback on Encoding Proposals this time includes:
L2/11-359
L2/12-309
L2/15-338
L2/17-190
L2/17-238
L2/17-255
L2/17-303
L2/17-339
L2/17-366
L2/17-372
L2/17-380
L2/17-382
Date/Time: Sat Aug 12 16:33:45 CDT 2017
Name: Eduardo Marin Silva
Report Type: Error Report
Opt Subject: On tally marks in vertical text
I am submitting this form because I cannot wait for the next UTC. In this document: http://www.unicode.org/L2/L2017/17255-script-ad-hoc.pdf the ad hoc committee discussed my proposal concerning tally marks. I argued for three things: the addition of named character sequences to unambiguously refer to two, three, and four; changing the names of the tally marks to refer to their shape, instead of leaving them as the apparently sole system of tally marks; and making the tally marks rotate in vertical text.
• The first request is only for convenience; if they feel the hassle of getting the sequences encoded is not worth the benefits, that's fine.
• The name change, while not necessary per se, would I believe remove ambiguity about which tally mark one is referring to, particularly since there are still unencoded "box tally marks" used in South America, and users there may feel that the names were not assigned fairly.
• As for the tally marks in vertical text, I do not need to present evidence of their usage there, because one just has to consider what happens when one attempts to enter them in such an environment. While number five would occupy a cell as expected, the number four would occupy four cells (four times as tall), whereas the rotated glyphs would only occupy a narrow band. Maybe some higher-order software would be able to force the four characters to occupy the same cell, but the fact remains that the fallback will remain unacceptable. Asking for evidence is like asking for instances of VERTICAL LINE in vertical contexts, even though it is obvious that the rotated glyph is more desirable.
Date/Time: Wed Aug 9 09:45:21 CDT 2017
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Anglicana w and paleographic variation
L2/17-238
L2/17-238 amply demonstrates that two forms of w were used in the same manuscript. It doesn’t show that the distinction was meaningful or anything more than scribal whim; I conclude the character is meant for use by paleographers, for whom scribal whims are significant. This is not the only character for which a single manuscript has multiple interchangeable glyphs. For example, “Le roman de la rose” (MS. Douce 195) uses two versions of d. I suggest that the UTC create a policy for paleographic variants in general before encoding this single variant. For example, are manuscripts enough evidence, or should there be evidence from modern works, to show that paleographers do distinguish the glyphs in plain text?
Date/Time: Thu Aug 17 07:17:57 CDT 2017
Name: Christoph Päper
Report Type: Public Review Issue
Opt Subject: Categories of Emoji Draft Candidates (WG2 N4904 / L2/17-366)
X+1F9A0 Microbe is categorized as Objects / tool. It should be in the Animals & Nature category, perhaps in a new subcategory.
X+1F9EC DNA Double Helix is categorized as Objects / tool. It should be either in the Animals & Nature category or in the Symbols category, perhaps in a new subcategory.
X+1F9F9 Broom is categorized as Objects / other-object. It belongs to the tool subcategory.
Several other candidates are lumped in together within Objects / other-object. They should get at least one new subcategory. This would be either 'household' or 'hygiene' (X+1F9F4 Squeeze Bottle / Lotion, X+1F9FB Toilet Paper, X+1F9FC Soap, X+1F9FD Sponge), although related emojis are found in Travel & Places / hotel, and Activities / craft (X+1F9F5 Thread, X+1F9F6 Yarn, X+1F9F7 Safety Pin).
X+1F9F8 Teddy Bear may fit better within Activities / game, since there is no toy subcategory.
Furthermore, I'd like to suggest reconsidering the subcategories of some existing emojis within the Travel & Places category:
U+1F3D9 City Scape should move to similar ones in place-other, or into a new subcategory place-scenery.
U+1F3B0 Slot Machine is better found in Activities / game.
U+1F3A8 Artist Palette should be moved to Objects / tool as it does not indicate a place. Or move it to a new subcategory Activities / art together with (at least) U+1F3AD Performing Arts and U+1F5BC Framed Picture.
U+2668 Hot Springs better fits with other signs in Symbols / transport-sign.
U+1F6D1 Stop Sign also belongs in Symbols / transport-sign.
If the Unicode Consortium had the resources, you should conduct several card sorting sessions to determine categories and orders that actually feel natural to people.
Date/Time: Tue Oct 3 20:01:27 CDT 2017
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on L2/17-339
L2/17-339
L2/17-339 “Revised chart of Naxi Dongba characters” has some problems. I found these by skimming; I probably missed many. The discrepancies between character names and phonetic transcriptions should be solved by automatically deriving the former from the latter.
On page 17, the phonetic transcription of character 75 is “bjə³¹”. It is one of only two transcriptions to include “j”. It should probably be “biə³¹”.
On page 22, the phonetic transcription of character 103 is “bv̩³³cjə⁵⁵”. It is the only transcription to include “cj”. It should probably be “bv̩³³ʨə⁵⁵”.
On pages 32 and 85, the phonetic transcriptions of characters 152 and 417 include “ɣ” but their names are written as if those syllables had no onsets.
On page 140, the names and phonetic transcriptions of characters 691 and 692 are mixed up.
On page 143, the name of character 709 does not match its phonetic transcription: “TV” vs. “tʰv̩³¹”.
On page 157, the name of character 776 has an extra “DONGBA CHARACTER”.
On page 187, the name of character 927 does not match its phonetic transcription: “SEEL” vs. “sɿ³³”.
On page 206, the gloss of character 1020 should be “flag”, not “flab”.
On page 211, the references of characters 1045 and 1046 are swapped.
On page 220, the names and phonetic transcriptions of characters 1092 and 1093 are mixed up.
On page 229 ff., the names of the characters are missing a space in “DONGBACHARACTER”.
No character for ‘eight’ is listed. In fact, ‘eight’ is /ho⁵⁵/, which is presumably why the character currently named DONGBA CHARACTER HOL includes eight short vertical lines. It is odd that that number should be missing from the repertoire when other small numbers are included.
Date/Time: Sat Oct 7 15:32:20 CDT 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: n4904 feedback: Lao glyphs and names (L2/17-366)
The glyphs for the new letters to be added are not harmonious with the rest of the glyphs in the code chart; overall, the stroke width is thinner in the new characters. It is also not clear why the names of the letters need the adjectives "Pali" and "Sanskrit" in them, since none of the names would clash if those words were dropped (if they did, the adjectives should be limited to the letters where the names actually clash). At the very least, the name for the Virama should be changed to LAO SIGN VIRAMA, because it is meant to create new sounds regardless of language and there has never been a precedent for naming a virama after a language.
Date/Time: Sat Oct 7 15:47:53 CDT 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: n4904 feedback: VEDIC SIGN JIHVAMULIYA glyph (L2/17-366)
Since the glyph for this character has changed, it is now confusable with its Kannada counterpart (dotted box and all). To avoid complications, it should be noted when it is preferable to use one instead of the other. Given what we have found out about the use of VEDIC SIGN ARDHAVISARGA and its rotated version in Nandinagari, it is clear that they were meant to be spacing characters. I'm not sure if a change of properties would be warranted or even possible, but it is something to consider.
Date/Time: Sat Oct 7 16:46:13 CDT 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: n4904 feedback chess notation symbol names and Group mark issues (L2/17-366)
UNITED SYMBOL, SEPARATED SYMBOL, DOUBLED SYMBOL, PASSED SYMBOL should be renamed UNITED PAWNS SYMBOL, SEPARATED PAWNS SYMBOL, DOUBLE PAWNS SYMBOL and PASSED PAWN SYMBOL. Not only does this make the intended use clearer, but the somewhat redundant informative aliases can also be dropped. Having the more general names runs the risk of confusion. Also, the Group mark has a cross reference to the DOUBLE DAGGER, when it should be referencing the TRIPLE DAGGER (2E4B). And the character THERMODYNAMIC 29E7 in the code charts lacks the informative alias: Record Mark.
Date/Time: Sat Oct 7 16:55:56 CDT 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: n4904 feedback: DOUBLE OBLIQUE HYPHEN WITH FALLING DOTS (L2/17-366)
As proposed, it is not clear when that punctuation mark should be used (does it share a function with the double oblique hyphen?). At the least, the chart entry should mention its Cornish origins.
Date/Time: Sat Oct 7 17:21:24 CDT 2017
Name: Eduardo Marín Silva
Report Type: Error Report
Opt Subject: n4904 feedback: Pinyin uppercase letters (L2/17-366)
The chart does not mention their lowercase counterparts (there should also be references to them in the charts of the lowercase letters). This should be true for any cased letter pair where the corresponding letters are not adjacent to each other.
Date/Time: Sat Oct 7 17:25:06 CDT 2017
Name: Eduardo Marín Silva
Report Type: Error Report
Opt Subject: n4904 feedback: NEWA LETTER VEDIC ANUSVARA glyph (L2/17-366)
The glyph looks nothing like the original version in the proposal and it lacks harmony with the rest of the Newa letters.
Date/Time: Sun Oct 8 18:56:38 CDT 2017
Name: Eduardo Marín Silva
Report Type: Error Report
Opt Subject: n4904 feedback: New emoji part 1 (L2/17-366)
I start by saying that I already criticized certain emoji proposals in the document: https://www.unicode.org/L2/L2017/17303-emoji-notes.pdf So here I will only elaborate on the ones I didn't touch in that document. The bagel is sliced to visually distinguish it from a donut, but what I don't understand is why it was approved, considering that it's just a piece of bread shaped differently. But assuming there is a good reason for its inclusion, I suggest keeping the same glyph but just naming it BAGEL: because font developers have access to color, they would not necessarily represent it sliced, so having the more general name gives them that option. The Mammoth is not, in my opinion, sufficiently distinct from ELEPHANT to merit separate encoding, but assuming it is, I take issue with the fact that it is proposed to be a generic indicator of great size, when the WHALE or the SAUROPOD characters are much more suited for that purpose. A skunk is, in my opinion, much more suited to be encoded than a badger, due to its distinctive connotation of bad odor and the phrase "drunk as a skunk"; skunks have also appeared as famous characters, as in Looney Tunes and Bambi. Badgers only became famous due to a viral video featuring a song repeatedly saying the name, and certain memes saying "honey badger don't care". So a badger is a lot more transient. Encoding both a badger and a skunk runs the risk of confusion due to their similar appearance.
Date/Time: Sun Oct 8 20:06:02 CDT 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: n4904 feedback: New emoji part 2 (L2/17-366)
It is a terrible idea to encode the character BILLIARD GAMES as it is; a much more constructive approach is to do the same as what was done with FLYING SAUCER. Instead of letting two separate but semantically related glyphs be used for one character (in that case the ALIEN character, for both a face and the saucer), a second character to denote the saucer was encoded. I agree with the Irish national body's original suggestion of returning the original glyph to the character BILLIARDS and encoding a separate 8 BALL character. This makes sure that those who did the correct thing by using the original glyph, instead of conflating it, are rewarded. Also, the name of the original character will not be misleading. The only downside is that it may open the door to a proposal for up to 21 different balls (including the white ball but excluding the eight ball), along with a different set of 7 colored balls without numbers for snooker, along with a cue stick, pool table and chalk, but I don't see that as problematic as long as all of those symbols are kept in a separate block. The firecracker emoji looks like dynamite, and while separate encoding of dynamite is debatable, if one wants to represent firecrackers properly, one needs to represent them in a line, so I propose changing the glyph to represent the character LINE OF FIRECRACKERS. There is no reason why the JIGSAW PUZZLE PIECE shouldn't be a solid color (either all black or all white in this case). The petri dish contains visible microbes, which makes it seem redundant with the microbe emoji; instead it should look like actual cultures at the macro scale (an ink-blot-like field with dots on it should be enough). The BASKET is way too broad: if one wants to represent laundry, it should say BASKET WITH CLOTHES, but if one just wants to represent a basket, then the glyph should not include the heap that is present in the current one. Personally I would prefer one character for HAND BASKET and another for PILE OF CLOTHES; this allows for a greater range of expressions, like gathering or shopping in the first case and overall untidiness in the second, and both emoji could then be used in succession to indicate laundry. Also, the coin should have been accepted into the repertoire due to its connotation with chance and small amounts of money; there are several machines that only accept coins as input, and the so-called piggy banks are designed for coins.
Date/Time: Wed Oct 11 18:58:49 CDT 2017
Contact: corbett.dav@husky.neu.edu
Name: David Corbett
Report Type: Feedback on an Encoding Proposal (L2/17-369)
Opt Subject: Indic_Syllabic_Category of Newa jihvamuliya and upadhmaniya
L2/17-369 proposes that the Newa jihvamuliya and upadhmaniya have Indic_Syllabic_Category=Consonant_Prefixed. Based on the manuscript sample in figure 1, a better category would be Consonant_With_Stacker. Consonant_Prefixed is for superjoined consonants, whereas Consonant_With_Stacker is for full-sized consonants to which subsequent consonants are subjoined.
Date/Time: Thu Oct 12 22:09:02 CDT 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: On the Khitan block and Miao sign Nukta
It is my opinion that the repertoire for the Khitan small script is pretty exhaustive, and therefore there is not much need for extra space. The twelve unallocated code points should suffice for future discoveries; if the two extra columns are allocated, then it is likely they will be unused codespace that could have been used for other scripts. The new range should be 18B00-18CDF. The MIAO SIGN NUKTA would, in my opinion, be better allocated at 16F8E, since not only is it closer to other combining marks but it leaves one more space for a future letter (5 instead of 6), yet it still leaves 6 code points for other vowel signs, so it is almost like a balance.
Date/Time: Mon Oct 16 13:59:51 CDT 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Note on NEPTUNE FORM TWO
A reference to the character U+2646 NEPTUNE is missing in the chart for NEPTUNE FORM TWO.
Date/Time: Wed Oct 18 20:40:31 CDT 2017
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Kashmiri digits forms should be mentioned in core spec
The core spec, in the text surrounding "Table 9-2. Glyph Variation in Eastern Arabic-Indic Digits" and the table itself, should mention that Kashmiri also uses the Eastern digits, and the digit shapes are identical to the Urdu forms. This is important for font support for Kashmiri, which is one of the 22 scheduled languages of India. It would help font designers find out about the need to add localized forms for Kashmiri in their fonts.
Date/Time: Thu Oct 19 11:19:28 CDT 2017
Name: David Corbett
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on the Basque flag emoji
L2/17-382 proposes either U+1F1EA U+1F1F0 or U+1F3F3 U+FE0F U+200D U+2733 for the Basque flag. The first (EK) is not a Unicode region subtag. The second is a combination of a flag and a dingbat, which sets a bad precedent, because not all flags happen to look like existing emoji. I suggest 🏴espv✦ instead, which is already defined though not RGI.
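The two encodings contrasted above can be spelled out in code. The following is only a minimal sketch, assuming the usual conventions: regional indicator symbols start at U+1F1E6 for "A", and a subdivision flag is U+1F3F4 followed by tag characters (ASCII value plus U+E0000) and terminated by U+E007F CANCEL TAG. The function names are hypothetical and chosen for illustration.

```python
# Sketch of the two flag encodings discussed in the report.
# Function names are illustrative assumptions, not part of any specification.

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
BLACK_FLAG = 0x1F3F4            # WAVING BLACK FLAG, base of emoji tag sequences
TAG_OFFSET = 0xE0000            # tag characters mirror ASCII at this offset
CANCEL_TAG = 0xE007F            # terminates an emoji tag sequence


def regional_indicator_pair(code: str) -> str:
    """Map two ASCII letters (e.g. 'EK') to a pair of regional indicators."""
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected exactly two ASCII letters")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code.upper()
    )


def subdivision_flag(subdivision: str) -> str:
    """Build a tag sequence for a subdivision identifier such as 'espv'."""
    tags = subdivision.lower()
    if not (tags.isascii() and tags.isalnum()):
        raise ValueError("expected ASCII letters and digits only")
    return (
        chr(BLACK_FLAG)
        + "".join(chr(TAG_OFFSET + ord(c)) for c in tags)
        + chr(CANCEL_TAG)
    )


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


print(code_points(regional_indicator_pair("EK")))
print(code_points(subdivision_flag("espv")))
```

Building the sequences this way only shows how they are composed; whether a given sequence is valid or RGI is decided by the relevant data files, not by this sketch.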
Date/Time: Fri Oct 20 07:15:23 CDT 2017
Name: Christoph Päper
Report Type: Other Question, Problem, or Feedback
Opt Subject: L2/17-380 ESC report 2017Q3
Separate topics for discussion
==============================

Other documents
---------------

> > * L2/17-296 — Comments on Recently Approved Emoji Candidate Names
> > - Probably too late for name change, ESC consider for future CLDR names/keywords

It is definitely not too late to discuss and make changes to character names. Unicode 11 is not even in beta yet. After the recent quarrel with WG2, the UTC should make clear that all of its members and committees understand that, so this can be resolved to guarantee fruitful cooperation once again.

> > * L2/17-303 — Notes on emoji proposals
> > - ESC considered human-form vs smiley for superhero/villain. Did not think the 12-18x cost for humanform emoji was worth it.

Emoticons are faces, mostly to represent facial emotions as well as some actions (e.g. sneezing) and some features (e.g. glasses). There have been cases of image sets, e.g. in early-2000s forum software or desktop instant messengers, that used the classic 1970s yellow Smiley face or a non-copyrighted variant thereof to represent all sorts of feelings, actions, stereotypes, animals etc., see the defunct Smileyworld.com or now <http://www.smiley.com/emoticons/dictionary>. If those would be remade to use Unicode, they would certainly show a Superhero and a Supervillain character with a face-centric design, but that is a design choice, not a character choice. These emojis must be person/human-form. A Masked Face emoji would be slightly different.

> > - Suggestion of POO + ZWJ + <face> was reasonable, but probably too late in the process.

This draft character should be postponed until there is a binding and lasting decision how to deal with additional emotional faces. The cat face emojis were included for legacy reasons, but this one would open up a can of worms for new requests. Please take your time to devise a more generic solution! This could be in the form of new combining emojis for facial properties (like Smiling Eyes, Open Mouth etc.) that could form sequences (without ZWJ) with almost any emoji character that has FACE in its Unicode designation (and perhaps some more).

> > - SMILING FACE WITH SMILING EYES AND THREE HEARTS got a higher priority due to request data.
> > - Other faces weren’t considered distinct enough, or high enough priority.

This very character had been dismissed in 2016. A ZWJ sequence, e.g. with U+1F49E 💞, was deemed more appropriate, but actually neither recommended (RGI) nor even documented.

> > * L2/17-376 — “Top of Head” Emoji feedback
> > - These characters are already recommended as components.

An important point is that hair color (only red and white here) is quite a different thing than hair style (curls) or its entire absence, yet the proposed solution treats them alike. You could add Wig emojis and use them for ZWJ sequences, but they would still need to be available *at least* in black, brown, blond, ginger, gray and white variants via some method, because right now the hair color in person emojis depends on the skin color which is not a deterministic relationship in reality. Changing one's hair color through dye is also a lot simpler and more common than making changes to one's skin color. Tattoos are permanent, but hardly ever applied as solid fills.

Forward RGI/Emojification requests to UTC
=========================================

Emojification
-------------

> > * L2/17-343 — Infinity Emoji Submission
> > - Note that Samsung has emoji presentation for this.

This is not true. Samsung has emoji presentation for U+267E Permanent Paper Sign which includes an infinity symbol, because Samsung systematically has emoji representations for *all* characters in the Miscellaneous Symbols block U+26xy.

> > L2/17-387— HEADSTONE (U+26FC HEADSTONE GRAVEYARD SYMBOL)
> > - Note that Samsung has emoji presentation for this.

This is true, but since this is a standardized map symbol showing it as a gravestone is a misguided decision. If this was acceptable, e.g. U+1F3E6 Bank could have been unified with U+26FB Japanese Bank Symbol and U+26EF Map Symbol for Lighthouse could be rendered as a Lighthouse emoji.

Tag Sequences
-------------

It certainly makes no sense to make flag emojis for all 50ish states of the US become RGI without proven demand whereas only select ones from other countries. Then again, the whole concept is flawed as fruitlessly discussed in 2016.

ZWJ Sequences
-------------

> > * MAN/WOMAN + ZWJ + <new hair styles>

My counter-proposal would be to treat U+1F471 Person with Blond Hair and perhaps U+1F487 Person Getting a Hair Cut as well as U+1F9D4 and an altered X+1F9B1 Person with Curly Hair as the only emojis whose primary feature is hair and therefore would be subject to generic color modifiers. These could be done with existing heraldic tincture hatching pattern U+25A0,3..9 characters, i.e. they would only need the emoji property to become Swatches and no new codepoints (except that arguably some are missing, see L2/11-094, L2/16-318 and <https://github.com/Crissov/unicode-proposals/issues/329>).

> > * L2/17-389 Mike Drop / Mic Drop
> > - ESC was neutral on this. Could be too trendy.

It probably fails the Fad criterion indeed. You should consider a hand gesture that can clearly be used for dropping something, though.

> > * Others from L2/17-287 Section: ZWJ Sequences
> > - Recommended against
> > + Heart with knife
> > * Unnecessarily violent; can be conveyed with existing sequence

I would have expected a debate whether it should use a dagger or a kitchen knife, but rejecting something that thousands if not millions of peaceful people have tattooed on their bodies as "unnecessarily violent" is just inappropriate, especially when it indeed represents sadness instead.

ISO Character Requests
======================

The ESC definitely needs to publish its ranked list of possible future animal emojis ("our omnibus collection of animals") rather sooner than later and it needs to say for which ones they already received proposals and the reason why they have not progressed (yet). That being said, the non-extinct animal characters proposed by WG2 seem like an arbitrary selection that needs extension. ESC should not wait for individual proposals to come in, but instead develop a list of animals that are culturally relevant and distinguished throughout the world. These should then be discussed by UTC and ISO/IEC and encoded all at once. Then be done with it except for single additions once in a while when new evidence of relevance has surfaced, just as with characters in any non-pictographic script in Unicode.

> > 3. 1F9A4 SQUIRREL

I understand and partially share the reluctance to encode this one, because Chipmunk is so similar. Please also consider the possibility of Squirrel Face instead.

> > 7. 1F97B TROLL
> > b. Was one of 64 emoji in proposal “64 Complementary Emoji”).
> > f. Might set precedence for other “emotes” used by gamers (twitch) and rage faces; style would be out of place for emoji.

The document titled "64 Complementary Emoji" has apparently never been published to the L2 registry. Nobody is seriously requesting the addition of the copyrighted graphic known as the "Troll Face Meme" or any other "Rage Face".
Date/Time: Mon Oct 23 07:52:59 CDT 2017
Name: Christoph Päper
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/17-381 Scuba emoji
The proposal for a Scuba Emoji mentions no reason for compatibility encoding, but there actually is one. While the original unified Japanese emojis were a superset of (much less successful) WAP Pictograms in general, they were lacking a substitute for its `/sport/scuba` entry. There arguably may be other omissions (`animal/beetle`, `emotion/shakenHeart`, `map/zoo`, `music/rest`), but this one has no substitute whatsoever and should definitely have a Unicode character assigned to it.

[Diving emoji]: https://github.com/Crissov/unicode-proposals/issues/179
[WAP Pictograms]: https://github.com/Crissov/unicode-proposals/issues/260
[WAP Pictogram Specification]: http://www.openmobilealliance.org/tech/affiliates/wap/wap-213-wapinterpic-20010406-a.pdf
Date/Time: Mon Oct 23 22:20:28 CDT 2017
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Names of two astrological symbols
The name PROSERPINA could be confused with the comet of that name, even though the proposal says they are not related. A better name may be ASTROLOGICAL PROSERPINA, or at the very least there should be an annotation indicating its true nature. There has never been a need to encode any other astronomical symbol with the "FIRST FORM" prefix. I suggest just calling 2BF0 ERIS; that makes it obvious that this symbol takes precedence over the second one, and that the two are not interchangeable (in the proposal they weren't, hence the need for separate encoding).
Date/Time: Sat Jul 29 21:52:56 CDT 2017
Name: Timothy Gu
Report Type: Error Report
Opt Subject: Issues with UTS #46's conformance test file
To whoever it may concern,

While developing a product conforming to "UTS #46: Unicode IDNA Compatibility Processing, Version 10.0.0" [UTS46], we noticed a few issues with the provided conformance testing file (IdnaTest.txt). These issues are preventing us from implementing UTS #46 in tr46.js [TR46JS-ISSUE].

The IdnaTest.txt file is formatted as a list of semicolon-separated values. The meanings of the specific columns are given in UTS #46 Section 8.1, an excerpt of which is hereby reproduced [UTS46]:

> > No Field Description
> > ...
> > 3 toUnicode The result of applying toUnicode to the source, using "nontransitional".
> > A blank value means the same as the source value; a value in [...] is a set of error codes.
> > 4 toASCII The result of applying toASCII to the source, using the specified type: T, N, or B.
> > A blank value means the same as the toUnicode value; a value in [...] is a set of error codes.
> >
> > ...
> >
> > An error in toUnicode or toASCII is indicated by an error list of the form [...]. In such a case, the
> > contents of that list are error codes based on the step numbers in UTS46 and IDNA2008:
> >
> > ...
> > An for Section 4.2 ToASCII, step n
> > ...

Given that "An" applies only to the ToASCII algorithm, not the ToUnicode algorithm, it seems appropriate for field "toUnicode" in IdnaTest.txt to never have an error code of form An. Yet, in the published IdnaTest.txt file corresponding to version 10.0.0 [IDNA-TEST], there exist 305 entries in IdnaTest.txt where an "An" error code appears under "toUnicode". In particular, there exist 36 entries with _only_ an "An" error code under "toUnicode" -- which, in other words, means that the only justification for erroring on those entries from ToUnicode is not actually in ToUnicode.

This is particularly troubling, since while the Standard allows for ADDITIONAL error cases beyond the ones already specified in IdnaTest.txt, a product conforming to UTS #46 must produce an error on ALL error cases in IdnaTest.txt, per lines 68-72 of IdnaTest.txt, again reproduced below:

> > ... Thus to then verify conformance for the toASCII and toUnicode columns:
> >
> > - If the file indicates an error, the implementation must also have an error.
> > - If the file does not indicate an error, then the implementation must either have an error, or must have a matching result.

A close examination of the 36 entries mentioned above reveals that:

- 9 of the 36 entries have only an "[A3]" error code under ToUnicode, which corresponds to the Punycode-encoding step in ToASCII. The source domains all have one label with invalid Punycode-encoding though, so they would in fact have already recorded an error in no. 4 of Processing Steps, which is called upon by ToUnicode as well. In other words, these entries merely have a faulty error code; ToUnicode would still record an error for these entries, just one at a different step than advertised. Some samples from these 9 entries are:

  Line 313: B; xn--0.pt; [A3]; [A3]
  Line 315: B; xn--a-Ä.pt; [A3]; [A3]
  Line 316: B; xn--a-A\u0308.pt; [A3]; [A3]

- The other 27 entries have only an "[A4_2]" error code under ToUnicode, which corresponds to the DNS length verification step under ToASCII. Some of them are:

  Line 201: B; 。; [A4_2]; [A4_2]
  Line 202: B; .; [A4_2]; [A4_2]
  Line 434: B; a..c; [A4_2]; [A4_2]
  Line 439: B; ä.\u00AD.c; [A4_2]; [A4_2]

While these domain names are all rather unlikely to be allowed by real-world UTS #46 implementations, most (if not all) of them are still strictly allowed by ToUnicode as defined in UTS #46. Take line 201, for example. Step 1 of ToUnicode calls into the Processing Steps, whose step 1 will map '。' to '.', and which will then pass through the rest of Processing Steps without recording an error. Step 2 of ToUnicode will then produce a "converted Unicode string" of '.', and signal there was no error.

The 27 entries in IdnaTest.txt with [A4_2] are the really worrying ones, since they seem to go against the algorithms defined in UTS #46, and prevent us from creating a strict implementation of UTS #46 that also passes its own conformance tests.

To resolve these issues, I would like to see the following:

- A clarification whether the aforementioned 27 entries should record an error in ToUnicode.
- Corresponding changes to IdnaTest.txt or UTS #46 that accompany that clarification.
- There be no entries in IdnaTest.txt with a ToUnicode error code that points to steps in ToASCII.

Sincerely,
Timothy Gu

[UTS46]: http://www.unicode.org/reports/tr46/tr46-19.html
[IDNA-TEST]: http://www.unicode.org/Public/idna/10.0.0/IdnaTest.txt
[TR46JS-ISSUE]: https://github.com/Sebmaster/tr46.js/pull/13
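The kind of check described in this report could be sketched roughly as follows. This is not the reporter's tooling, only an illustration under stated assumptions: each data line has the form `type; source; toUnicode; toASCII`, `#` starts a comment, and error codes look like `A3`, `A4_2` or `P1` inside square brackets. The function names and the last two sample lines are hypothetical; the first three sample lines are quoted from the report.

```python
import re

# Error codes such as "A3", "A4_2", "P1" (assumed shape: letter, digits, optional _digits).
ERROR_CODE = re.compile(r"[A-Z]\d+(?:_\d+)?")


def to_unicode_error_codes(line: str) -> list[str]:
    """Return the error codes listed in the toUnicode field (third column)."""
    data = line.split("#", 1)[0].strip()  # drop any trailing comment
    fields = [field.strip() for field in data.split(";")]
    if len(fields) < 3:
        return []
    to_unicode = fields[2]
    if not (to_unicode.startswith("[") and to_unicode.endswith("]")):
        return []  # blank or a plain result string: no error recorded
    return ERROR_CODE.findall(to_unicode[1:-1])


def classify(line: str) -> str:
    """Label a line by how ToASCII-only ("An") codes appear under toUnicode."""
    codes = to_unicode_error_codes(line)
    a_codes = [code for code in codes if code.startswith("A")]
    if not a_codes:
        return "none"
    return "only An" if len(a_codes) == len(codes) else "mixed"


SAMPLE_LINES = [
    "B; xn--0.pt; [A3]; [A3]",     # quoted in the report (line 313)
    "B; 。; [A4_2]; [A4_2]",        # quoted in the report (line 201)
    "B; a..c; [A4_2]; [A4_2]",     # quoted in the report (line 434)
    "B; x; [P1 A4_2]; [P1 A4_2]",  # hypothetical line with mixed codes
    "B; a.b; ;",                   # hypothetical line without errors
]

for sample in SAMPLE_LINES:
    print(classify(sample), "<-", sample)
```

Run over the full IdnaTest.txt, a tally of the "only An" and "mixed" labels would be one way to re-derive the counts given in the report; the sketch itself makes no claim about what those totals are.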
Date/Time: Thu Aug 10 10:31:18 CDT 2017
Name: Ken Lunde
Report Type: Error Report
Opt Subject: UAX #45 datafile suggestion
While not an error, I propose that the UAX #45 datafile be annotated for each version, at least since becoming a UAX (Version 6.3.0), to indicate the version number and the number of characters that were added, such as via the following comment lines (the "<nnn entries omitted>" lines are meant to make what follows easier to understand):

  UTC-00001;E;U+2B88A;4.2;0082.031;;kCowles 4762
  <950 entries omitted>
  UTC-00952;W;;167.5;1328.061;;UDR
  # Version 6.3.0 Additions: 245
  UTC-00953;UNC-2013;;167.10;1318.281;⿰钅哥;UTCDoc L2-12/333 204
  <243 entries omitted>
  UTC-01197;N;;1.6;0078.131;⿱合一;UTCDoc L2/13-009 19
  # Version 7.0.0 Additions: 1
  UTC-01198;N;;1.8;0078.171;⿳人伊一;UTCDoc L2/13-009 20
  # Version 8.0.0 Additions: 3
  UCI-01199;U;U+2F949;109.7;0809.030;⿰目夾;UTCDoc L2/14-260
  UTC-01200;N;;85.10;0643.241;⿰氵恩;UTCDoc L2/15-109
  UTC-01201;N;;112.5;0829.331;⿰⽯示;UTCDoc L2/15-114
  # Version 9.0.0 Additions: 1,768
  UTC-01202;H;;8.6;0089.191;⿱㐭水;UTCDoc L2/15-177 1
  <1,766 entries omitted>
  UCI-02969;U;U+2BDA4;46.18;0322.391;⿰山⿱㞌⿰㞌㞌;TUS U+2BDA4
  # Version 10.0.0 Additions: 6
  UTC-02970;N;;157.9;1230.291;⿰足迷;UTCDoc L2/16‐066 1
  UTC-02971;N;U+3779;42.8;0297.231;⿱少免;UTCDoc L2/16‐066 2
  UTC-02972;N;U+2F8A4;61.9;0396.071;⿰忄柬;UTCDoc L2/16‐269 1
  UTC-02973;N;;9.9;0112.071;⿰亻革;UTCDoc L2/16-239R 1
  UTC-02974;N;;116.8;0866.551;⿱穴卑;UTCDoc L2/16-239R 2
  UTC-02975;N;;142.11;1098.441;⿰虫崩;UTCDoc L2/16-385R 1
  # EOF

As UAX #45 grows, this will make it easier to determine when a particular character was added without going back to previous versions.
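To illustrate how such annotations might be consumed, here is a small sketch that maps each entry to the version named in the most recent "# Version ... Additions: N" comment and checks the declared counts. It assumes the comment format shown in the proposal; the function name is hypothetical, and the sample is an abridged excerpt of the lines quoted above.

```python
import re
from collections import Counter

# Assumed comment shape, taken from the proposal: "# Version 9.0.0 Additions: 1,768".
VERSION_LINE = re.compile(r"^# Version (\d+\.\d+\.\d+) Additions: ([\d,]+)\s*$")


def index_by_version(lines):
    """Return ({entry id: version or None}, {version: declared addition count})."""
    current = None          # entries before the first version comment stay None
    entry_version = {}
    declared = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = VERSION_LINE.match(line)
        if match:
            current = match.group(1)
            declared[current] = int(match.group(2).replace(",", ""))
            continue
        if line.startswith("#"):
            continue        # other comments, e.g. "# EOF"
        entry_id = line.split(";", 1)[0]
        entry_version[entry_id] = current
    return entry_version, declared


SAMPLE = """\
UTC-00952;W;;167.5;1328.061;;UDR
# Version 7.0.0 Additions: 1
UTC-01198;N;;1.8;0078.171;⿳人伊一;UTCDoc L2/13-009 20
# Version 8.0.0 Additions: 3
UCI-01199;U;U+2F949;109.7;0809.030;⿰目夾;UTCDoc L2/14-260
UTC-01200;N;;85.10;0643.241;⿰氵恩;UTCDoc L2/15-109
UTC-01201;N;;112.5;0829.331;⿰⽯示;UTCDoc L2/15-114
# EOF
"""

versions, declared = index_by_version(SAMPLE.splitlines())
actual = Counter(v for v in versions.values() if v is not None)
for version, count in declared.items():
    status = "ok" if actual[version] == count else "mismatch"
    print(version, count, actual[version], status)
```

Comparing the declared count with the number of entries actually found under each comment gives a cheap consistency check whenever the file is updated.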
Date/Time: Thu Sep 28 20:41:52 CDT 2017
Name: Pedro Navarro
Report Type: Public Review Issue
Opt Subject: UTR #50 property value for U+2026
Hi, According to the UTR #50 data file, U+2026 HORIZONTAL ELLIPSIS is marked as 'R', which means it should be rotated in vertical text (the same happens with U+2025). I've tried several Japanese fonts (Noto, Heisei Maru Gothic) and they provide a vertical variant for it. Shouldn't the property value for those characters be 'Tr' instead? Or are we to consider that a particularity of the font? Thanks
Date/Time: Mon Jul 31 13:05:32 CDT 2017
Name: Marcel Schneider
Report Type: Other Question, Problem, or Feedback
Opt Subject: *11BD2 NANDINAGARI SIGN SIDDHAM
Hello, Iʼve just become aware that the Auspicious sign subheading in the new Nandinagari block is not as good as I thought it to be when reviewing for PRI #353. Unfortunately this is now closed, but Iʼll send you this anyway outside the PRI (leaving it to your convenience whether to add the below, or not). I think that this item could be processed as well at beta review, where it could be sent again. The reason why this seems important to me is to make clear that the proposers didnʼt aim at doing things differently, but can be assumed to be aware of the existing usage in the Standard and to seek consistency as far as feasible. Exhibiting some ability of being original does not make sense to me. (You may read between these lines that when I change things for the French translation, itʼs really because I see a need for more accuracy and better consistency, for an overall greater usefulness and increased reputation of the Standard among its users. While thus opening a presumably positive reputation gap in favor of the French translation of the repertoire, I regularly try to bring the changes to your attention in case Unicode might wish to implement part or all of them for a final equalization.) Best regards, Marcel ________________________________________________________________ Iʼve just become aware that the Auspicious sign subheading for 11BD2 NANDINAGARI SIGN SIDDHAM could as well be Invocation sign. As the encoding proposal states: “The sign [SIDDHAM] is used as an invocation at the beginning of documents.” For consistency with other instances in the Standard such as U+A8FC, one could actually wish to replace “Auspicious sign” with “Invocation sign” in the future Nandinagari block.
Date/Time: Wed Aug 2 13:30:57 CDT 2017
Name: Kent Karlsson
Report Type: Public Review Issue
Opt Subject: Inappropriate remark in draft
In http://www.unicode.org/L2/L2017/17190-n4824-pdam1-3chart.pdf: 07FD ߽ NKO DANTAYALAN • used to abbreviate units of measure http://www.unicode.org/L2/L2015/15338-n4706-nko-additions.pdf: "For instace, it is used with ka as to abbreviate kúdɛ ‘kilometre’, with fa as to abbreviate fele ‘megametre’, with gba as to abbreviate gbàlàgbala ‘metre’, with sa as to abbreviate sidɔ ‘gram’, and with ta as to abbreviate tóngba ‘litre’. Examples of letters with DANTAYALAN connected to another letter are gbaw. ‘mm.’ and gbach. ‘cm.’. (See Figures 1, 2, 3.)" (In the paste from the PDF, some chars got botched.) While much of this text in n4706 is highly objectionable by itself, that is a separate issue. However, hinting (in Unicode charts) ["used to abbreviate units of measure"] that SI "short forms" for units are abbreviations is a MAJOR misunderstanding. The SI "short forms" are SYMBOLS (made from letters). They ARE mnemonic, but they are NOT(!!!) abbreviations. In this lies, among other things, natural language *independence*. While they must be language independent, it is understandable if one wants to transliterate the unit symbols to the "local script". That will still not make the SI unit symbols abbreviations, and the transliteration scheme must respect the design of the SI unit symbols (prefixes, etc.), which the examples in n4706 appear not to do. Besides, all other scripts appear to manage just fine without having a special underline (or similar) to mark unit denotations. This points to NKO DANTAYALAN being a bad idea to begin with.
Date/Time: Mon Aug 21 08:21:32 CDT 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Indic_Syllabic_Category of U+0A51
U+0A51 GURMUKHI SIGN UDAAT should have an Indic_Syllabic_Category. It is a tone mark, but it goes before any vowel sign. Its proposal document says “In many ways, Udaat should be treated as a subjoined consonant”, so I suggest Consonant_Subjoined.
Date/Time: Wed Sep 27 19:24:13 CDT 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Underspecified Ahom vowel signs
An Ahom consonant may take multiple vowel signs, all of which have ccc=0. The Unicode Standard does not say what order they should be encoded in. The proposal (L2/12-309R) recommends an order, but contradicts itself: on page 2, it says U+1172A AHOM VOWEL SIGN AM should precede U+11724 AHOM VOWEL SIGN U, but on page 3, it gives the opposite order. It is therefore unclear what the intended order is.
Date/Time: Mon Oct 2 07:41:44 CDT 2017
Name: Jonathan Kew
Report Type: Error Report
Opt Subject: Character missing from IndicSyllabicCategory.txt
It appears that U+0980 BENGALI ANJI is missing from IndicSyllabicCategory.txt, although as an expected base for U+0981 BENGALI SIGN CANDRABINDU, it seems like it really should appear. (The proposal for U+0980, http://unicode.org/L2/L2011/11359-bengali-anji.pdf, confirms that <0980, 0981> is a valid cluster for the script.)
Date/Time: Mon Oct 2 09:04:57 CDT 2017
Name: Jonathan Kew
Report Type: Error Report
Opt Subject: Inconsistency in IndicSyllabicCategory data
It seems logical that all the "Marks of nasalization" at U+A8F2 to U+A8F7 would have the same Indic category; AFAICS they all behave/render similarly. But currently the IndicSyllabicCategory.txt file classifies two of them as Bindu: A8F2..A8F3 ; Bindu # Lo [2] DEVANAGARI SIGN SPACING CANDRABINDU..DEVANAGARI SIGN CANDRABINDU VIRAMA but leaves the remainder uncategorized. Is there any good reason for this, or should they be harmonized?
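A gap of this kind can be located with a short script. The sketch below is illustrative only: it assumes the "code point or range ; value # comment" line layout quoted above, and the function names are hypothetical. The sample contains only the one data line cited in the report.

```python
def parse_insc(lines):
    """Build a code point -> category table from IndicSyllabicCategory-style lines."""
    table = {}
    for raw in lines:
        data = raw.split("#", 1)[0].strip()   # strip the trailing comment
        if not data:
            continue
        cps, value = [part.strip() for part in data.split(";")]
        start, _, end = cps.partition("..")   # single code point or a range
        first = int(start, 16)
        last = int(end, 16) if end else first
        for cp in range(first, last + 1):
            table[cp] = value
    return table

def uncategorized(table, first, last):
    """List code points in [first, last] that have no explicit category."""
    return [cp for cp in range(first, last + 1) if cp not in table]

# The single data line quoted in the report.
SAMPLE = [
    "A8F2..A8F3 ; Bindu # Lo [2] DEVANAGARI SIGN SPACING CANDRABINDU.."
    "DEVANAGARI SIGN CANDRABINDU VIRAMA",
]

table = parse_insc(SAMPLE)
print([f"U+{cp:04X}" for cp in uncategorized(table, 0xA8F2, 0xA8F7)])
```

The same helper, run against the full data file, would also show whether a single code point such as U+0980 has an explicit entry.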
Date/Time: Tue Oct 3 09:03:28 CDT 2017
Name: Jonathan Kew
Report Type: Error Report
Opt Subject: Indic Syllabic Category value Gemination_Mark should be subdivided
It looks to me like the Gemination_Mark category should probably be split. Currently, there are three characters with this property in IndicSyllabicCategory.txt: 0A71 ; Gemination_Mark # Mn GURMUKHI ADDAK 11237 ; Gemination_Mark # Mn KHOJKI SIGN SHADDA 11A98 ; Gemination_Mark # Mn SOYOMBO GEMINATION MARK However, AIUI the Gurmukhi mark is different from the other two, in that it indicates gemination of the following consonant, whereas the others indicate gemination of the preceding consonant. This suggests that GURMUKHI ADDAK would follow any matras etc. on the preceding consonant and appear at the very end of a cluster, whereas the Khojki and Soyombo marks (and the Gujarati one, U+0AFB, which should be treated similarly) belong immediately after the consonant they modify, and precede vowel matras. They're functionally quite different, and fit into the syllable structure in different places.
Date/Time: Sat Oct 7 13:20:17 CDT 2017
Name: David Corbett
Report Type: Error Report
Opt Subject: Obsolete alias for U+1039 MYANMAR SIGN VIRAMA
U+1039 MYANMAR SIGN VIRAMA has the names list alias “killer (when rendered visibly)”. It should not: it is never rendered visibly (except for fall-back rendering like a subscript plus sign, which doesn’t count). This is left over from the pre-5.1 version of Myanmar, before the visible killer was disunified as U+103A MYANMAR SIGN ASAT. Now that U+1039 is purely an abstract subjoiner without any glyph of its own (Indic_Syllabic_Category=Invisible_Stacker), that alias is obsolete and confusing.
Date/Time: Sun Oct 22 08:45:39 CDT 2017
Name: Charlotte Buff
Report Type: Error Report
Opt Subject: Typo in Proposed Character Name (L2/17-372)
The character U+10F45 SODGIAN PHONOGRAM SHIN in the proposed Sogdian block (see http://www.unicode.org/wg2/docs/n4872-DAM1chart.pdf, page 78) has a typo in its name. The script identifier is spelled SODGIAN with the G and D switched. It should be SOGDIAN PHONOGRAM SHIN.
Date/Time: Tue Oct 24 10:31:03 CDT 2017
Name: Brienna Carter
Report Type: Error Report
Opt Subject: Error in Name of Emoji
Dear Unicode, Today I was texting on my MacBook, composing a message to a friend that was coupled with an emoji to make my tone more explicit. Upon hovering over emojis to choose, I discovered that you can see what each one is defined as. This feature spurred me to leap into the endeavor of looking at the various labels of emojis. It was all fun and games looking at some oddly specific descriptions and finally figuring out what a "part alternation mark" is until I came across what I thought was a sneaker or–as you and pockets of mid-western United States call them–tennis shoes. Personally, I was offended by this finding. I am fully aware that the two terms "sneaker" and "tennis shoe" are colloquially synonyms, yet the term sneaker is much more generic and wholly acceptable. A tennis shoe technically points to a shoe used for the sport of tennis. By accepting this title as the principal label for this type of shoe, we run into many problems. The first is how we differentiate between an actual shoe used for tennis and the general term tennis shoe; it is simply awkward and unacceptable to call such a shoe a "tennis tennis shoe." Tennis shoe also reminds us of the history of sneakers: a shoe once worn principally for athletics. Today, tennis shoes/sneakers are worn on a daily basis for just everyday life. The term "sneaker" is more accepting of this modern-day fashion statement. As a whole, the United States (not the only users of your emojis but a large portion of English speakers who do) uses the word "sneaker" much more frequently. In fact, it is searched more often than "tennis shoe" on Google in each state except Mississippi. Therefore, the majority should rule and Unicode should conform. Until two emojis exist, the sole emoji of a white and gray shoe should be defined as a "sneaker." Sincerely, Brienna Carter
Date/Time: Wed Oct 11 14:56:49 CDT 2017
Name: Ken Lunde
Report Type: Other Question, Problem, or Feedback
Opt Subject: UTS #37 suggestion
In response to WG2 N4829 Section 12, "Request for Consideration of Relaxing IVD Rules: IRG M48.3 with reference to Part B of IRGN2219," I propose that the following text be appended to the fourth paragraph of UTS #37 Section 2, "Description," or as a separate paragraph that immediately follows the fourth paragraph: In an effort to reduce the number of encoded variants, the unification rules for unified ideographs, when applied to the IVD, have been expanded to include cases whereby 1) characters that have a different structure, but whose difference is not considered significant enough to encode them as separate unified ideographs, and for which strong evidence associating them as variants of encoded characters can be provided, such as ⿱汨皿 versus ⿰氵昷 (U+6E29 温) and ⿱戠火 versus ⿹戠火 (U+243B7 𤎷); and 2) characters with the same structure, but with different components at the second (or subsequent) level that may not be generally unifiable, and for which strong evidence associating them as variants of encoded characters can be provided, such as ⿺𠃊西 versus ⿺辶西 (U+8FFA 迺) and ⿰月㲋 versus ⿰月𣬉 (U+818D 膍). When considering the second case, the character should be rarely used and not in general circulation, and the registrant is expected to provide evidence that demonstrates 1) similarity of glyph shape; and 2) general acceptance as a variant.