This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Thu Jan 25 07:50:03 CST 2018
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: Misleading/wrong/missing specifications
U+2012 FIGURE DASH
U+2012 FIGURE DASH should be specified as centered on lining digits, since it represents the minus sign in old-style typeset tables. This missing specification led the designers of all fonts Iʼve checked to make U+2012 a duplicate of U+2013, making it de facto useless. (Please compare with my previous feedback about U+2012.) TUS is wrong when stating that U+2012 has mixed semantics of U+002D, since it is NOT primarily a hyphen, NOR an en dash, but a minus sign, and should be designed as such, i.e. centered on lining digits in fonts with lining (uppercase) digits, and centered on lowercase letters ONLY in fonts with lowercase (Elzeviran) digits. Consequently, fonts providing both lining and lowercase digits MUST provide two corresponding glyphs for FIGURE DASH, and switch between the two depending on which digit style is selected. All of this should be specified in the Standard, and should have been from the beginning, as a guideline for inadvertent font designers.
U+279D AND U+2B62 TRIANGLE-HEADED RIGHTWARDS ARROWS
U+279D TRIANGLE-HEADED RIGHTWARDS ARROW and U+2B62 RIGHTWARDS TRIANGLE-HEADED ARROW have been made confusable by the misnaming of the former. It is good practice to start arrow names with the direction. Thus the set in the Miscellaneous Symbols and Arrows block has well-formed names, while the Dingbats arrow names are biased because, in that range, almost all arrows point rightwards. To fix this name confusion, adding cross-references is not enough. An informative alias should be added to U+279D, calling it THIN TRIANGLE-HEADED RIGHTWARDS ARROW, in contrast to the next character, U+279E HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW, and in accordance with the chart glyphs. An annotation on U+2B62 (“confusable with 279D”) would also be helpful.
U+279C HEAVY ROUND-TIPPED RIGHTWARDS ARROW
The chart glyph of U+279C does not reflect the character identity: it is not round-tipped, only round-barbed, like U+27BA TEARDROP-BARBED RIGHTWARDS ARROW; in fact, all of its strokes (barbs and stem) end in teardrops. However, current fonts show this arrow as actually round-tipped, giving it the intended (and consistent) design. Hence the chart glyph of U+279C could use an update. More than that: since it is not round-tipped, the glyph is really wrong, and it should never have made it into the Code Charts.
Date/Time: Tue Mar 6 12:20:07 CST 2018
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject: Comment line for 2E4A DOTTED SOLIDUS in NamesList-11.0.0d5.txt
2E4A DOTTED SOLIDUS
	= virgula suspensiva
	* indicates a medial disjunction less than solidus but more than punctus elevatus
Unless I am mistaken, the comment line should instead read:
	* indicates a medial disjunction more than solidus but less than punctus elevatus
Please see here:
https://pennpaleography.files.wordpress.com/2013/06/parkes_pause-effect-1992_select-glossary-of-technical-terms-punctuation-symbols.pdf "[dotted_solidus] was used by Humanist writers of the fourteenth century to indicate disjunction greater than that indicated by [solidus] and less than that indicated by [punctus_elevatus]."
https://www.unicode.org/L2/L2015/15327r-n4704-medieval-punct.pdf "Humanist writers of the 14th century made a distinction whereby [dotted_solidus] indicated a break greater than that indicated by [solidus] but less than that indicated by [punctus_elevatus] PUNCTUS ELEVATUS MARK."
Marc L.
Date/Time: Sun Feb 4 15:25:37 CST 2018
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: General Category of U+11A07 and U+11A08 should be Mn
The characters U+11A07 ZANABAZAR SQUARE VOWEL SIGN AI and U+11A08 ZANABAZAR SQUARE VOWEL SIGN AU have mistakenly been assigned the general category of "Mc". Since these are top-right and top-left marks as opposed to pure left or pure right marks, they should get the general category of "Mn" instead. For example, compare with the general category of U+11A0A ZANABAZAR SQUARE VOWEL LENGTH MARK which is a bottom-right mark and is correctly classified as "Mn".
Date/Time: Thu Mar 8 02:36:01 CST 2018
Name: fantasai
Report Type: Error Report
Opt Subject: Apparent Sentence_Break miscategorizations
* Semicolons are all categorized under Other, whereas colons and commas are categorized under SContinue. It would seem to make more sense for semicolons to be categorized under SContinue.
* The Greek Question Mark is categorized as Other rather than with the other question marks in STerm.
* Old Nubian punctuation (COPTIC OLD NUBIAN) seems not to have been categorized at all, and is filed under Other.
* Vertical Forms (PRESENTATION FORM FOR VERTICAL) punctuation is also not categorized with its compatibility equivalents, and should be.
CSS is trying to rely on these categorizations; it would be helpful if they were rigorous, or if we understood why they are idiosyncratic like this. See https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AGeneral_category%3DPo%3A%5D&g=Sentence_Break&i=
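Observations like these can be audited by tabulating Sentence_Break values for a chosen set of punctuation characters. The Python sketch below assumes the usual UCD data-file layout used by SentenceBreakProperty.txt (a code point or range, a semicolon, the value, then an optional `#` comment) and treats unlisted code points as Other. The sample lines are illustrative stand-ins rather than an excerpt of the real file, and the helper names are hypothetical.

```python
from collections import defaultdict

def parse_ucd_property(lines):
    """Map code points to values from UCD-style 'XXXX[..YYYY] ; Value # comment' lines."""
    props = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cps, value = (field.strip() for field in data.split(";")[:2])
        start, _, end = cps.partition("..")  # single code point or XXXX..YYYY range
        for cp in range(int(start, 16), int(end or start, 16) + 1):
            props[cp] = value
    return props

def group_by_value(chars, props, default="Other"):
    """Group characters (as U+XXXX labels) by property value; unlisted ones get the default."""
    groups = defaultdict(list)
    for ch in chars:
        groups[props.get(ord(ch), default)].append(f"U+{ord(ch):04X}")
    return dict(groups)

# Illustrative sample lines only; load the real SentenceBreakProperty.txt for an actual audit.
SAMPLE = [
    "002C          ; SContinue # Po       COMMA",
    "003A          ; SContinue # Po       COLON",
    "003F          ; STerm # Po       QUESTION MARK",
]

props = parse_ucd_property(SAMPLE)
print(group_by_value(",:;?\u037e", props))
```

With this sample, the semicolon and U+037E GREEK QUESTION MARK fall through to Other, which is the pattern the report describes; pointing the same code at the full file would show whether it holds across all Po characters.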
Date/Time: Thu Mar 8 13:10:08 CST 2018
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject: Appropriate subhead for 166D in NamesList-11.0.0d5.txt?
166D (CANADIAN SYLLABICS CHI SIGN) has General_Category = Other_Punctuation (Po) and qualifies for Terminal_Punctuation. I wonder why 166D isn't listed under the appropriate heading ('Punctuation'), together with 166E CANADIAN SYLLABICS FULL STOP. Currently we have:
@ Symbol
166D CANADIAN SYLLABICS CHI SIGN
	* Algonquian
	* used as a symbol to denote Christ
	x (chi rho - 2627)
@ Punctuation
166E CANADIAN SYLLABICS FULL STOP
	x (stenographic full stop - 2E3C)
Why couldn't we have:
@ Punctuation
166D CANADIAN SYLLABICS CHI SIGN
	* Algonquian
	* used as a symbol to denote Christ
	x (chi rho - 2627)
166E CANADIAN SYLLABICS FULL STOP
	x (stenographic full stop - 2E3C)
I must add that in Index.txt, 166D points to Canadian Syllabics PUNCTUATION:
Line 758: Canadian Syllabics Punctuation 166D
Line 4188: Punctuation, Canadian Syllabics 166D
Line 5087: Syllabics Punctuation, Canadian 166D
Marc
Date/Time: Sat Mar 10 11:23:18 CST 2018
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #372: Defective Glyphs in Unicode 11 Charts
There are a number of issues with code chart glyphs in Unicode 11.
Several contextual Mongolian glyphs display as squared question marks:
U+1829 MONGOLIAN LETTER ANG first form (initial)
U+182C MONGOLIAN LETTER QA first form (final)
U+1836 MONGOLIAN LETTER YA first form (final)
U+1840 MONGOLIAN LETTER LHA first form (final)
U+1843 MONGOLIAN LETTER TODO LONG VOWEL SIGN first form (initial)
U+1844 MONGOLIAN LETTER TODO E first form (isolate)
U+1844 MONGOLIAN LETTER TODO E first form (final)
U+184A MONGOLIAN LETTER TODO ANG first form (initial)
U+1855 MONGOLIAN LETTER TODO YA first form (final)
U+185F MONGOLIAN LETTER SIBE IY first form (initial)
U+1862 MONGOLIAN LETTER SIBE ANG first form (initial)
U+1864 MONGOLIAN LETTER SIBE GA first form (final)
U+1865 MONGOLIAN LETTER SIBE HA first form (final)
U+1866 MONGOLIAN LETTER SIBE PA first form (final)
U+1869 MONGOLIAN LETTER SIBE DA first form (final)
U+186A MONGOLIAN LETTER SIBE JA first form (final)
U+186B MONGOLIAN LETTER SIBE FA first form (final)
U+186E MONGOLIAN LETTER SIBE TSA first form (final)
U+186F MONGOLIAN LETTER SIBE ZA first form (final)
U+1871 MONGOLIAN LETTER SIBE CHA first form (final)
U+1872 MONGOLIAN LETTER SIBE ZHA first form (final)
U+1876 MONGOLIAN LETTER MANCHU FA first form (final)
U+1877 MONGOLIAN LETTER MANCHU ZHA first form (final)
U+1887 MONGOLIAN LETTER ALI GALI A first form (initial)
U+1887 MONGOLIAN LETTER ALI GALI A first form (medial)
U+1888 MONGOLIAN LETTER ALI GALI I first form (initial)
U+1888 MONGOLIAN LETTER ALI GALI I first form (medial)
U+1889 MONGOLIAN LETTER ALI GALI KA first form (medial)
U+1889 MONGOLIAN LETTER ALI GALI KA first form (final)
U+188A MONGOLIAN LETTER ALI GALI NGA first form (final)
U+188B MONGOLIAN LETTER ALI GALI CA first form (final)
U+1894 MONGOLIAN LETTER ALI GALI SSA first form (final)
U+1896 MONGOLIAN LETTER ALI GALI ZA first form (final)
U+189A MONGOLIAN LETTER MANCHU ALI GALI GHA first form (final)
U+189B MONGOLIAN LETTER MANCHU ALI GALI NGA first form (final)
U+189C MONGOLIAN LETTER MANCHU ALI GALI CA first form (final)
U+189D MONGOLIAN LETTER MANCHU ALI GALI JHA first form (final)
U+189E MONGOLIAN LETTER MANCHU ALI GALI TTA first form (final)
U+189F MONGOLIAN LETTER MANCHU ALI GALI DDHA first form (final)
U+18A1 MONGOLIAN LETTER MANCHU ALI GALI DHA first form (final)
U+18A2 MONGOLIAN LETTER MANCHU ALI GALI SSA first form (final)
U+18A3 MONGOLIAN LETTER MANCHU ALI GALI CYA first form (final)
U+18A4 MONGOLIAN LETTER MANCHU ALI GALI ZHA first form (final)
U+18A5 MONGOLIAN LETTER MANCHU ALI GALI ZA first form (final)
U+18AA MONGOLIAN LETTER MANCHU ALI GALI LHA first form (final)
U+1133B COMBINING BINDU BELOW in the Grantha block is missing the dotted circle that was present in previous PDAM charts.
U+11832 DOGRA VOWEL SIGN VOCALIC RR is missing the dotted circle that is present on other combining marks.
==================================================================
There are also some issues that seem to be limited to Firefox; Chrome and Edge show the affected characters correctly.
The three additions in the Chakma block show completely wrong glyphs; they look like Hangul jamo:
U+11144 CHAKMA LETTER LHAA
U+11145 CHAKMA VOWEL SIGN AA
U+11146 CHAKMA VOWEL SIGN EI
Several glyphs in the Ahom block are off-center and clip through the table borders:
U+11712 AHOM LETTER A
U+11713 AHOM LETTER DA
U+11714 AHOM LETTER DHA
U+11732 AHOM DIGIT TWO
U+11733 AHOM DIGIT THREE
U+11734 AHOM DIGIT FOUR
The following characters are invisible:
U+1180B DOGRA LETTER KHA
U+1180C DOGRA LETTER GA
U+1180D DOGRA LETTER GHA
U+1180E DOGRA LETTER NGA
Many glyphs in the Soyombo block are off-center and clip through the table borders.
The following glyphs have self-intersection issues:
U+1F96C LEAFY GREEN
U+1F96D MANGO
U+1F973 FACE WITH PARTY HORN AND PARTY HAT
U+1F97C LAB COAT
U+1F97D GOGGLES
U+1F97F FLAT SHOE
U+1F998 KANGAROO
U+1F99B HIPPOPOTAMUS
U+1F99D RACCOON
U+1F99E LOBSTER
U+1F99F MOSQUITO
U+1F9B5 LEG
U+1F9B6 FOOT
U+1F9C1 CUPCAKE
U+1F9E8 FIRECRACKER
U+1F9EA TEST TUBE
U+1F9EB PETRI DISH
U+1F9ED COMPASS
U+1F9EF FIRE EXTINGUISHER
U+1F9F0 TOOLBOX
U+1F9F2 MAGNET
U+1F9F5 SPOOL OF THREAD
U+1F9F6 BALL OF YARN
U+1F9F7 SAFETY PIN
U+1F9F8 TEDDY BEAR
U+1F9F9 BROOM
U+1F9FB ROLL OF PAPER
Date/Time: Sat Mar 10 13:30:56 CST 2018
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject: Subheads for 0A3C and 0AD0
@ Various signs
0A3C GURMUKHI SIGN NUKTA
	= pairin bindi
	* for extending the alphabet to new letters
@ Various signs
0AD0 GUJARATI OM
"Various signs" should be "Sign" in both cases.
Marc
Date/Time: Wed Mar 14 16:51:28 CDT 2018
Name: Tim Young
Report Type: Public Review Issue
Opt Subject: CJK Unified Ideographs additions
There are issues with the first three new additions to the main CJK Unified Ideographs block:
U+9FEB: The first glyph is close to colliding with its source subtitle.
U+9FEC: The two glyphs are not centered relative to each other, and the first glyph appears to be smaller.
U+9FED: The second glyph is not aligned with the two glyphs immediately above it, where it should be. Instead, it appears on a new line, aligned with U+9FE9.
Date/Time: Sat Mar 17 12:25:04 CDT 2018
Name: David Corbett
Report Type: Error Report
Opt Subject: PRI #372: Line_Break of COPYLEFT SYMBOL
As with U+00A9 COPYRIGHT SIGN, the Line_Break of U+1F12F COPYLEFT SYMBOL should be AL.
Date/Time: Thu Mar 22 14:25:18 CDT 2018
Name: David Corbett
Report Type: Error Report
Opt Subject: Confusion regarding the glyph of U+0F0E TIBETAN MARK NYIS SHAD
Regarding Tibetan punctuation, Chapter 13 says “Because some writers use the double shay with a different spacing than would be obtained by coding two adjacent occurrences of U+0F0D, the double shay has been coded at U+0F0E with the intent that it would have a larger spacing between component shays than if two shays were simply written together. However, most writers do not use an unusual spacing between the double shay, so the application should allow the user to write two U+0F0D codes one after the other. Additionally, font designers will have to decide whether to implement these shays with a larger than normal gap.”
I’ve downloaded a bunch of Tibetan fonts, and most of them display U+0F0E as slightly narrower than two U+0F0Ds. Many make them the same width. A few of the Qomolangma fonts make U+0F0E slightly wider. The code chart glyph for U+0F0E consists of two shays so close together that there is barely any space between them. If the standard is correct, the code chart glyph is misleading, if not wrong, and should have more space between the shays. If the majority of the fonts I tested are correct, Chapter 13 should not imply that their spacing is wrong.
Date/Time: Mon Mar 26 21:25:58 CDT 2018
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #372: Indic_Syllabic_Category of GURMUKHI SIGN UDAAT
According to its proposal documents, U+0A51 GURMUKHI SIGN UDAAT is a lexical tone mark, not a recitation mark, so its Indic_Syllabic_Category should be Tone_Mark, not Cantillation_Mark.
Date/Time: Thu Mar 29 07:56:21 CDT 2018
Name: Ken Lunde
Report Type: Public Review Issue
Opt Subject: PRI #372 (Unicode Version 11.0 Beta) feedback
The representative glyph for U+8494 蒔 in the "H" column is incorrect, in that the 士 component should be 土, to follow Hong Kong SAR regional conventions and to match other ideographs in the same column that include the 寺 component. In mid-March, I discovered and reported this error in Hong Kong SAR's glyph specification, which covers Big Five and HKSCS-2016 proper; it was corrected at the very end of March. See page 1,014 of the 1,014-page PDF (75MB): https://www.ogcio.gov.hk/en/our_work/business/tech_promotion/ccli/cliac/reference_glyphs.html Hong Kong SAR therefore needs to supply an updated version of the font, and while I will remind them to do so, I am reporting this issue here so that it doesn't fall between the cracks.
Date/Time: Wed Apr 4 14:44:05 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372 (Unicode Version 11.0 Beta) Typos
#1
@ Emoji components
@+ The characters in the range 1F9B0..1F9B3 are intended to be used in ZWJ sequences to indicate hair style.
"ZWJ sequences" should be "emoji ZWJ sequences".
#2
1F96C LEAFY GREEN
	* intended to represent cooked green vegetables such as bok choy, kale, etc.
An extra space shows up between "as" and "bok".
Date/Time: Wed Apr 4 15:06:16 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372 (Unicode Version 11.0 Beta) PUNCTUS ELEVATUS MARK
2E4E PUNCTUS ELEVATUS MARK
	* indicates a minor medial pause where the sense is complete but the meaning is not
“minor” should be “major” (“major medial pause”).
Sources:
https://www.unicode.org/L2/L2015/15327r-n4704-medieval-punct.pdf: 2.2.1. Punctus elevatus. This was in origin an indicator of positura (ending a section) but which came to be used to indicate a major medial pause “where the sense is complete but the meaning is not” (Parkes p. 306).
A.-É. Urfels-Capot, Le sanctoral du lectionnaire de l'office dominicain (1254-1256) [The sanctoral of the Dominican office lectionary (1254-1256)], translated from French: “The Cistercian versification adopted by the Dominicans rests on the combination of the following signs: (...) punctus elevatus for the major intermediate, or ‘medial’, pause; (...) We are thus dealing with a system of four apparent levels. However, the distinction between the major medial pause (punctus elevatus) and the minor medial pause (punctus flexus) essentially corresponds to rules of musical alternation.”
Date/Time: Wed Apr 4 15:34:19 CDT 2018
Name: Marc Lodewijck
Report Type: Error Report
Opt Subject: PRI #372: Unnecessary comments (0218..0219 and 021A..021B)
@ Additions for Romanian
0218 LATIN CAPITAL LETTER S WITH COMMA BELOW
	: 0053 0326
0219 LATIN SMALL LETTER S WITH COMMA BELOW
	* Romanian
	x (latin small letter s with cedilla - 015F)
	: 0073 0326
021A LATIN CAPITAL LETTER T WITH COMMA BELOW
	: 0054 0326
021B LATIN SMALL LETTER T WITH COMMA BELOW
	* Romanian
	x (latin small letter t with cedilla - 0163)
	: 0074 0326
The comments below 0219 and 021B ("Romanian") are redundant in light of the heading immediately before them ("Additions for Romanian").
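The `:` lines in this excerpt are canonical decompositions, and they are what distinguish the comma-below letters from the cedilla letters cross-referenced with `x`. A short check with Python's standard `unicodedata` module illustrates this (the helper name `decompositions` is hypothetical). The values depend on the Unicode version bundled with the interpreter, but canonical decompositions are frozen by Unicode's normalization stability policy.

```python
import unicodedata

def decompositions(chars):
    """Return {'U+XXXX NAME': decomposition field} for each character."""
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch)}": unicodedata.decomposition(ch)
        for ch in chars
    }

# Comma-below letters (Romanian) versus the cedilla letters they are cross-referenced to.
for label, decomp in decompositions("\u0219\u021b\u015f\u0163").items():
    print(label, "->", decomp)
```

The comma-below letters decompose with U+0326 COMBINING COMMA BELOW, whereas the cedilla letters use U+0327 COMBINING CEDILLA.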
Date/Time: Fri Apr 6 20:19:21 CDT 2018
Name: Behnam Esfahbod
Report Type: Public Review Issue
Opt Subject: UCD 11.0.0 Beta data files
Hi there,
The GraphemeBreakTest files have new beta data files:
- https://www.unicode.org/Public/11.0.0/ucd/auxiliary/GraphemeBreakTest-11.0.0d25.txt
- https://www.unicode.org/Public/11.0.0/ucd/auxiliary/GraphemeBreakTest-11.0.0d25.html
In these files, three new property value names are used:
- ExtPict
- Extend_ExtCccZwj
- ZWJ_ExtCccZwj
But there is no trace of these values in other data files, especially:
- https://www.unicode.org/Public/11.0.0/ucd/PropertyValueAliases-11.0.0d16.txt
In the proposed specification:
- https://www.unicode.org/reports/tr29/tr29-32.html#Grapheme_Cluster_Break_Property_Values
there's only one new property mentioned, `Extended_Pictographic`, which seems to be aliased as `ExtPict`.
Where would be the best place to find more information about the `*_ExtCccZwj` property values? Will any of the new values be reflected in the `PropertyValueAliases` file during the beta release?
Thanks,
-Behnam
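For a conformance harness, such labels may not need to be resolved at all. Each test line encodes the expected segmentation with ÷ (U+00F7, break) and × (U+00D7, no break) between hexadecimal code points, and everything after `#` is commentary. This sketch assumes, as in earlier versions of the file, that the property-value labels appear only in that comment part. The function name is hypothetical.

```python
BREAK, NO_BREAK = "\u00f7", "\u00d7"  # ÷ marks a boundary, × marks no boundary

def parse_break_test_line(line):
    """Return the expected grapheme clusters for one test line, or None if it holds no test."""
    tokens = line.split("#", 1)[0].split()  # ignore the commentary after '#'
    if not tokens:
        return None
    clusters, current = [], ""
    for token in tokens:
        if token == BREAK:
            if current:  # skip the empty cluster before the leading ÷
                clusters.append(current)
            current = ""
        elif token != NO_BREAK:
            current += chr(int(token, 16))  # hex code point joins the current cluster
    return clusters

print(parse_break_test_line("÷ 0061 × 0308 ÷ 0062 ÷\t# a + diaeresis, then b"))
```

The expected clusters can then be compared with the output of any segmenter under test, independently of how the intermediate property values are named.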
Date/Time: Fri Apr 13 12:27:19 CDT 2018
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #372: ASCII fallbacks in the names list
Because the names list’s character repertoire has been expanded, some notes no longer need to use ASCII fallbacks. The notes for the Old Sogdian characters U+10F12, U+10F13, and U+10F27 should use “ʿ” instead of “`”. The notes for the Romanian values of the Duployan letters U+1BC24 and U+1BC26 should use “ș” instead of “s” and “sh”. U+1BC2A should use “ț” instead of “ts”. U+1BC64 should use “în” instead of “yn”.
Date/Time: Sat Apr 14 15:26:56 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372: sharp s (U+00DF and U+1E9E)
The use of U+1E9E (LATIN CAPITAL LETTER SHARP S), added to the Unicode Standard in 2008, has been officially allowed as an ALTERNATE SPELLING by the Council for German Orthography since June 2017, after a long orthographic debate (its use had been recommended since 2010 in official documentation when writing geographical names in capital letters). See, for instance:
– http://typedrawers.com/discussion/2233/council-for-german-orthography-officially-allows-use-of-u-1e9e “What’s new here is that this is no longer merely recommends the cap eszett as a valid alternative for proper names but makes it a sanctioned alternate spelling in general orthography.”
– http://www.rechtschreibrat.com/DOX/rfdr_Regeln_2016_veroeffentlicht_2017.pdf Rat für deutsche Rechtschreibung [Council for German Orthography], “Regeln und Wörterverzeichnis. Aktualisierte Fassung des amtlichen Regelwerks entsprechend den Empfehlungen des Rats für deutsche Rechtschreibung 2016” [Rules and Word List: Updated Version of the Official Rules in Accordance with the Recommendations of the Council for German Orthography 2016], Mannheim 2017. The following is an important excerpt from that document: https://www.screencast.com/t/Z9gJS3V6nG
– http://www.stagn.de/SharedDocs/Downloads/DE/StAGN_Publikationen/161018_TopR06.pdf?__blob=publicationFile&v=3 Ständiger Ausschuss für geographische Namen (StAGN) [Permanent Committee on Geographical Names], “Toponymic Guidelines for Map and Other Editors for International Use”, 6th revised edition, 2016.
For now, this is what we have in the NamesList.txt file:
@ Addition for German typography
1E9E LATIN CAPITAL LETTER SHARP S
	* lowercase is 00DF
	x (latin small letter sharp s - 00DF)
The word “typography” is no longer needed in the heading and should therefore be deleted, and a new comment (“is part of the official German orthography...”) would be welcome:
@ Addition for German
1E9E LATIN CAPITAL LETTER SHARP S
	* lowercase is 00DF
	* is part of the official German orthography since 2017, and along with "SS" an allowed variant spelling of 00DF in "all caps" style
	x (latin small letter sharp s - 00DF)
As regards U+00DF (small sharp s), we have the following description:
00DF LATIN SMALL LETTER SHARP S
	= Eszett
	* German
	* uppercase is "SS"
	* nonstandard uppercase is 1E9E
	* typographically the glyph for this character can be based on a ligature of 017F with either 0073 or with an old-style glyph for 007A (the latter similar in appearance to 0292). Both forms exist interchangeably today.
	x (greek small letter beta - 03B2)
“uppercase is "SS"” should now read “uppercase is "SS" or 1E9E”, and we could usefully add a new comment (“out of use in...”):
00DF LATIN SMALL LETTER SHARP S
	= Eszett
	* German
	* uppercase is "SS" or 1E9E
	* typographically the glyph for this character can be based on a ligature of 017F with either 0073 or with an old-style glyph for 007A (the latter similar in appearance to 0292). Both forms exist interchangeably today.
	* out of use in Swiss Standard German (Switzerland and Liechtenstein)
	x (greek small letter beta - 03B2)
Source for Swiss Standard German: Rechtschreibleitfaden-2017.pdf, https://www.bk.admin.ch/bk/de/home/dokumentation/sprachen/hilfsmittel-textredaktion/leitfaden-zur-deutschen-rechtschreibung.html, retrieved April 14, 2018. Bundeskanzlei [Swiss Federal Chancellery], “Rechtschreibung. Leitfaden zur deutschen Rechtschreibung” [Spelling: Guide to German Spelling], 4th edition, 2017. Excerpt from that document (page 18): https://www.screencast.com/t/dX9YESZvuO
Voilà.
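The current annotation “uppercase is "SS"” matches the default full case mapping, which is what Python's `str.upper()` applies. Using 1E9E instead is a tailoring that a program has to opt into explicitly. In this sketch, `upper_de` is a hypothetical helper, not an existing API.

```python
SMALL_SHARP_S = "\u00df"    # ß LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S

def upper_de(text, capital_sharp_s=False):
    """Uppercase German text, optionally using U+1E9E instead of the default "SS"."""
    if capital_sharp_s:
        # Substitute first: str.upper() leaves U+1E9E unchanged, so it survives.
        text = text.replace(SMALL_SHARP_S, CAPITAL_SHARP_S)
    return text.upper()

print(upper_de("Straße"))                        # default mapping: STRASSE
print(upper_de("Straße", capital_sharp_s=True))  # tailored: STRAẞE
```

The reverse direction needs no tailoring, since the default lowercase mapping of U+1E9E is already U+00DF ("lowercase is 00DF").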
Date/Time: Sun Apr 15 01:11:22 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372: additional comment(s) for U+02BB
02BB MODIFIER LETTER TURNED COMMA
	* typographical alternate for 02BD or 02BF
	* used in Hawai`ian orthography as `okina (glottal stop)
	x (combining turned comma above - 0312)
	x (nko low tone apostrophe - 07F5)
	x (left single quotation mark - 2018)
A comment may be added to the above:
	* used in Tongan orthography as fakau`a (glottal stop)
Source (one among others): New Zealand Government (Ministry of Education), “Faufaua! An Introduction to Tongan. Teachers’ Guide and Support Materials”, 2010. URL: pasifika.tki.org.nz/content/download/368/1869/file/Faufaua_Col_final.pdf Excerpt: https://www.screencast.com/t/uroFZMqyb
Besides, U+02BB is also used (as a diacritic?) in the 1995 revised Uzbek orthography using the Latin script; it shows up in four combinations: g\u02BB, G\u02BB, o\u02BB, O\u02BB. This should probably also be considered.
Sources:
Shavkat Rahmatullayev and Azim Hojiyev, eds., « O‘zbek tilining imlo lug'ati » [Orthographic dictionary of the Uzbek language], O‘qituvchi, Tashkent, 2011 (3rd edition), 240 pages. Available here: http://media.tdpu.uz/dl_image/IMG/01//000000002131/SERVICE/000000002131_01.PDF Excerpts: https://www.screencast.com/t/SWxkor4V
Jacob M. Landau and Barbara Kellner-Heinkele, « Politics of language in the ex-Soviet Muslim states: Azerbaijan, Uzbekistan, Kazakhstan, Kyrgyzstan, Turkmenistan, Tajikistan », Hurst, London, 2001, p. 137-138. Excerpt: https://www.screencast.com/t/I7jLs6sQ1F
Omniglot: http://www.omniglot.com/writing/uzbek.htm
Wikipedia (uz): https://uz.wikipedia.org/wiki/O%CA%BBzbek_lotin_alifbosi
Thank you,
Marc
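Part of why the choice of U+02BB matters for these orthographies is its General_Category. It is a modifier letter (Lm), whereas the look-alike U+2018 is initial punctuation (Pi) and the ASCII fallback U+0060 is a modifier symbol (Sk). The following check uses only Python's standard `unicodedata` module. The Uzbek and Hawaiian words are sample strings, and `char_info` is a hypothetical helper.

```python
import unicodedata

OKINA = "\u02bb"  # MODIFIER LETTER TURNED COMMA

def char_info(ch):
    """Return (code point label, character name, General_Category)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)

for ch in (OKINA, "\u2018", "\u0060"):
    print(*char_info(ch))

# Because Lm counts as a letter, str.isalpha() accepts words spelled with U+02BB ...
print(("O" + OKINA + "zbek").isalpha(), ("Hawai" + OKINA + "i").isalpha())
# ... but not the same word spelled with LEFT SINGLE QUOTATION MARK.
print("Hawai\u2018i".isalpha())
```

Letter-oriented processing treats words spelled with U+02BB as single runs of letters. It does not do so when a quotation mark is substituted, which is one practical reason to document the Tongan and Uzbek uses alongside the Hawaiian one.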
Date/Time: Sun Apr 15 03:09:43 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372: additional comments (Livonian, Kildin Sami)
"Livonian" should be added to "Old Icelandic" for U+01EC and U+01ED:
-- : remove
++ : add
01EC LATIN CAPITAL LETTER O WITH OGONEK AND MACRON
	: 01EA 0304
01ED LATIN SMALL LETTER O WITH OGONEK AND MACRON
--	* Old Icelandic
++	* Old Icelandic, Livonian (in recent linguistic scholarship)
"Livonian" should be added to "Portuguese, Estonian" for U+00F5:
00F5 LATIN SMALL LETTER O WITH TILDE
--	* Portuguese, Estonian
++	* Portuguese, Estonian, Livonian
	: 006F 0303
Under "Additions for Romanian" (in the Latin Extended-B block), the "Romanian" comment lines should be deleted (see the Wed Apr 4 report, above); in fact, they should be replaced with new comments:
@ Additions for Romanian
0218 LATIN CAPITAL LETTER S WITH COMMA BELOW
	: 0053 0326
0219 LATIN SMALL LETTER S WITH COMMA BELOW
--	* Romanian
++	* also Kildin Sami, in a new Latin-based orthography introduced during the early Soviet period
	x (latin small letter s with cedilla - 015F)
	: 0073 0326
021A LATIN CAPITAL LETTER T WITH COMMA BELOW
	: 0054 0326
021B LATIN SMALL LETTER T WITH COMMA BELOW
--	* Romanian
++	* also Livonian
++	* also Kildin Sami, in a new Latin-based orthography introduced during the early Soviet period
	x (latin small letter t with cedilla - 0163)
	: 0074 0326
Sources for Livonian:
Uldis Balodis, “Livonian Orthography”, Virtual Livonia, 2017. See here: http://virtuallivonia.info/?page_id=133
Tuuli Tuisk, “Main Features of the Livonian Sound System and Pronunciation”, ESUKA – JEFUL 2016, 7–1: 121–143. Available here: http://jeful.ut.ee/index.php/JEFUL/article/download/jeful.2016.7.1.06/119 Excerpt: https://www.screencast.com/t/UkEVPu3qF
Omniglot: https://www.omniglot.com/writing/livonian.htm
Sources for Latin-based Kildin Sami:
https://en.wikipedia.org/wiki/Kildin_Sami_orthography#The_Latin_period Excerpt (chart): https://www.screencast.com/t/NOubFIISJ
Michael Rießler, “Towards a digital infrastructure for Kildin Saami”. Available here: http://www.siberian-studies.org/publications/PDF/sikriessler.pdf Excerpt: https://www.screencast.com/t/feXfdv7kKCcr
Voilà.
Thank you,
Marc
Date/Time: Sun Apr 15 16:13:57 CDT 2018
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #372 (consolidated feedback)
-------------------------------------------------------------------------------------------------
U+0588 ֈ ARMENIAN SMALL LETTER YI WITH STROKE
This character has a name that, from an extrinsic point of view, compromises character identity, insofar as any distinction is made between a stroke and a bar. Intrinsically, however, calling it “with stroke” is justified by the (already earlier misnamed) U+0249 LATIN SMALL LETTER J WITH STROKE. Actually, both of these letters are *with bar*, as can be inferred from other pairs such as L and U with either stroke or bar, which are graphically well distinguished. Annotations should be added to all letters misnamed through stroke/bar confusion, to help translators fix these flaws in localized versions. Missing informative aliases are responsible for terminological flaws spreading to and contaminating other locales, such as French. Unicode has a powerful means to enforce correct understanding of character identity, thanks to house policies protecting characters against corruption of their identity.
Note: An appropriate name change had already been requested at PRI #352: Feedback on draft additional repertoire for Amendment 1.3 (PDAM) to ISO/IEC 10646:2017. See I.12 in L2/17-288: https://www.unicode.org/L2/L2017/17288-pri-comments.pdf
-------------------------------------------------------------------------------------------------
U+05EF ׯ HEBREW YOD TRIANGLE
See my PDAM ballot stage feedback, which also recommends changing the header to “Logograph”. This sign is a writing convention replacing the Holy Name; see the original proposal. Deleting this sign is thus equivalent to deleting the Name, and the intended security fails. Therefore, an annotation should be added to prevent people from *intentionally* deleting this sign.
Note: An appropriate name change had already been requested at PRI #352: Feedback on draft additional repertoire for Amendment 1.3 (PDAM) to ISO/IEC 10646:2017. See I.13 in L2/17-288.
-------------------------------------------------------------------------------------------------
U+2BFE ⯾ REVERSED RIGHT ANGLE
= without → 221F ∟ right angle
This *new* pair is missing from BidiMirroring-11.0.0d3.txt, while other angle symbols are present. As new characters are encoded, bidi‐mirroring pairs should be added to the Bidi_Mirroring_Glyph repertoire as soon as matching partners exist, provided that they meet the requirement of ensuring readability in the absence of RTL glyph handling. Cf. the feedback iteration in my previous post (off‐PRI). I note that this encoding fulfills item 1 of Table 14 in Remedial 19 in: http://www.unicode.org/L2/L2017/17438-bidi-math-fdbk.html Note that items 2 through 8 still seem to be missing from Unicode. Please refer to Table 14.
-------------------------------------------------------------------------------------------------
U+A8FE ꣾ DEVANAGARI LETTER AY
U+A8FF ◌ꣿ DEVANAGARI VOWEL SIGN AY
These two characters are under the subhead “Additional vowel and vowel sign.” This conforms to a practice followed in the code charts, where independent vowels have names containing LETTER and are headed “Independent vowels,” while combining vowels have names containing VOWEL SIGN and are headed “Dependent vowel signs.” This is inconsistent in several ways and needlessly complicated:
① Combining characters are usually referred to as “marks” in the Unicode Standard. Using “signs” when referring to combining vowels in Brahmic scripts is a misleading inconsistency, as this would mean they are really what everywhere else in the Standard is called a “sign,” i.e. a symbol (e.g. the dollar sign; see the rationale for naming the copyleft symbol).
② Calling letter vowels “independent vowels” implies calling combining vowels “dependent vowels.” Conversely, calling combining vowels “dependent vowel signs” implies that independent vowels are called either “independent vowel letters” or “independent vowel signs.” Ultimately, the new Devanagari Extended subheading “Additional vowel and vowel sign” results from a clash of colliding inconsistencies.
Subheadings like those in the Kharoshthi block, e.g. “Vowels” before U+10A00 (a range containing both independent and dependent vowels, like the Devanagari range under discussion), prove that correct subheadings are already implemented in the Standard. Even in the main Devanagari block, the subheading before U+0960 reads “Additional vowels for Sanskrit,” not “Additional vowels and vowel signs for Sanskrit,” although the range actually encompasses both independent (vocalic rr and ll) and dependent vowels (vocalic l and ll). Consequently, it is recommended that Devanagari Extended subheadings follow the same scheme.
Solution: Change the subheading before U+A8FE from “Additional vowel and vowel sign” to “Additional vowels”. Harmonize relevant subheadings in all blocks containing Brahmic scripts, e.g. change “Dependent vowel signs” to “Dependent vowels” (e.g. before U+093A). Please consider this together with the next item.
-------------------------------------------------------------------------------------------------
U+11145 ◌𑅅 CHAKMA VOWEL SIGN AA
U+11146 ◌𑅆 CHAKMA VOWEL SIGN EI
These new combining vowels are grouped under the subheading “Dependent vowel signs.” Although “Dependent vowel sign(s)” now occurs 55 times in the Code Charts as a subheading, it represents the wrong option, as opposed to “Dependent vowels”, which is already present in the following blocks: Oriya (before U+0B62), Telugu (U+0C62), Kannada (U+0CE2), Malayalam (U+0D62), and Lepcha (U+1C26).
The reason is that the concept of a “vowel sign” is proper to the writing system / encoding and calls for attributes like “combining” (whose opposite is “independent”), whereas a “vowel” is more polysemous: it is mainly a linguistic entity (where attributes like “dependent” and “independent” may apply), but is also used in writing and encoding as a simple alternative to the more precise (and, depending on context, needlessly precise) “vowel sign.” Obviously, the Unicode terminology is biased here by the superfluous presence of “SIGN” in the character names of most combining vowels in Brahmic scripts. Therefore, in this context, when a subheading starts with “Dependent,” it should end in “vowels,” not in “vowel signs.” This instance and the others therefore need to be corrected. Please consider this item along with the previous one.
-------------------------------------------------------------------------------------------------
HANIFI ROHINGYA U+10D00..U+10D3F
I wouldn’t comment [1] on the meritorious encoding of the Hanifi Rohingya script, which helps my case in that it has vowel names without SIGN, simply VOWEL, as already in Tai Viet, where seven of the vowels are combining marks (Gc=Mn) while eight are Gc=Lo, although only five precede consonants in visual and logical order. The Hanifi Rohingya block is another template of streamlined vowel names. Iʼm sensitive to this because VOWEL SIGN in character names is untranslatable into French. Historically it ended up as “DIACRITIQUE VOYELLE” [vowel diacritic], which I proposed replacing with “VOYELLE COMBINANTE” [combining vowel]. That, however, didnʼt gain traction, though admittedly the issue raised concerns. This is one more reason to be amazed to see Hanifi Rohingya having vowels without SIGN in their names.
[1] Otherwise, it would be worth mentioning that a diacritic indicating a tone is called a “tone mark” throughout the Standard (12 ranges) and typically has a name including the word TONE.
That in turn translates well (MARQUE TONALE), so this rough edge can be smoothed over in Code Charts translations. ------------------------------------------------------------------------------------------------- SOGDIAN U+10F30..U+10F6F The diacritics in range U+10F46..10F50 are categorized as “combining signs” in the Proposal (§3.3) and as “Modifier signs” in the relevant delta code chart currently under beta review: http://www.unicode.org/charts/PDF/Unicode-11.0/U110-10F30.pdf In the Unicode Standard, the term “modifier” is used in conjunction with “letter” for independent characters. Combining characters, in turn, are known as “marks,” as in “non‐spacing combining mark” and “spacing combining mark.” Hence Code Charts readers would most likely expect the Sogdian diacritics under a subheading such as “Diacritics” (cf. those before U+07F2, U+0859, U+1CE2, U+302A, and of course U+0300) or “Combining marks” (U+135D, U+2CEF, U+A6F0, U+10AE5). To date, the subheading “Modifier sign” is new to the Code Charts, introduced with the Sogdian block. Elsewhere on the internet, the term is used in programming, though fairly seldom. ------------------------------------------------------------------------------------------------- U+110CD KAITHI NUMBER SIGN ABOVE This new character follows a range of “Various signs,” but after some unassigned code points left free for a convenient row shift that improves the legibility of the code chart. Hence the subheading has been repeated: “Sign.” There is, however, an issue with the design of the last ranges in this block. U+110C0 and U+110C1 are the danda and double danda for Kaithi, as in a number of other Brahmic scripts that do not use the Devanagari dandas. In every single block of these scripts that has its own dandas, *** DANDA and *** DOUBLE DANDA are in a “Punctuation” range. Kaithi is the only block in the Standard where script‐specific dandas are merged with other signs under a generic subheading. 
In order to harmonize the presentation of the Kaithi block and avoid an impression of neglect, I recommend modifying the subheadings as follows. I would also add a cross‐reference to U+110BC, as suggested by a mention in the encoding proposal, p. 34 (p. 39 of the PDF): https://www.unicode.org/L2/L2008/08194-n3389-kaithi.pdf @ Various signs 110B9 KAITHI SIGN VIRAMA 110BA KAITHI SIGN NUKTA 110BB KAITHI ABBREVIATION SIGN @ Number signs 110BC KAITHI ENUMERATION SIGN x (numero sign - 2116) 110BD KAITHI NUMBER SIGN * used to indicate a numerical reference @ Punctuation 110BE KAITHI SECTION MARK * marks end of sentence x (khojki section mark - 1123B) 110BF KAITHI DOUBLE SECTION MARK * delimits larger chunks of text, such as paragraphs x (khojki double section mark - 1123C) 110C0 KAITHI DANDA 110C1 KAITHI DOUBLE DANDA @ Number sign 110CD KAITHI NUMBER SIGN ABOVE * used to indicate a number in an itemized list ------------------------------------------------------------------------------------------------- U+11A9D SOYOMBO MARK PLUTA This character has a generic range heading, while U+11A98 SOYOMBO GEMINATION MARK has a specific one, which is already a hapax; both are single-character ranges. Suggestion: Change “Additional mark” to “Elongation mark.” ------------------------------------------------------------------------------------------------- GUNJALA GONDI U+11D60..U+11DAF The first range heading should be “Independent vowels,” as everywhere else in the Code Charts for such a script configuration, not just “Vowels” (which is also misleading). “Dependent vowel signs” should be changed to “Dependent vowels” (see other comment). The range heading for U+11D98 GUNJALA GONDI OM could be “Invocation sign,” following U+11449 NEWA OM. ------------------------------------------------------------------------------------------------- U+11D97 GUNJALA GONDI VIRAMA The annotation “used for producing conjuncts” matches that of U+11D45 MASARAM GONDI VIRAMA. 
However, that of U+11133 CHAKMA VIRAMA, “used to form conjuncts,” reads as simpler English. Suggestion: Replace both instances (MASARAM GONDI and GUNJALA GONDI) with: * used to form conjuncts Conversely, if “used to form conjuncts” is poor English in this context (owing to ambiguous semantics), I strongly recommend aligning all instances on the “used for producing conjuncts” template. Please note, by the way, that the French version already attempts to harmonize such repetitions (“sert à former des ligatures”), to keep problems in the English version from affecting localized versions. Cf. draft preview 10.0.0: http://docucaras.info/#u11D45 ------------------------------------------------------------------------------------------------- MAYAN NUMERALS U+1D2E0..U+1D2FF Range headings usually do not include the script name. In this block, the (only) range heading replicates the block name. Recommendation: Change “Mayan numerals” (subheading) to “Numerals”. -------------------------------------------------------------------------------------------------
Date/Time: Sun Apr 15 13:23:28 CDT 2018
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: General feedback, pending, consolidated
Hello, Thank you for the reminder that the beta will close soon. Several items are in progress but cannot be completed because of other urgent matters. ================================================================================================== BIDI‑MIRRORING PAIRS FEEDBACK ITERATION Second, the bidi mirroring pairs that include tildes should be excluded from the bidi-mirrored‐glyph=yes feature. That request is already documented in a feedback item that was submitted in time for UTC meeting #154, posted to the registry *before* that meeting, and listed in the meeting agenda, but *not* considered: http://www.unicode.org/L2/L2017/17438-bidi-math-fdbk.html Please note that this is revision 7 of January 18, 2018, superseding the failed L2/18-026 (January 15). Quote from section 3.1: Whether tildes are mirrored or not, does matter in typography, but mostly not for readability. When writing direction changes, switching the < >-like operators is absolute priority, whatever environment the text is displayed in. Therefore, the missing best-fit pairs should be added either to BidiMirroring.txt, or to the new *BidiMirroringExtended.txt. However, when discussing the requirements for tilde rendering, there is a need to underscore the semantic difference in three pairs of symbols that exist with tilde and with reversed tilde. Two of these pairs are mirrored by glyph exchange, while the third pair like all other tilde symbols is mirrored by RTL glyphs only (Table 5). Again, that works fine in publishing, when all tildes are mirrored anyway. But as glyph-exchange bidi-mirroring is not designed just as a convenience to streamline high-end rendering algorithms, but as a last resort to facilitate a usable display in whatever environment, there is scarcely any point in mirroring just two pairs, because the effect would be to merge the reversed tildes among the unmirrored ones, while the normal tildes stand out as if they were reversed (Figure 4). 
REMEDIAL 11: In BidiMirroring.txt: Remove the pairs in Table 6 from the pair mapping list in order to equalize the mirroring behavior of all operators with tilde or reversed tilde.

Table 6. Mirror pairs of operators with tilde or reversed tilde, to be unpaired for consistency
1   223C ∼ TILDE OPERATOR            and   223D ∽ REVERSED TILDE
2   2243 ≃ ASYMPTOTICALLY EQUAL TO   and   22CD ⋍ REVERSED TILDE EQUALS

================================================================================================== UNICODE 11.0 BETA REVIEW FEEDBACK ITEMS, consolidated, will follow in a separate post. Best regards, Marcel
Date/Time: Tue Apr 17 04:50:57 CDT 2018
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI 372 addendum as an update of lastly posted item
Update with respect to BidiMirroring-11.0.0d3.txt: To the quoted Table 6, “Mirror pairs of operators with tilde or reversed tilde, to be unpaired for consistency,” the following pair should be added, as it has recently been given the BidiMirroredGlyph=yes property, against the recommendation in L2/17-438: #3: 2245 ≅ APPROXIMATELY EQUAL TO and 224C ≌ ALL EQUAL TO When sending “BIDI‑MIRRORING PAIRS FEEDBACK ITERATION” (Sun Apr 15 13:23:28 CDT 2018), I believed this was clear from the context, which refers to “while the third pair like all other tilde symbols is mirrored by RTL glyphs only (Table 5).” Note: This information was available to the UTC when it considered the *superseded* L2/18-026, which was listed *after* the up‐to‐date L2/17-438 in the agenda of meeting #154, in section 3.1 With tilde or question mark.
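The unpairing requested in Table 6, together with the pair added in this update, amounts to deleting both directions of each mapping from BidiMirroring.txt. A minimal Python sketch of that filter, assuming the file's usual "XXXX; YYYY # comment" line layout; the sample lines and the function name remove_pairs are illustrative, not quoted from the 11.0.0d3 draft:

```python
# Sketch: drop the tilde mirror pairs proposed for unpairing from
# BidiMirroring.txt-style data ("XXXX; YYYY # comment" per line, assumed).

UNPAIR = [
    (0x223C, 0x223D),  # TILDE OPERATOR / REVERSED TILDE
    (0x2243, 0x22CD),  # ASYMPTOTICALLY EQUAL TO / REVERSED TILDE EQUALS
    (0x2245, 0x224C),  # APPROXIMATELY EQUAL TO / ALL EQUAL TO
]


def remove_pairs(lines, pairs):
    """Return the lines whose mapping is not one of `pairs`, in either direction."""
    banned = set()
    for a, b in pairs:
        banned.add((a, b))
        banned.add((b, a))  # the file lists each pair once per direction
    kept = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if data:
            src, dst = (int(field, 16) for field in data.split(";"))
            if (src, dst) in banned:
                continue
        kept.append(line)
    return kept


# Illustrative excerpt only, not copied from the actual data file.
SAMPLE = [
    "# Bidi_Mirroring_Glyph excerpt (illustrative)",
    "003C; 003E # LESS-THAN SIGN",
    "223C; 223D # TILDE OPERATOR",
    "223D; 223C # REVERSED TILDE",
    "2243; 22CD # ASYMPTOTICALLY EQUAL TO",
    "22CD; 2243 # REVERSED TILDE EQUALS",
]

if __name__ == "__main__":
    for kept_line in remove_pairs(SAMPLE, UNPAIR):
        print(kept_line)
```

Only the comment line and the LESS-THAN SIGN mapping survive the filter; the < > pair, which the report treats as an absolute priority, is left untouched.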
Date/Time: Tue Apr 17 05:58:35 CDT 2018
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI 372 NamedSequencesProv.txt
The sequences listed in: http://unicode.org/mail-arch/unicode-ml/y2016-m02/0071.html should be added to NamedSequencesProv.txt as recommended in: http://unicode.org/mail-arch/unicode-ml/y2016-m02/0072.html following the process specified in: http://www.unicode.org/reports/tr34/ Section 3.1.
Date/Time: Sat Apr 21 15:20:08 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372: Dogri vs Dogra
02BC MODIFIER LETTER APOSTROPHE = apostrophe * glottal stop, glottalization, ejective * many languages use this as a letter of their alphabets * used as a tone marker in Bodo, Dogri, and Maithili * 2019 is the preferred character for a punctuation apostrophe "Dogri" should be "Dogra", if only for consistency with the name given to this script in Unicode (Dogra block, 11800..1184F): * used as a tone marker in Bodo, Dogra, and Maithili
Date/Time: Sat Apr 21 15:36:19 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372: Update for 0218..021B (revision 2)
[Revision of the Sun Apr 15 03:09:43 report (above).] I was way wrong. A thousand lashes for me for misleading you through my neglectful ignorance of previous discussions about cedilla and comma below. It is easy to become confused, though... So it is important to understand that the characters in the range 0218..021B are intended SOLELY for Romanian, aren’t they? Thus the comment lines I previously requested should not be added beneath 0219 and 021B: @ Additions for Romanian 0218 LATIN CAPITAL LETTER S WITH COMMA BELOW : 0053 0326 0219 LATIN SMALL LETTER S WITH COMMA BELOW x (latin small letter s with cedilla - 015F) : 0073 0326 021A LATIN CAPITAL LETTER T WITH COMMA BELOW : 0054 0326 021B LATIN SMALL LETTER T WITH COMMA BELOW x (latin small letter t with cedilla - 0163) : 0074 0326 BTW, I recommend reading “The story of Ș and Ț” (told in a most humorous fashion) here: http://kitblog.com/2008/10/romanian_diacritic_marks.html. Thank you.
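The distinction between comma below and cedilla is visible in the character data itself: the comma-below letters decompose to U+0326 COMBINING COMMA BELOW, the cedilla letters to U+0327 COMBINING CEDILLA, so the two sets are not canonically equivalent and are linked only by the x cross-references. A minimal sketch using Python's standard unicodedata module (canonical decompositions are covered by Unicode's stability policy, so the results should not depend on the Python version); the helper names are hypothetical:

```python
import unicodedata

# Comma-below letters paired with the cedilla letters they cross-reference.
PAIRS = {
    "\u0219": "\u015F",  # s with comma below / s with cedilla
    "\u021B": "\u0163",  # t with comma below / t with cedilla
}


def describe(ch):
    """Code point, character name and canonical decomposition of `ch`."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {unicodedata.decomposition(ch)}"


def compose_comma_below(base):
    """NFC of a base letter followed by U+0326 COMBINING COMMA BELOW."""
    return unicodedata.normalize("NFC", base + "\u0326")


if __name__ == "__main__":
    for comma, cedilla in PAIRS.items():
        print(describe(comma))
        print(describe(cedilla))
```

Composing "s" or "t" with U+0326 always yields 0219 or 021B, never 015F or 0163.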
Date/Time: Sat Apr 21 16:07:46 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372: typos
#1 2720 MALTESE CROSS * Historically, the Maltese cross took many forms; the shape shown in the Zapf Dingbats is similar to one known as the Cross Formée. The comment line should read (without an initial capital letter or a final full stop): * historically, the Maltese cross took many forms; the shape shown in the Zapf Dingbats is similar to one known as the Cross Formée #2 A full stop should be placed at the end of lines 18374 and 20611: @+ Used together with 2605 in systems of ratings. @+ See also the Bopomofo block.
Date/Time: Mon Apr 23 12:26:25 CDT 2018
Name: Srinidhi A
Report Type: Public Review Issue
Opt Subject: Feedback on PRI #372 Unicode 11.0 Beta
As per action item 154-A103, the glyphs of Brahmi numbers 30 and 40 should be changed. These glyphs have been updated in the errata (https://www.unicode.org/errata/) and on pages 32 and 33 of WG2 N4941 (https://unicode.org/wg2/docs/n4941-pdam2-3-chart.pdf), but not in https://www.unicode.org/charts/PDF/Unicode-11.0/U110-11000.pdf. These two glyphs should therefore also be updated in that code chart.
Date/Time: Mon Apr 23 15:22:56 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372 Old Malay?
06BD ARABIC LETTER NOON WITH THREE DOTS ABOVE * old Malay "old Malay" should be "Jawi". Unicode 10.0 core specification, page 390: "Jawi. U+06BD ARABIC LETTER NOON WITH THREE DOTS ABOVE is used for Jawi, which is Malay written using the Arabic script. Malay users know the character as Jawi Nya. Contrary to what is suggested by its Unicode character name, U+06BD displays with the three dots below the letter pointing downward when it is in the initial or medial position, making it look exactly like the initial and medial forms of U+067E ARABIC LETTER PEH. This is done to avoid confusion with U+062B ARABIC LETTER THEH, which appears in words of Arabic origin, and which has the same base letter shapes in initial or medial position, but with three dots above in all positions." "Old Malay" is misleading: "Old Malay (7th to 14th century). (…) With the penetration and proliferation of Dravidian vocabulary and the influence of major Indian religions, Ancient Malay evolved into the Old Malay language. (…) Old Malay inscriptions used either scripts of Indian origin such as Pallava, Nagari or the Indian-influenced Old Sumatran characters. Jawi is Malay written using the Arabic script... which has been adapted to suit the spoken CLASSICAL MALAY. The Jawi script has existed since the 17th century, in the period of Classical Malay, which "started when Islam gained its foothold in [Southeast Asia] and the elevation of its status to a state religion." "Jawi scripts have been going through development and standardized process until the present days. From 29 alphabets, it has increased to 36 alphabets with the additional alphabets such as ‘Cha’, ‘Nga’, ‘Pa’, ‘Ga’, ‘Nya’, ‘Va’, and ‘Lam Alif’. As there have been growth in Jawi scripts and the world entering modernization era, it is unavoidable that there are also changes taking place in learning and writing Jawi scripts from the past and present days. 
According to al-Attas (1990), Jawi script is a result of the renovation and addition of Arabic letters to adapt to the Malay language to facilitate the teaching and learning process." References: ASYRAF HJ AB RAHMAN, ABDUL MANAN ALI, FARHANAH BT ABDULLAH, FIRDAUS KHAIRI ABDUL KADIR, FADZLI ADAM, DAUD ISMAIL, “Methods of Learning and Writing Jawi Scripts within the Malay Community: Past and Present Experiences”, Proceedings of ISER 70th International Conference, Athens, Greece, 7th-8th August 2017. Available here: http://www.worldresearchlibrary.org/up_proc/pdf/1006-15051919706-12.pdf https://en.wikipedia.org/wiki/History_of_the_Malay_language ------------------------------------------------------- 06AC ARABIC LETTER KAF WITH DOT ABOVE * old Malay Annotation should be changed: 06AC ARABIC LETTER KAF WITH DOT ABOVE * its use for the Jawi gaf is not recommended, though it may be found in some existing text data; recommended character for Jawi gaf is 0762 ---- 0762 ARABIC LETTER KEHEH WITH DOT ABOVE * old Malay, preferred to 06AC x (arabic letter kaf with dot above - 06AC) "old Malay" should be "Jawi". Reference: Jonathan Kew, “Notes on some Unicode Arabic characters: recommendations for usage”, Draft 2, April 21, 2005. http://scripts.sil.org/cms/scripts/render_download.php?format=file&media_id=arabicletterusagenotes&filename=ArabicLetterUsageNotes.pdf
------------------------------------------------------- 06A0 ARABIC LETTER AIN WITH THREE DOTS ABOVE * old Malay "old Malay" should be "Jawi". ------------------------------------------------------- 06A4 ARABIC LETTER VEH * Middle Eastern Arabic for foreign words * Kurdish, Khwarazmian, early Persian "Jawi" should be added: * Kurdish, Khwarazmian, early Persian, Jawi References: Report for Malaysia’s Internationalized Domain Name: Jawi Language Issues [2009]. http://css.escwa.org.lb/ictd/0960/01.pdf https://www.iana.org/domains/idn-tables/tables/my_ms-my_1.0.pdf ------------------------------------------------------- 06AD ARABIC LETTER NG * Uighur, Kazakh, old Malay, early Persian, ... 06D1 ARABIC LETTER YEH WITH THREE DOTS BELOW * old Malay "old Malay" should PROBABLY be removed in both annotations. These two characters are not listed in "Report for Malaysia’s Internationalized Domain Name" (see above) and are not mentioned as Jawi characters in Jonathan Kew (see ref. above).
Date/Time: Mon Apr 23 15:42:03 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372 Old Urdu and Old Hausa
0690 ARABIC LETTER DAL WITH FOUR DOTS ABOVE * old Urdu, not in current use 069F ARABIC LETTER TAH WITH THREE DOTS ABOVE * old Hausa "old" should be capitalized: "Old Urdu" and "Old Hausa".
Date/Time: Mon Apr 23 22:40:38 CDT 2018
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #372 Typos in NamesList.txt
For consistency, a full stop should be placed at the end of the following lines: @+ Intended for use with a multiline scored layout @+ These are intended for bracketing terms of mathematical expressions where their glyph extends to accommodate the width of the bracketed expression @+ See the Basic Latin block starting at 0020 @+ See the Armenian block starting at 0530 @+ See the Hebrew block starting at 0590 @+ See ASCII 0020-007E @+ Constitute a set as follows: 22C5, 2219, 1F784, 2022, 2981, 26AB, 25CF, and 2B24 @+ Constitute a set as follows: 25CB, 2B58, 1F785-1F789 @+ Constitute a set as follows: 2299, 1F78A, and 29BF @+ Constitute a set as follows: 1F78C, 2B1D, 1F78D, 25AA, 25FE, 25FC, 25A0, and 2B1B @+ Constitute a set as follows: 25A1, 1F78E-1F793 @+ Constitute a set as follows: 1F794, 25A3, and 1F795 @+ Constitute a set as follows: 1F797, 1F798, 2B29, 1F799, 2B25, and 25C6 @+ Constitute a set as follows: 1F79A, 25C8, and 1F79B @+ Constitute a set as follows: 1F79D, 1F79E, 2B2A, 1F79F, 2B27, and 29EB @+ Constitute a set as follows: 1F7C9, 2605, 1F7CA, and 272F @+ Constitute a set as follows: 2736, 1F7CB-1F7CD @+ Constitute a set as follows: 2735, 1F7CE-1F7D1 @+ Constitute a set as follows: 1F7D2, 2739, 1F7D3, and 1F7D4 A full stop should also be placed at the end of the following lines, and the spaces before and after the hyphen separator should be removed (as in “See ASCII 0020-007E”, above): @+ See CJK punctuation 3000 - 303F @+ See Katakana 30A0 - 30FF @+ See Hangul Compatibility Jamo 3130 - 318F @+ See Latin-1 00A0 - 00FF Thank you.
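Both requested fixes are mechanical, so they could be applied to NamesList.txt with a small script. A minimal Python sketch, assuming only that block comments start with "@+" as in the lines quoted above; the function name fix_block_comment is hypothetical:

```python
import re

# Hex range written with a spaced hyphen, e.g. "3000 - 303F".
SPACED_RANGE = re.compile(r"\b([0-9A-F]{4,6}) - ([0-9A-F]{4,6})\b")


def fix_block_comment(line):
    """Tighten spaced hex ranges and add a final full stop to "@+" lines.

    Any other line (headings, character entries, aliases) is returned unchanged.
    """
    if not line.startswith("@+"):
        return line
    fixed = SPACED_RANGE.sub(r"\1-\2", line.rstrip())
    if not fixed.endswith("."):
        fixed += "."
    return fixed


if __name__ == "__main__":
    for sample in ["@+ See Katakana 30A0 - 30FF", "@+ See ASCII 0020-007E", "@ Letters"]:
        print(fix_block_comment(sample))
```

Ranges already written as "1F785-1F789" are left as they are, and lines that already end with a full stop are not changed.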
Date/Time: Tue Apr 24 02:12:15 CDT 2018
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI 372 NamedSequencesProv.txt
The following list of named sequences is made up from the list provided by Mats Blakstad on the Public Mailing List: http://unicode.org/mail-arch/unicode-ml/y2016-m02/0071.html # Additions for languages in Togo. LATIN CAPITAL LETTER A WITH TILDE AND GRAVE ACCENT;00C3 0300 LATIN SMALL LETTER A WITH TILDE AND GRAVE ACCENT;00E3 0300 LATIN CAPITAL LETTER A WITH TILDE AND ACUTE ACCENT;00C3 0301 LATIN SMALL LETTER A WITH TILDE AND ACUTE ACCENT;00E3 0301 LATIN CAPITAL LETTER E WITH TILDE AND GRAVE ACCENT;1EBC 0300 LATIN SMALL LETTER E WITH TILDE AND GRAVE ACCENT;1EBD 0300 LATIN CAPITAL LETTER E WITH TILDE AND ACUTE ACCENT;1EBC 0301 LATIN SMALL LETTER E WITH TILDE AND ACUTE ACCENT;1EBD 0301 LATIN CAPITAL LETTER TURNED E WITH GRAVE ACCENT;018E 0300 LATIN SMALL LETTER TURNED E WITH GRAVE ACCENT;01DD 0300 LATIN CAPITAL LETTER TURNED E WITH ACUTE ACCENT;018E 0301 LATIN SMALL LETTER TURNED E WITH ACUTE ACCENT;01DD 0301 LATIN CAPITAL LETTER TURNED E WITH CIRCUMFLEX ACCENT;018E 0302 LATIN SMALL LETTER TURNED E WITH CIRCUMFLEX ACCENT;01DD 0302 LATIN CAPITAL LETTER TURNED E WITH TILDE;018E 0303 LATIN SMALL LETTER TURNED E WITH TILDE;01DD 0303 LATIN CAPITAL LETTER TURNED E WITH TILDE AND GRAVE ACCENT;018E 0303 0300 LATIN SMALL LETTER TURNED E WITH TILDE AND GRAVE ACCENT;01DD 0303 0300 LATIN CAPITAL LETTER TURNED E WITH TILDE AND ACUTE ACCENT;018E 0303 0301 LATIN SMALL LETTER TURNED E WITH TILDE AND ACUTE ACCENT;01DD 0303 0301 LATIN CAPITAL LETTER TURNED E WITH MACRON;018E 0304 LATIN SMALL LETTER TURNED E WITH MACRON;01DD 0304 LATIN CAPITAL LETTER TURNED E WITH CARON;018E 030C LATIN SMALL LETTER TURNED E WITH CARON;01DD 030C LATIN CAPITAL LETTER OPEN E WITH GRAVE ACCENT;0190 0300 LATIN SMALL LETTER OPEN E WITH GRAVE ACCENT;025B 0300 LATIN CAPITAL LETTER OPEN E WITH ACUTE ACCENT;0190 0301 LATIN SMALL LETTER OPEN E WITH ACUTE ACCENT;025B 0301 LATIN CAPITAL LETTER OPEN E WITH CIRCUMFLEX ACCENT;0190 0302 LATIN SMALL LETTER OPEN E WITH CIRCUMFLEX ACCENT;025B 0302 LATIN CAPITAL LETTER OPEN 
E WITH TILDE;0190 0303 LATIN SMALL LETTER OPEN E WITH TILDE;025B 0303 LATIN CAPITAL LETTER OPEN E WITH TILDE AND GRAVE ACCENT;0190 0303 0300 LATIN SMALL LETTER OPEN E WITH TILDE AND GRAVE ACCENT;025B 0303 0300 LATIN CAPITAL LETTER OPEN E WITH TILDE AND ACUTE ACCENT;0190 0303 0301 LATIN SMALL LETTER OPEN E WITH TILDE AND ACUTE ACCENT;025B 0303 0301 LATIN CAPITAL LETTER OPEN E WITH MACRON;0190 0304 LATIN SMALL LETTER OPEN E WITH MACRON;025B 0304 LATIN CAPITAL LETTER OPEN E WITH CARON;0190 030C LATIN SMALL LETTER OPEN E WITH CARON;025B 030C LATIN CAPITAL LETTER I WITH TILDE AND GRAVE ACCENT;0128 0300 LATIN SMALL LETTER I WITH TILDE AND GRAVE ACCENT;0129 0300 LATIN CAPITAL LETTER I WITH TILDE AND ACUTE ACCENT;0128 0301 LATIN SMALL LETTER I WITH TILDE AND ACUTE ACCENT;0129 0301 LATIN CAPITAL LETTER IOTA WITH GRAVE ACCENT;0196 0300 LATIN SMALL LETTER IOTA WITH GRAVE ACCENT;0269 0300 LATIN CAPITAL LETTER IOTA WITH ACUTE ACCENT;0196 0301 LATIN SMALL LETTER IOTA WITH ACUTE ACCENT;0269 0301 LATIN CAPITAL LETTER IOTA WITH CIRCUMFLEX ACCENT;0196 0302 LATIN SMALL LETTER IOTA WITH CIRCUMFLEX ACCENT;0269 0302 LATIN CAPITAL LETTER IOTA WITH MACRON;0196 0304 LATIN SMALL LETTER IOTA WITH MACRON;0269 0304 LATIN CAPITAL LETTER IOTA WITH CARON;0196 030C LATIN SMALL LETTER IOTA WITH CARON;0269 030C LATIN CAPITAL LETTER M WITH GRAVE ACCENT;004D 0300 LATIN SMALL LETTER M WITH GRAVE ACCENT;006D 0300 LATIN CAPITAL LETTER ENG WITH GRAVE ACCENT;014A 0300 LATIN SMALL LETTER ENG WITH GRAVE ACCENT;014B 0300 LATIN CAPITAL LETTER ENG WITH ACUTE ACCENT;014A 0301 LATIN SMALL LETTER ENG WITH ACUTE ACCENT;014B 0301 LATIN CAPITAL LETTER O WITH TILDE AND GRAVE ACCENT;00F5 0300 LATIN SMALL LETTER O WITH TILDE AND GRAVE ACCENT;00F5 0300 LATIN CAPITAL LETTER OPEN O WITH GRAVE ACCENT;0186 0300 LATIN SMALL LETTER OPEN O WITH GRAVE ACCENT;0254 0300 LATIN CAPITAL LETTER OPEN O WITH ACUTE ACCENT;0186 0301 LATIN SMALL LETTER OPEN O WITH ACUTE ACCENT;0254 0301 LATIN CAPITAL LETTER OPEN O WITH CIRCUMFLEX 
ACCENT;0186 0302 LATIN SMALL LETTER OPEN O WITH CIRCUMFLEX ACCENT;0254 0302 LATIN CAPITAL LETTER OPEN O WITH TILDE;0186 0303 LATIN SMALL LETTER OPEN O WITH TILDE;0254 0303 LATIN CAPITAL LETTER OPEN O WITH TILDE AND GRAVE ACCENT;0186 0303 0300 LATIN SMALL LETTER OPEN O WITH TILDE AND GRAVE ACCENT;0254 0303 0300 LATIN CAPITAL LETTER OPEN O WITH TILDE AND ACUTE ACCENT;0186 0303 0301 LATIN SMALL LETTER OPEN O WITH TILDE AND ACUTE ACCENT;0254 0303 0301 LATIN CAPITAL LETTER OPEN O WITH MACRON;0186 0304 LATIN SMALL LETTER OPEN O WITH MACRON;0254 0304 LATIN CAPITAL LETTER OPEN O WITH CARON;0186 030C LATIN SMALL LETTER OPEN O WITH CARON;0254 030C LATIN CAPITAL LETTER U WITH TILDE AND GRAVE ACCENT;0168 0300 LATIN SMALL LETTER U WITH TILDE AND GRAVE ACCENT;0169 0300 LATIN CAPITAL LETTER V WITH HOOK WITH GRAVE ACCENT;01B2 0300 LATIN SMALL LETTER V WITH HOOK WITH GRAVE ACCENT;028B 0300 LATIN CAPITAL LETTER V WITH HOOK WITH ACUTE ACCENT;01B2 0301 LATIN SMALL LETTER V WITH HOOK WITH ACUTE ACCENT;028B 0301 LATIN CAPITAL LETTER V WITH HOOK WITH CIRCUMFLEX ACCENT;01B2 0302 LATIN SMALL LETTER V WITH HOOK WITH CIRCUMFLEX ACCENT;028B 0302 LATIN CAPITAL LETTER V WITH HOOK WITH MACRON;01B2 0304 LATIN SMALL LETTER V WITH HOOK WITH MACRON;028B 0304 LATIN CAPITAL LETTER V WITH HOOK WITH CARON;01B2 030C LATIN SMALL LETTER V WITH HOOK WITH CARON;028B 030C LATIN CAPITAL LETTER UPSILONK WITH GRAVE ACCENT;01B1 0300 LATIN SMALL LETTER UPSILON WITH GRAVE ACCENT;028A 0300 LATIN CAPITAL LETTER UPSILON WITH ACUTE ACCENT;01B1 0301 LATIN SMALL LETTER UPSILON WITH ACUTE ACCENT;028A 0301 LATIN CAPITAL LETTER UPSILON WITH CIRCUMFLEX ACCENT;01B1 0302 LATIN SMALL LETTER UPSILON WITH CIRCUMFLEX ACCENT;028A 0302 LATIN CAPITAL LETTER UPSILON WITH MACRON;01B1 0304 LATIN SMALL LETTER UPSILON WITH MACRON;028A 0304 LATIN CAPITAL LETTER UPSILON WITH CARON;01B1 030C LATIN SMALL LETTER UPSILON WITH CARON;028A 030C
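In a hand-typed list of this length, a slip such as giving the small letter 00F5 for a CAPITAL sequence is easy to make. A minimal Python sketch of a consistency check, assuming the "NAME;code points" line format used above; the sample contains one deliberately mismatched entry for illustration, and the function names are hypothetical:

```python
# Sample entries in the "NAME;code points" format; the first entry is
# deliberately wrong (small o with tilde under a CAPITAL name).
SAMPLE = """\
# Additions for languages in Togo.
LATIN CAPITAL LETTER O WITH TILDE AND GRAVE ACCENT;00F5 0300
LATIN SMALL LETTER O WITH TILDE AND GRAVE ACCENT;00F5 0300
LATIN CAPITAL LETTER UPSILON WITH GRAVE ACCENT;01B1 0300
LATIN SMALL LETTER OPEN E WITH TILDE AND ACUTE ACCENT;025B 0303 0301
"""


def parse(text):
    """Yield (name, [code points]) for each non-comment, non-blank line."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, seq = line.split(";")
        yield name, [int(cp, 16) for cp in seq.split()]


def case_mismatches(text):
    """Names whose CAPITAL/SMALL wording disagrees with the case of the base letter."""
    bad = []
    for name, cps in parse(text):
        base = chr(cps[0])
        if " CAPITAL " in name and not base.isupper():
            bad.append(name)
        elif " SMALL " in name and not base.islower():
            bad.append(name)
    return bad


if __name__ == "__main__":
    for name in case_mismatches(SAMPLE):
        print("check:", name)
```

The check looks only at the first code point; it does not validate the combining marks or the naming conventions themselves.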
Date/Time: Wed Apr 25 14:06:31 CDT 2018
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #372 (consolidated feedback)
Hello, There is another *consolidated* list of feedback for the Code Charts, not specifically related to the 11.0.0 additions. Some items are *revised* versions of items hastily submitted on April 30, 2017. Sadly this is only a (huge) subset of my queue, given that I am late in submitting this feedback. Thanks, Marcel ------------------------------------------------------------------ C1 controls The glyphs of some C1 controls show acronyms of aliases other than those given in the Nameslist section of the Code charts: 008B <control> = PARTIAL LINE FORWARD has PLD 008C <control> = PARTIAL LINE BACKWARD has PLU 008D <control> = REVERSE LINE FEED has RI Suggestion: Add another informative alias to each one: 008B = PARTIAL LINE DOWNWARD 008C = PARTIAL LINE UPWARD 008D = REVERSE INTERLIGN ------------------------------------------------------------------ U+034F COMBINING GRAPHEME JOINER Some changes might make this entry easier for the Code Charts reader to understand: @ Format control # replaces "Grapheme joiner" 034F COMBINING GRAPHEME JOINER = combining mark locker # added informative alias * commonly abbreviated as CGJ * may be considered a “joiner” only in that it prevents combining marks from reordering # comment line raised and reworded * has no visible glyph Remove: * the name of this character is misleading; it does not actually join graphemes ------------------------------------------------------------------ U+202F NARROW NO-BREAK SPACE This space is set apart (due to late encoding), so some usage annotations may seem desirable: 202F NARROW NO-BREAK SPACE * commonly abbreviated NNBSP * a narrow form of a no-break space, typically the width of a thin space or a mid space * Mongolian, Phags-Pa # added comment line * French: used to space punctuation marks # added comment line ------------------------------------------------------------------ U+2300 DIAMETER SIGN People often use LATIN SMALL LETTER O WITH STROKE as a fallback, for lack of a DIAMETER SIGN on keyboards. 
So it might be a good idea to cross-reference both, directing users to correct usage: x (latin small letter o with stroke - 00F8) # added crossreference x (empty set - 2205) 00F8 LATIN SMALL LETTER O WITH STROKE = o slash * Danish, Norwegian, Faroese, IPA x (diameter sign - 2300) # added crossreference ------------------------------------------------------------------ U+2327 X IN A RECTANGLE BOX People often misuse this because it comes first when browsing character maps. So I have added a cross-reference to the BALLOT BOX WITH X: = clear key x (ballot box with x - 2612) # added xref ------------------------------------------------------------------ U+232C BENZENE RING Even though they are in the same block, BENZENE RING and BENZENE RING WITH CIRCLE would benefit from cross-references to each other: x (benzene ring with circle - 23E3) # added 23E3 BENZENE RING WITH CIRCLE x (benzene ring - 232C) # added ------------------------------------------------------------------ U+260A ASCENDING NODE There seems to be an equivalence between ascending node, libra, and sublimation. Accordingly, the informative alias: = alchemical symbol for sublimate can be corrected to: = alchemical symbol for sublimation and the relevant cross-reference added: x (alchemical symbol for sublimation - 1F75E) as already in: 260B DESCENDING NODE = alchemical symbol for purify x (alchemical symbol for purify - 1F763) See also the already present cross-references in: 1F75E ALCHEMICAL SYMBOL FOR SUBLIMATION x (ascending node - 260A) x (libra - 264E) ------------------------------------------------------------------ Medical and healing symbols (2624..2625) These are also religious symbols, as both were attributes of deities, and the ankh is still used in religion. 
Therefore I suggest merging this subheading with the following one and adding some informative aliases: @ Religious, political and medical symbols # modified subheading 2624 CADUCEUS = commercial # added * symbol of commerce and eloquence # added * symbolizes medicine in North America # added x (staff of aesculapius - 2695) x (alchemical symbol for caduceus - 1F750) 2625 ANKH = ansate cross # added = coptic cross # added * egyptian hieroglyph for “life” # added x (egyptian hieroglyph s034 - 132F9) # added # removed subheading 2626 ORTHODOX CROSS 2627 CHI RHO = Constantine's cross, Christogram x (coptic symbol khi ro - 2CE9) 2628 CROSS OF LORRAINE = patriarchal cross # added x (double dagger - 2021) # added 2629 CROSS OF JERUSALEM = simple cross potent * contrasts with the actual cross of Jerusalem, which adds a small crosslet at each corner x (alchemical symbol for vinegar - 1F70A) 262A STAR AND CRESCENT 262B FARSI SYMBOL = symbol of iran (1.0) 262C ADI SHAKTI = Gurmukhi khanda 262D HAMMER AND SICKLE 262E PEACE SYMBOL 262F YIN YANG x (tibetan symbol nor bu nyis -khyil - 0FCA) ------------------------------------------------------------------ Emoticons (1F600..1F64F) Following Code Charts usage, a reference is added to the previously encoded emoji in the Miscellaneous symbols block. Cf. the relevant subheading: @ Emoticons @+ Many other emoticons are encoded in the Emoticons block starting at 1F600. Suggestion: @@ 1F600 Emoticons 1F64F @+ The emoticons have been organized by mouth shape to make it easier to locate the different characters in the code chart. @+ Some other emoticons are encoded in the Miscellaneous symbols block starting at 2600. # = added annotation; references to blocks may also use complete block ranges: (2600..26FF). 
x (white frowning face - 2639) # added x (white smiling face - 263A) # added x (black smiling face - 263B) # added ------------------------------------------------------------------ U+267E PERMANENT PAPER SIGN U+267F WHEELCHAIR SYMBOL I suggest adding aliases and an xref, e.g.: 267E PERMANENT PAPER SIGN = non‐acid paper # added x (infinity - 221E) # added 267F WHEELCHAIR SYMBOL = accessible place # added ------------------------------------------------------------------ U+AA40 CHAM LETTER FINAL K This and the following letters are under a @ Final letters subheading, which is unusual in the Code Charts and should therefore be replaced with the more common and more precise: @ Final consonants ------------------------------------------------------------------ U+00DF LATIN SMALL LETTER SHARP S U+1E9E LATIN CAPITAL LETTER SHARP S The issues regarding these letters have already been reported. ------------------------------------------------------------------ IPA Extensions (0250..02AF) Given that this block also includes a subheading for @ IPA characters for disordered speech it seems inappropriate to make the first subheading a mere duplicate of the block name. (For another point, see also the comment on the Mayan numerals subheading I previously submitted [above].) So the first subheading may be reworded to: @ Extensions for general phonetics Further, the French localization has added an annotation between the block heading and that subheading, about terminology used in character names. Having completed the list, I propose porting it back to the original Code Charts. 
The proposed French text is found at: http://docucaras.info#u0250 English transposition: @+ The IPA is enhanced using diacritics (among which a stroke should be diagonal, and a bar, horizontal) and transforms, mainly: inverted = following a horizontal axis of symmetry; reversed = vertical axis; turned = rotated by 180°; sideways = by 90°; clockwise = on the right; counterclockwise = on the left (majority). As for the current annotation, I recommend starting it with a statement like: @+ Several letters of the IPA have become part of the orthographies of many languages, some of which are cited as examples. # added IPA includes basic Latin letters and a number of Latin or Greek letters from other blocks. # original; I would cite the blocks by name, but that does not fit the # current scheme applied to the English (Unicode) NamesList. ------------------------------------------------------------------ U+10AC8 MANICHAEAN SIGN UD This symbol (Gc=So) has been encoded amid the alphabet with no rationale other than that the abbreviated word it represents mainly consists of a WAW: https://www.unicode.org/L2/L2011/11123r-n4029r-manichaean.pdf Perhaps its Gc was Lo before being shifted to So, but in any case it is hard to see why a compound should be classified with the base letters here, whereas everywhere else logograms are set apart. While very careful to take into account best practices and encoding principles in current use, the Original Proposer Team failed to design the block consistently with other blocks where alphabets are encoded in continuous ranges, e.g. Syriac letters (0710..072C). I see no reason, however, why Unicode should perpetuate the appearance of normality conveyed by not granting the MANICHAEAN SIGN UD an appropriate and convenient subheading, regardless of whether shifting normality from appearance to classification would make the disruption stand out even more. 
(In other words: having the UD between WAW and ZAYIN under the “Letters” subheading merely gives an appearance of normality, letting inadvertent Code Charts readers believe that there was a good reason to arrange things this way. As soon as subheadings are adjusted to apply correct classification, as almost everywhere else in the Code Charts (some of the comments I’ve previously submitted notwithstanding), everybody becomes aware that there must have been a problem, not only those who look up the Gc and then retrieve the encoding proposal from the internet.)
Sorry for being so explicit; I’m really afraid that Unicode could be reluctant to give the green light to a correction such as this:
10AC7 MANICHAEAN LETTER WAW
@ Logogram # added (alternate: Sign)
10AC8 MANICHAEAN SIGN UD
	* represents the conjunction ẉ̇ “and” # added
@ Letters # replicated
10AC9 MANICHAEAN LETTER ZAYIN
------------------------------------------------------------------
Basic Latin; Greek and Coptic; Cyrillic: Ranges dedicated to the basic alphabet or to diacriticized letters ordered by case
In the Greek and Coptic block (0370..03FF), these and subsequent letters:
0388 GREEK CAPITAL LETTER EPSILON WITH TONOS, …
0391 GREEK CAPITAL LETTER ALPHA, …
03AA GREEK CAPITAL LETTER IOTA WITH DIALYTIKA, …
03AC GREEK SMALL LETTER ALPHA WITH TONOS, …
03B1 GREEK SMALL LETTER ALPHA, …
03CA GREEK SMALL LETTER IOTA WITH DIALYTIKA, …
all fall under a single subheading:
@ Letters
whereas in the Basic Latin block, we have an
@ Uppercase Latin alphabet
subheading and a
@ Lowercase Latin alphabet
subheading. In my opinion, this unequal treatment (the interspersed punctuation and symbols in the Basic Latin block notwithstanding) needs to be corrected. Casing scripts with ranges ordered by case need corresponding subheadings that mention the case. Further, blocks of scripts using precomposed letters need to have the basic alphabet marked up as such.
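The case boundaries that such subheadings would mark can be located mechanically. The following Python sketch uses a hypothetical helper, `category_runs`, built only on the standard `unicodedata` module. It splits a code point range into maximal runs sharing one General_Category. Results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata
from itertools import groupby


def category_runs(start: int, end: int) -> list:
    """Split the inclusive range start..end into maximal runs of code points
    sharing one General_Category; returns (category, first, last) tuples."""
    runs = []
    code_points = range(start, end + 1)
    for category, group in groupby(code_points,
                                   key=lambda cp: unicodedata.category(chr(cp))):
        members = list(group)
        runs.append((category, members[0], members[-1]))
    return runs


if __name__ == "__main__":
    # Basic Russian alphabet: the Lu/Ll boundary is where a
    # lowercase-alphabet subheading would be placed.
    for cat, first, last in category_runs(0x0410, 0x044F):
        print(f"{first:04X}..{last:04X} {cat}")
    # Greek capital alphabet: 03A2 is unassigned (Cn) and splits the Lu run.
    for cat, first, last in category_runs(0x0391, 0x03A9):
        print(f"{first:04X}..{last:04X} {cat}")
```

For the Russian alphabet this reports an uppercase run 0410..042F followed by a lowercase run 0430..044F. A reserved code point, as at 03A2 in Greek, shows up as its own Cn run, so gaps remain visible rather than being silently merged.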
However, as in the Mayan numerals block, the subheadings do not need to repeat the script name. Hence, in the Basic Latin block (0000..007F), the word “Latin” should be removed from the subheadings. Next, in the Greek block, the following subheadings should be added or adjusted accordingly:
@ Uppercase letters # modified
0388 GREEK CAPITAL LETTER EPSILON WITH TONOS, …
@ Uppercase alphabet # added
0391 GREEK CAPITAL LETTER ALPHA, …
@ Uppercase letters # replicated
03AA GREEK CAPITAL LETTER IOTA WITH DIALYTIKA, …
@ Lowercase letters # added
03AC GREEK SMALL LETTER ALPHA WITH TONOS, …
@ Lowercase alphabet # added
03B1 GREEK SMALL LETTER ALPHA, …
@ Lowercase letters # replicated
03CA GREEK SMALL LETTER IOTA WITH DIALYTIKA, …
Then, in the Cyrillic block (0400..04FF), the Russian alphabet is already marked up:
@ Basic Russian alphabet
0410 CYRILLIC CAPITAL LETTER A
But we need a subheading at the start of the lowercase alphabet, too. To achieve this, one can simply remove the word “Basic”, as it is implicit. Unicode would then have two subheadings:
@ Russian uppercase alphabet
@ Russian lowercase alphabet
Another option (in my opinion the preferred one) is to introduce a supplemental heading level, as in the Musical Symbols block:
@ Kievan notation
@+ The following range is specific to Kievan notation.
@ Clef
1D1DE MUSICAL SYMBOL KIEVAN C CLEF
This can be transposed to the Cyrillic block:
@ Russian
@+ These ranges are dedicated to the basic Russian alphabet
@ Uppercase alphabet
@ Lowercase alphabet
------------------------------------------------------------------
Coptic Epact Numbers (102E0..102FF)
Since this block could use some annotation, I suggest the following. Given that 0605 ARABIC NUMBER MARK ABOVE already has the comment line:
	* may be used with Coptic Epact numbers
the Coptic block, which already cross-references the Greek and Coptic block, could be given a second annotation:
@+ Coptic epact digits and numbers are coded in the Coptic Epact Numbers block.
And the Coptic Epact Numbers block could be completed as follows:
@@ 102E0 Coptic Epact Numbers 102FF
@+ These characters, called “imported” (epact) or cursive, are an alternate representation of numbers in Coptic. # added
@+ The number sign is unified with the Arabic number mark. # added (minimal)
	x (arabic number mark above - 0605) # added
@ Sign
102E0 COPTIC EPACT THOUSANDS MARK
------------------------------------------------------------------
Thai (0E00..0E7F)
It seems to me that the first subheading in the Thai block is actually an annotation:
@@ 0E00 Thai 0E7F
@@+
@+ Based on TIS 620-2533. # plus sign and period added
@ Consonants
0E01 THAI CHARACTER KO KAI
------------------------------------------------------------------
U+0132 LATIN CAPITAL LIGATURE IJ
U+0133 LATIN SMALL LIGATURE IJ
I am convinced that adding some more information here would be useful:
0132 LATIN CAPITAL LIGATURE IJ
	# 0049 004A
0133 LATIN SMALL LIGATURE IJ
	* Dutch
	* visible ligation may be font‐dependent # added
	* combining with 0301 results in both i and j bearing an acute accent # added
	# 0069 006A
------------------------------------------------------------------
Bopomofo (3100..312F)
This block could usefully be completed with a new first subheading:
@@ 3100 Bopomofo 312F
@+ See also the Bopomofo Extended block. # added final period
@ Letters for Mandarin # added # alternate: Mandarin letters
@+ Based on GB 2312 # converted to annotation
3105 BOPOMOFO LETTER B
[…]
@ Dialect (non-Mandarin) letters # existing
312A BOPOMOFO LETTER V
------------------------------------------------------------------
Duployan (1BC00..1BC9F)
The gerund “orientating” occurs 18 times in the Code Charts, all in Duployan.
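The search-and-replace requested for this item can count the occurrences and replace them in one pass. The following is a minimal Python sketch. It assumes NamesList.txt has already been read into a string; `replace_orientating` is a hypothetical helper, and the sample mimics a NamesList entry for illustration only. Applied to the real file, the returned count could be compared with the 18 occurrences reported here.

```python
import re

# Whole-word match only, so that e.g. "disorientating" would be left alone.
ORIENTATING = re.compile(r"\borientating\b")


def replace_orientating(text: str):
    """Return (new_text, count) with every whole word 'orientating'
    replaced by 'orienting'."""
    return ORIENTATING.subn("orienting", text)


if __name__ == "__main__":
    sample = (
        "1BC47\tDUPLOYAN LETTER E\n"
        "\t* character rotates to match entry angle of preceding consonant\n"
        "\t* secondary orientating (left and down)\n"
    )
    new_text, count = replace_orientating(sample)
    print(count)  # number of replacements made
    print(new_text)
```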
First instance:
1BC47 DUPLOYAN LETTER E
	* character rotates to match entry angle of preceding consonant
	* secondary orientating (left and down)
	* Sloan long a
	* Perrault short i, long e (with dot accent)
	x (duployan affix attached e hook - 1BC7A)
However, the encoding proposal uses “orienting”; see:
http://www.unicode.org/L2/L2010/10272r2-duployan.pdf
Merriam-Webster supports both “orient” and “orientate” with the intended meaning. Its Word of the Day podcast of 2017-04-30 notes that "to orientate" is criticized for having one syllable too many. Being the newer of the two, it thrives in British English. Google Search retrieves 436,000 instances of "to orientate", but 3,170,000 of "to orient". Given that the Original Proposer uses "orienting" 28 times and "orientating" not once, I suspect the shift is due to Unicode's commitment to British English in the Code Charts, or to the fact that "orientate" looks more technical. In any case, I suggest a search‐and‐replace in NamesList.txt to replace "orientating" with "orienting".
------------------------------------------------------------------
U+1BC43 DUPLOYAN LETTER OA
	* Pernin aw
	* Perrault aw
could be merged to:
	* Pernin, Perrault: aw
(adding a colon after the variant identifiers).
------------------------------------------------------------------
Domino Tiles (1F030..1F09F)
The Domino Tiles subheadings are almost all wrong, since dominoes are named by their lower value. Correct subheadings would be:
@ Tiles with zero dots on the left side
and so on. If the current subheadings must be maintained, an annotation should be added.
------------------------------------------------------------------
Enclosed Ideographic Supplement (1F200..1F2FF)
The issue here is a terminological confusion between "squared", i.e. surrounded by a square, and "square", i.e.
square‐shaped:
@ Squared hiragana from ARIB STD B24 # change to:
@ Square hiragana from ARIB STD B24
1F200 SQUARE HIRAGANA HOKA
	= and others
	# <square> 307B 304B
@ Squared katakana
1F201 SQUARED KATAKANA KOKO
	= here sign
	# <square> 30B3 30B3
------------------------------------------------------------------
U+1F41B BUG
The sample glyph is inconsistent with the character identity as given by its name. It should show an insect of the order Hemiptera (a true bug), which resembles a beetle.
------------------------------------------------------------------
U+1F6A0 MOUNTAIN CABLEWAY
U+1F6A1 AERIAL TRAMWAY
U+1F6A1 is a misnomer, and its glyph fits U+1F6A0, whereas what U+1F6A0 currently depicts does not really exist: a suspended cable at that angle is technically unfeasible, being too steep for a suspension railway. With the cable underneath and the line representing rails, this glyph could be recycled for a funicular emoji. Consistent with current practice, the Code Charts would be worded as follows:
1F6A0 MOUNTAIN CABLEWAY
	= aerial tramway
	* two big shuttles
1F6A1 AERIAL TRAMWAY
	= gondola lift
	* small cabins circulating continuously
The glyphs would then need to be adjusted accordingly. References:
https://en.wikipedia.org/wiki/Aerial_tramway#Terminology
http://www.iemoji.com/view/emoji/861/travel-places/mountain-cableway
http://www.iemoji.com/view/emoji/862/travel-places/aerial-tramway
https://en.wikipedia.org/wiki/Gondola_lift
------------------------------------------------------------------
U+1F46B MAN AND WOMAN HOLDING HANDS
U+1F6BB RESTROOM
Linking these emoji by mutual cross-references seems misplaced to me.
1F46B MAN AND WOMAN HOLDING HANDS
	x (restroom - 1F6BB) # remove
[…]
1F6BB RESTROOM
	= man and woman symbol with divider
	= unisex restroom
	x (man and woman holding hands - 1F46B) # remove
------------------------------------------------------------------
U+1F6EC AIRPLANE ARRIVING
This emoji has a wrong glyph, as planes don’t land nose-down. See airport signage for reference.
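The Ogham and spacing-character items that follow both rest on General_Category (Gc) values. These values can be checked with Python's standard `unicodedata` module, which reflects the Unicode version bundled with the interpreter. The sketch confirms Gc=Zs for U+1680. It also shows that none of the ten characters commented "this is a spacing character" is a combining mark, whereas U+093E DEVANAGARI VOWEL SIGN AA is a combining mark that is nevertheless spacing (Gc=Mc). The helper `is_combining` is hypothetical. It tests the Gc prefix rather than calling `unicodedata.combining()`, because the latter returns the canonical combining class, which is 0 for many marks, U+093E included.

```python
import unicodedata

# The ten characters carrying the comment "this is a spacing character".
SPACING_COMMENTED = [0x005E, 0x005F, 0x0060, 0x007E, 0x00A8,
                     0x00AF, 0x00B0, 0x00B4, 0x00B8, 0x2017]


def is_combining(cp: int) -> bool:
    """Treat a character as combining if its Gc is Mn, Mc or Me."""
    return unicodedata.category(chr(cp)).startswith("M")


if __name__ == "__main__":
    # U+1680 OGHAM SPACE MARK is a space separator, not punctuation.
    print("U+1680", unicodedata.category("\u1680"))
    # None of the ten "spacing" characters is a combining mark...
    for cp in SPACING_COMMENTED:
        print(f"U+{cp:04X}", unicodedata.category(chr(cp)), is_combining(cp))
    # ...yet spacing combining marks exist (Gc=Mc).
    print("U+093E", unicodedata.category("\u093E"), is_combining(0x093E))
```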
------------------------------------------------------------------
U+1680 OGHAM SPACE MARK
This character has Gc=Zs, so its subheading should be Space, not Punctuation, its visible stem notwithstanding.
------------------------------------------------------------------
U+005E CIRCUMFLEX ACCENT
U+005F LOW LINE
U+0060 GRAVE ACCENT
U+007E TILDE
U+00A8 DIAERESIS
U+00AF MACRON
U+00B0 DEGREE SIGN
U+00B4 ACUTE ACCENT
U+00B8 CEDILLA
U+2017 DOUBLE LOW LINE
These 10 characters have this comment line:
	* this is a spacing character
This should be changed to:
	* this is an independent character
The reason is that combining marks with Gc=Mc are spacing, too. The antonym pairs are:
• "spacing" vs "non‐spacing"
• "combining" vs "independent"
By contrast, "spacing" is not a synonym of "independent", nor is "non‐spacing" a synonym of "combining".
------------------------------------------------------------------
U+0950 DEVANAGARI OM
U+0AD0 GUJARATI OM
U+0BD0 TAMIL OM
U+0F00 TIBETAN SYLLABLE OM
U+A8FD DEVANAGARI JAIN OM
U+111C4 SHARADA OM
U+11350 GRANTHA OM
U+11449 NEWA OM
U+114C7 TIRHUTA OM
U+118FF WARANG CITI OM
Om is one of the most important spiritual symbols in Hinduism. Unicode encourages unification with DEVANAGARI OM in all scripts that lack a distinctive glyph of their own for this syllable. Hence 10 OM characters are found in Unicode, one of which (Sharada) is discouraged. In the Code Charts, many of the OM characters have their own subheading, but only one of these, in the Newa script, has "Invocation" in it. This state of affairs is not satisfactory, so I request the following changes. Quotations always include full ranges (on a per‐subheading basis).
@ Sign # discard
@ Invocation sign # substitute
0950 DEVANAGARI OM
	x (om symbol - 1F549)
@ Various signs # discard
@ Invocation sign # substitute
0AD0 GUJARATI OM
@ Various signs # discard
@ Invocation sign # substitute
0BD0 TAMIL OM
@ Length mark # added
0BD7 TAMIL AU LENGTH MARK
@ Syllable # discard
@ Invocation sign # substitute
0F00 TIBETAN SYLLABLE OM
@ Signs # discard
@ Invocation signs # substitute
A8FC DEVANAGARI SIGN SIDDHAM
	= siddhirastu
	* used at the beginning of texts as an invocation
	x (tibetan mark initial yig mgo mdun ma - 0F04)
	x (mongolian birga - 1800)
	x (sharada sign siddham - 111DB)
A8FD DEVANAGARI JAIN OM
@ Various signs # no change
111C1 SHARADA SIGN AVAGRAHA
111C2 SHARADA SIGN JIHVAMULIYA
111C3 SHARADA SIGN UPADHMANIYA
111C4 SHARADA OM
	* use of this character is discouraged
	* recommended sequence is 1118F 11180
@ Sign # discard
@ Invocation sign # substitute
11350 GRANTHA OM
@ Invocation signs # no change
11449 NEWA OM
1144A NEWA SIDDHI
@ Various signs
114BF TIRHUTA SIGN CANDRABINDU
114C0 TIRHUTA SIGN ANUSVARA
114C1 TIRHUTA SIGN VISARGA
114C2 TIRHUTA SIGN VIRAMA
	= halant
114C3 TIRHUTA SIGN NUKTA
114C4 TIRHUTA SIGN AVAGRAHA
114C5 TIRHUTA GVANG
	= vedic anusvara
114C6 TIRHUTA ABBREVIATION SIGN
@ Invocation sign # added
114C7 TIRHUTA OM
@ Sign # discard
@ Invocation sign # substitute
118FF WARANG CITI OM
------------------------------------------------------------------
Box Drawing (2500..257F)
There are two ranges of dashed lines, both of which are under a
@ Light and heavy dashed lines
subheading.
Suggestion: more distinctive subheadings:
@ Triple and quadruple dashed lines
2504 BOX DRAWINGS LIGHT TRIPLE DASH HORIZONTAL
…
@ Double dashed lines
254C BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL
…
------------------------------------------------------------------
Latin Extended-B (0180..024F)
The first subheading:
@ Non-European and historic Latin
is no longer accurate, since this range includes:
01B7 LATIN CAPITAL LETTER EZH
which is used in:
	* African, Skolt Sami
hence in Europe, too. In any case, classifying letters as “non‐European” is Eurocentric. I suggest deriving this subheading from the one found further down, before U+021C LATIN CAPITAL LETTER YOGH:
@ Miscellaneous additions
by replacing "additions" with "letters" at the start of the block:
@ Miscellaneous letters
0180 LATIN SMALL LETTER B WITH STROKE
…
Further, the subheading
@ Phonetic and historic letters
found before U+01DD LATIN SMALL LETTER TURNED E is plainly wrong, as several letters of this range, including the first one, are used in the writing systems of living languages. It is probably safe to replicate the generic
@ Miscellaneous additions
subheading.
------------------------------------------------------------------
Latin Extended-C (2C60..2C7F)
The first subheading of this block:
@ Orthographic Latin additions
does not make much sense, since "Latin" is implied by the block name, and "Orthographic" is fairly obvious. Using generic subheadings like the above-mentioned
@ Miscellaneous letters
seems safer than devising a special label for something that isn’t really special.
------------------------------------------------------------------
Subheadings starting with "Addition[|s|al]"
@ Additions for Slovenian and Croatian
0200 LATIN CAPITAL LETTER A WITH DOUBLE GRAVE
…
@ Additions for Romanian
0218 LATIN CAPITAL LETTER S WITH COMMA BELOW
…
@ Additions for Livonian
022A LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON
…
@ Additions for Uighur
2C67 LATIN CAPITAL LETTER H WITH DESCENDER
The Code Charts contain 86 subheadings starting with the substring "Addition": 6 with "Addition", 26 with "Additions", and 54 with "Additional". The advantage is that they show the repertoire was built step by step. The downside is a constant reminder that those languages were supported only at a later stage. This amounts to an unfounded discrimination that could easily be avoided by simply labelling the ranges by what they contain, i.e. mostly letters. As this is a big change, I am waiting to learn whether the EC is ready for it. Right now I am late in submitting these items, so there is no comprehensive list of subheadings to change yet.
------------------------------------------------------------------
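A tally like the 6/26/54 split above could be reproduced from NamesList.txt with a few lines of Python. This sketch assumes that subheading lines begin with "@" followed by whitespace, as in NamesList.txt. Under that assumption, block headers ("@@"), notices ("@+") and variation lines ("@~") are excluded. The helper name is hypothetical, and the sample string is illustrative rather than an excerpt of any release.

```python
import re
from collections import Counter

# "@" followed by whitespace, then a first word starting with "Addition".
ADDITION_SUBHEADING = re.compile(r"^@[ \t]+(Addition\w*)", re.MULTILINE)


def count_addition_subheadings(nameslist_text: str) -> Counter:
    """Tally subheadings whose first word starts with 'Addition'."""
    return Counter(ADDITION_SUBHEADING.findall(nameslist_text))


if __name__ == "__main__":
    sample = (
        "@\t\tAdditions for Romanian\n"
        "0218\tLATIN CAPITAL LETTER S WITH COMMA BELOW\n"
        "@+\t\tAdditional notes are not subheadings\n"
        "@\t\tAdditional letters\n"
        "@\t\tMiscellaneous letters\n"
    )
    print(count_addition_subheadings(sample))
```

Reading the real file with `open("NamesList.txt", encoding="utf-8").read()` and passing the result in would yield the per-variant counts; the same matches also give the list of subheadings that would need relabelling.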
Date/Time: Wed Apr 25 14:37:42 CDT 2018
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: PRI 372 NamedSequencesProv.txt
I apologize for several typos that occurred in compiling the Named Sequences list submitted on Tue Apr 24 02:12:15 CDT 2018:
LATIN CAPITAL LETTER O WITH TILDE AND GRAVE ACCENT;00F5 0300
should be:
LATIN CAPITAL LETTER O WITH TILDE AND GRAVE ACCENT;00D5 0300
LATIN CAPITAL LETTER UPSILONK WITH GRAVE ACCENT;01B1 0300
should be:
LATIN CAPITAL LETTER UPSILON WITH GRAVE ACCENT;01B1 0300
On this occasion, I suggest changing the proposed subheading from:
# Additions for languages in Togo.
to
# Latin sequences for languages in Togo.
Thanks,
Marcel
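A simple consistency check between each sequence name and the name of its first character would have caught both typos. Below is a hypothetical Python helper that relies only on the standard `unicodedata` module. It assumes Latin base-plus-diacritic sequences whose name starts with the full name of the first character, followed by a space. Many legitimate named sequences do not follow that pattern, so a False result only flags a line for manual review.

```python
import unicodedata


def name_matches_first_char(line: str) -> bool:
    """Heuristic check of a 'NAME;XXXX YYYY' named-sequence line: does the
    sequence name begin with the first character's name plus a space?"""
    name, sequence = (part.strip() for part in line.split(";", 1))
    first = chr(int(sequence.split()[0], 16))
    return name.startswith(unicodedata.name(first) + " ")


if __name__ == "__main__":
    for line in [
        "LATIN CAPITAL LETTER O WITH TILDE AND GRAVE ACCENT;00F5 0300",
        "LATIN CAPITAL LETTER O WITH TILDE AND GRAVE ACCENT;00D5 0300",
        "LATIN CAPITAL LETTER UPSILONK WITH GRAVE ACCENT;01B1 0300",
        "LATIN CAPITAL LETTER UPSILON WITH GRAVE ACCENT;01B1 0300",
    ]:
        print(name_matches_first_char(line), line)
```

The erroneous 00F5 line fails because U+00F5 is LATIN SMALL LETTER O WITH TILDE. The UPSILONK line fails because the character after "UPSILON" is "K" rather than a space.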
Date/Time: Wed Apr 25 17:09:46 CDT 2018
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #372: Denoting ranges in the Code Charts
In the Unicode Standard, including the Core Specification, ranges of code points are denoted using two dots. It is therefore desirable to align the notational conventions used in the Code Charts with this practice. In the Code Charts, ranges are noted using three different conventions:
1) two dots (U+002E U+002E)
2) hyphen-minus (U+002D)
3) hyphen-minus surrounded by spaces (U+0020 U+002D U+0020)
All instances using either (2) or (3) should therefore be corrected. For convenience, the instances containing ranges are listed below. Care has been taken to always include a line containing code point(s), be it a block header or a name line.
Disclaimer: These instances have been retrieved using regexes. The search patterns were (1) | (2) | (3) | U+2013, surrounded by 4 hex digits on either side. Standards references have been discarded. This leaves a risk that unconventionally noted ranges have been overlooked.
======================================================
@@ 13A0 Cherokee 13FF
@+ Most lowercase Cherokee syllables are encoded in the Cherokee Supplement block at AB70..ABBF.
@@ 1950 Tai Le 197F
@+ Note the similarly named but distinct New Tai Lue script encoded at 1980..19DF.
@@ 1980 New Tai Lue 19DF
@+ Note the similarly named but distinct Tai Le script encoded at 1950..197F. The New Tai Lue script is also known as Xishuangbanna Dai.
======================================================
0020 SPACE
	* sometimes considered a control code
	* other space characters: 2000-200A
003D EQUALS SIGN
	* other related characters: 2241-2263
005B LEFT SQUARE BRACKET
	= opening square bracket (1.0)
	* other bracket characters: 27E6-27EB, 2983-2998, 3008-301B
00A4 CURRENCY SIGN
	* other currency symbol characters: 20A0-20BF
00B2 SUPERSCRIPT TWO
	= squared
	* other superscript digit characters: 2070-2079
00B8 CEDILLA
	* this is a spacing character
	* other spacing accent characters: 02D8-02DB
@ Vulgar fractions
@+ The fraction bar for these may be rendered horizontally or at a slant.
For other fraction characters, see 2150-215E.
00BC VULGAR FRACTION ONE QUARTER
@ Arabic-Indic digits
@+ These digits are used with Arabic proper; for languages of Iran, Afghanistan, Pakistan, and India, see the Eastern Arabic-Indic digits at 06F0-06F9.
0660 ARABIC-INDIC DIGIT ZERO
@ Astrological digits
@+ These digits, also known as Sinhala Lith Illakkam, have been used primarily for writing horoscopes. This number system has a zero place holder concept, unlike the Sinhala archaic numbers, Sinhala Illakkam, encoded in the range 111E1-111F4.
0DE6 SINHALA LITH DIGIT ZERO
@ Punctuation
@+ Additional birgas are encoded in the Mongolian Supplement block at 11660-1167F.
1800 MONGOLIAN BIRGA
@ Angles
@+ Other angle symbols are found at 299B-29AF.
221F RIGHT ANGLE
@ Zodiacal symbols
@+ See also Asian zodiacal symbols among the animal symbols in the range 1F400-1F418.
2648 ARIES
@ Cantillation marks (svara) for the Samaveda
@+ See the similar set of Grantha svara markers for the Samaveda, encoded in the range 11366-11374.
A8E0 COMBINING DEVANAGARI DIGIT ZERO
@@ FB50 Arabic Presentation Forms-A FDFF
@+ Preferred characters are found in the Arabic block 0600-06FF. This block also contains 32 noncharacters in the range FDD0-FDEF.
@ Fullwidth ASCII variants
@+ See ASCII 0020-007E
FF01 FULLWIDTH EXCLAMATION MARK
@@ 111E0 Sinhala Archaic Numbers 111FF
@+ This number system is also known as Sinhala Illakkam. This number system does not have a zero place holder concept, unlike the Sinhala astrological numbers, Sinhala Lith Illakkam, encoded in the range 0DE6-0DEF.
@ Cantillation marks (svara) for the Samaveda
@+ See the similar set of Devanagari svara markers for the Samaveda, encoded in the range A8E0-A8F1.
11366 COMBINING GRANTHA DIGIT ZERO
@ Circled sans-serif digits
@+ These digits complement the sans-serif digit sets in the Dingbat block ranges 2780-2789 and 278A-2793.
1F10B DINGBAT CIRCLED SANS-SERIF DIGIT ZERO
@ White circles
@+ Adjective refers to the thickness of the ring.
@+ Constitute a set as follows: 25CB, 2B58, 1F785-1F789
1F785 MEDIUM BOLD WHITE CIRCLE
@ White squares
@+ Constitute a set as follows: 25A1, 1F78E-1F793
1F78E LIGHT WHITE SQUARE
@ Six pointed stars
@+ Constitute a set as follows: 2736, 1F7CB-1F7CD
1F7CB MEDIUM SIX POINTED BLACK STAR
@ Eight pointed stars
@+ Constitute a set as follows: 2735, 1F7CE-1F7D1
1F7CE MEDIUM EIGHT POINTED BLACK STAR
======================================================
@@ FE70 Arabic Presentation Forms-B FEFF
@+ Preferred characters are found in the Arabic block 0600 - 06FF. Some of these characters are used for Arabic mathematics where contextual shape variations are important semantically.
@ Halfwidth CJK punctuation
@+ See CJK punctuation 3000 - 303F
FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
@ Halfwidth Katakana variants
@+ See Katakana 30A0 - 30FF
FF65 HALFWIDTH KATAKANA MIDDLE DOT
@ Halfwidth Hangul variants
@+ See Hangul Compatibility Jamo 3130 - 318F
FFA0 HALFWIDTH HANGUL FILLER
@ Fullwidth symbol variants
@+ See Latin-1 00A0 - 00FF
FFE0 FULLWIDTH CENT SIGN
======================================================
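The corrections listed above are mechanical enough to script. The sketch below is a minimal Python version, not the regexes actually used for this list. It rewrites conventions (2) and (3), and also en-dash-joined ranges, into the two-dot form. The function name is hypothetical. Because any two runs of 4 to 6 uppercase hex digits around a dash match, the script also hits standards references such as "GB 18030-2005" and hex-looking words such as "BEAD-FACE". Every substitution therefore still needs a manual review, just as standards references were discarded by hand here.

```python
import re

# Two runs of 4 to 6 uppercase hex digits joined by a hyphen-minus (U+002D)
# or an en dash (U+2013), optionally with a single space on either side.
RANGE = re.compile(r"\b([0-9A-F]{4,6}) ?[-\u2013] ?([0-9A-F]{4,6})\b")


def normalize_ranges(text: str) -> str:
    """Rewrite ranges written as XXXX-YYYY or XXXX - YYYY as XXXX..YYYY."""
    return RANGE.sub(r"\1..\2", text)


if __name__ == "__main__":
    print(normalize_ranges("* other space characters: 2000-200A"))
    print(normalize_ranges("@+ See Katakana 30A0 - 30FF"))
    print(normalize_ranges("@+ Based on TIS 620-2533."))  # left unchanged
```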
Date/Time: Thu Apr 26 12:45 CDT 2018
Name: Marcel Schneider
Report Type: Public Review Issue
Opt Subject: PRI #372 (consolidated feedback, remainder)
U+1039 MYANMAR SIGN VIRAMA
U+103A MYANMAR SIGN ASAT
About 60 % of the viramas have a dedicated subheading:
@ Virama
It would be desirable to extend this feature to the remaining 40 %. Multiple issues concerning the Myanmar virama make it a handy example. The Myanmar viramas are encoded between a range of dependent vowels and a range of dependent consonants. (On these specifically, please see another feedback item above about the way dependent vowels and consonants are declared.) Presentation could be improved by adding an appropriate subheading. The comment lines also seem to need revision: the Code Chart presents U+1039 as never rendered visibly. That contradicts the current annotation, which I suspect should be changed to “shape shown is arbitrary and is not visibly rendered”, as found at U+17D2, U+2D7F, and U+10A3F.
@ Various signs
1036 MYANMAR SIGN ANUSVARA
1037 MYANMAR SIGN DOT BELOW
	= aukmyit
	* a tone mark
1038 MYANMAR SIGN VISARGA
@ Viramas # added
1039 MYANMAR SIGN VIRAMA
	= killer (when rendered visibly) # discard
	* shape shown is arbitrary and is not visibly rendered # replicated
103A MYANMAR SIGN ASAT
	= killer (always rendered visibly) # discard
	* always rendered visibly # converted
------------------------------------------------------------------
U+07F7 NKO SYMBOL GBAKURUNEN
This character has Gc=So, yet it is part of a range under the subheading “Punctuation”. Indeed, it is reported to terminate important sections. It should therefore have been given Gc=Po, like the well-known Siddham section marks, which are all Gc=Po.
Unicode may wish either to change the category of U+07F7 from Gc=So to Gc=Po, or to move U+07F7 one line up, so that it falls under the preceding “Symbols” subheading:
@ Symbols # changed to plural
07F6 NKO SYMBOL OO DENNEN
07F7 NKO SYMBOL GBAKURUNEN # raised by one line
@ Punctuation
07F8 NKO COMMA
07F9 NKO EXCLAMATION MARK
It further appears useful to give some hint about the meaning of each of the two symbols, as is usual in the Code Charts (cf. other instances of particular symbols and logograms). Based on the encoding proposal:
http://www.unicode.org/L2/L2004/04172-n2765-nko.pdf
page 6, Unicode could add the following comment lines:
07F6 NKO SYMBOL OO DENNEN
	* remote future placement of the topic # added
07F7 NKO SYMBOL GBAKURUNEN
	* end of major section # added
	* time to prepare and have meal # added (optional)
------------------------------------------------------------------
U+27C5 LEFT S-SHAPED BAG DELIMITER
U+27C6 RIGHT S-SHAPED BAG DELIMITER
U+27CB MATHEMATICAL RISING DIAGONAL
U+27CD MATHEMATICAL FALLING DIAGONAL
The bag delimiters are Gc=Ps and Gc=Pe, and need a subheading. This is all the more so since nearly every single symbol around here has been given its own, even though they are all Gc=Sm. Also, the mathematical diagonals each constitute a singleton range, yet have generic subheadings. These would advantageously be replaced with specific ones. That would help associate the two visually, given that they are separated by another character, like most of the paired ASCII punctuation characters.
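The General_Category values cited in this item can be confirmed with Python's standard `unicodedata` module. The output depends on the Unicode version bundled with the interpreter. This sketch, built around a hypothetical `categories` helper, lists the categories for 27C0..27CD. Within that range, only the two bag delimiters fall outside Gc=Sm, which is the mismatch the proposed “Paired punctuations” subheading addresses.

```python
import unicodedata


def categories(start: int, end: int) -> dict:
    """Map each code point in the inclusive range start..end to its Gc."""
    return {cp: unicodedata.category(chr(cp)) for cp in range(start, end + 1)}


if __name__ == "__main__":
    for cp, cat in categories(0x27C0, 0x27CD).items():
        print(f"U+{cp:04X} {cat} {unicodedata.name(chr(cp))}")
```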
Abridged snippet:
…
@@ 27C0 Miscellaneous Mathematical Symbols-A 27EF
@ Miscellaneous symbols
27C0 THREE DIMENSIONAL ANGLE
27C1 WHITE TRIANGLE CONTAINING SMALL WHITE TRIANGLE
27C2 PERPENDICULAR
27C3 OPEN SUBSET
27C4 OPEN SUPERSET
@ Paired punctuations # added (note the plural)
27C5 LEFT S-SHAPED BAG DELIMITER
27C6 RIGHT S-SHAPED BAG DELIMITER
@ Miscellaneous symbols # replicated from blockstart
27C7 OR WITH DOT INSIDE
27C8 REVERSE SOLIDUS PRECEDING SUBSET
27C9 SUPERSET PRECEDING SOLIDUS
@ Vertical line operator
27CA VERTICAL BAR WITH HORIZONTAL STROKE
@ Miscellaneous symbol # discard
@ Mathematical diagonal # changed
27CB MATHEMATICAL RISING DIAGONAL
@ Division operator
27CC LONG DIVISION
@ Miscellaneous symbol # discard
@ Mathematical diagonal # changed
27CD MATHEMATICAL FALLING DIAGONAL
@ Operators
27CE SQUARED LOGICAL AND
27CF SQUARED LOGICAL OR
@ Miscellaneous symbol
27D0 WHITE DIAMOND WITH CENTRED DOT
@ Operators
------------------------------------------------------------------
Mathematical Operators (2200..22FF)
In this block, one subheading appears to be misplaced, and two seem to be missing.
Let’s look at these abridged ranges and see how to fix that:
@ Operators
22D2 DOUBLE INTERSECTION
22D3 DOUBLE UNION
@ Relations # remove from here
22D4 PITCHFORK # this should be part of Operators range
	= proper intersection
@ Arithmetic relations # moved and reworded
22D5 EQUAL AND PARALLEL TO
22D6 LESS-THAN WITH DOT
22D7 GREATER-THAN WITH DOT
22D8 VERY MUCH LESS-THAN
22D9 VERY MUCH GREATER-THAN
22DA LESS-THAN EQUAL TO OR GREATER-THAN
22DB GREATER-THAN EQUAL TO OR LESS-THAN
22DC EQUAL TO OR LESS-THAN
22DD EQUAL TO OR GREATER-THAN
22DE EQUAL TO OR PRECEDES
22DF EQUAL TO OR SUCCEEDS
22E0 DOES NOT PRECEDE OR EQUAL
22E1 DOES NOT SUCCEED OR EQUAL
22E2 NOT SQUARE IMAGE OF OR EQUAL TO
22E3 NOT SQUARE ORIGINAL OF OR EQUAL TO
22E4 SQUARE IMAGE OF OR NOT EQUAL TO
22E5 SQUARE ORIGINAL OF OR NOT EQUAL TO
22E6 LESS-THAN BUT NOT EQUIVALENT TO
22E7 GREATER-THAN BUT NOT EQUIVALENT TO
22E8 PRECEDES BUT NOT EQUIVALENT TO
22E9 SUCCEEDS BUT NOT EQUIVALENT TO
22EA NOT NORMAL SUBGROUP OF
22EB DOES NOT CONTAIN AS NORMAL SUBGROUP
22EC NOT NORMAL SUBGROUP OF OR EQUAL TO
22ED DOES NOT CONTAIN AS NORMAL SUBGROUP OR EQUAL
@ Ellipses # added
@+ These four ellipses are used for matrix row/column elision.
# converted from comment line below
22EE VERTICAL ELLIPSIS
	* these four ellipses are used for matrix row/column elision # discard from this place
22EF MIDLINE HORIZONTAL ELLIPSIS
22F0 UP RIGHT DIAGONAL ELLIPSIS
22F1 DOWN RIGHT DIAGONAL ELLIPSIS
@ Set relations # replicated and reworded
22F2 ELEMENT OF WITH LONG HORIZONTAL STROKE
22F3 ELEMENT OF WITH VERTICAL BAR AT END OF HORIZONTAL STROKE
22F4 SMALL ELEMENT OF WITH VERTICAL BAR AT END OF HORIZONTAL STROKE
22F5 ELEMENT OF WITH DOT ABOVE
22F6 ELEMENT OF WITH OVERBAR
22F7 SMALL ELEMENT OF WITH OVERBAR
22F8 ELEMENT OF WITH UNDERBAR
22F9 ELEMENT OF WITH TWO HORIZONTAL STROKES
22FA CONTAINS WITH LONG HORIZONTAL STROKE
22FB CONTAINS WITH VERTICAL BAR AT END OF HORIZONTAL STROKE
22FC SMALL CONTAINS WITH VERTICAL BAR AT END OF HORIZONTAL STROKE
22FD CONTAINS WITH OVERBAR
22FE SMALL CONTAINS WITH OVERBAR
22FF Z NOTATION BAG MEMBERSHIP
@~ Standardized Variation Sequences
------------------------------------------------------------------
U+2A53 DOUBLE LOGICAL AND
U+2A54 DOUBLE LOGICAL OR
One word is missing in these names: NESTED, as in:
U+2AA1 DOUBLE NESTED LESS-THAN
U+2AA2 DOUBLE NESTED GREATER-THAN
U+2AA3 DOUBLE NESTED LESS-THAN WITH UNDERBAR
This calls for adding either aliases (recommended):
2A53 DOUBLE LOGICAL AND
	= double nested logical and
2A54 DOUBLE LOGICAL OR
	= double nested logical or
or comment lines:
2A53 DOUBLE LOGICAL AND
	* nested
2A54 DOUBLE LOGICAL OR
	* nested
------------------------------------------------------------------
U+1426 CANADIAN SYLLABICS FINAL DOUBLE SHORT VERTICAL STROKES
This character name is misspelled: the final S in STROKES is erroneous. An annotation should therefore be added to prevent inadvertent translators from using the plural.
Suggestions:
1426 CANADIAN SYLLABICS FINAL DOUBLE SHORT VERTICAL STROKES
	* one stroke # added (option 1)
	* one double stroke # added (option 2)
	* actually one stroke # added (option 3)
	* actually one double stroke # added (option 4)
------------------------------------------------------------------
U+26A2 DOUBLED FEMALE SIGN
U+26A3 DOUBLED MALE SIGN
The informative aliases of these symbols should be chosen from the same terminological pool, i.e. when one alias is constructed with generic terminology, the other must be, too. Here, the specific term chosen for the “lesbianism” alias has no usable male counterpart, since “gayism” is still uncommon; see:
https://forum.wordreference.com/threads/is-gayism-a-word-or-not.3067518/
Further, the conventional ordering of alias and comment lines should be applied.
@ Gender symbols
26A2 DOUBLED FEMALE SIGN
	= lesbianism # discard
	= female homosexuality # derived from below
	x (two women holding hands - 1F46D)
26A3 DOUBLED MALE SIGN
	= male homosexuality # raised
	* a glyph variant has the two circles on the same line
	x (two men holding hands - 1F46C)
26A4 INTERLOCKED FEMALE AND MALE SIGN
	= bisexuality # raised
	* a glyph variant has the two circles on the same line
------------------------------------------------------------------
U+2A70 APPROXIMATELY EQUAL OR EQUAL TO
Surprisingly, this symbol is not composed with an APPROXIMATELY EQUAL sign, but with an ALMOST EQUAL sign. Hence I suggest adding an alias and a cross-reference:
2A6F ALMOST EQUAL TO WITH CIRCUMFLEX ACCENT
2A70 APPROXIMATELY EQUAL OR EQUAL TO
	= almost equal to above equals sign
	x (approximately equal to - 2245)
2A71 EQUALS SIGN ABOVE PLUS SIGN
------------------------------------------------------------------
U+2044 FRACTION SLASH
As a mathematical symbol amid punctuation (U+2044 has Gc=Sm), this character needs a subheading. The next two could use a subheading, too.
2043	HYPHEN BULLET
	x (hyphen-minus - 002D)
@		Mathematical symbol # added
2044	FRACTION SLASH
	= solidus (in typography)
	* for composing arbitrary fractions
	x (solidus - 002F)
	x (division slash - 2215)
@		Paired punctuation # added
2045	LEFT SQUARE BRACKET WITH QUILL
2046	RIGHT SQUARE BRACKET WITH QUILL
@		Double punctuation for vertical text
------------------------------------------------------------------
U+276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
U+276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
These are Gc=Ps and Gc=Pe, respectively. That is wrong: as their names state, they are quotation mark ornaments, not brackets.
Change requests:
1) Change them to Gc=Pi and Gc=Pf, respectively.
2) Remove them from BidiBrackets.txt (and change the related property values accordingly).
3) Add appropriate subheadings in the Code Charts:
@		Ornamental brackets
2768	MEDIUM LEFT PARENTHESIS ORNAMENT
	x (left parenthesis - 0028)
2769	MEDIUM RIGHT PARENTHESIS ORNAMENT
	x (right parenthesis - 0029)
276A	MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
276B	MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
276C	MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
	x (left-pointing angle bracket - 2329)
276D	MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
	x (right-pointing angle bracket - 232A)
@		Ornamental quotation marks # added
276E	HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
	x (single left-pointing angle quotation mark - 2039)
276F	HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
	x (single right-pointing angle quotation mark - 203A)
@		Ornamental brackets # replicated
2770	HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
2771	HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
2772	LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
	x (left tortoise shell bracket - 3014)
2773	LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
	x (right tortoise shell bracket - 3015)
2774	MEDIUM LEFT CURLY BRACKET ORNAMENT
	x (left curly bracket - 007B)
2775	MEDIUM RIGHT CURLY BRACKET ORNAMENT
	x (right curly bracket - 007D)
@		Dingbat circled digits
------------------------------------------------------------------
U+2E08 DOTTED TRANSPOSITION MARKER
I wonder whether this character should be Bidi_Mirrored=Yes, with a dedicated RTL glyph, given that U+2E09 LEFT TRANSPOSITION BRACKET and U+2E0A RIGHT TRANSPOSITION BRACKET are mirrored by glyph exchange. (Note that the dotted version U+2E08 occurs unpaired, because it pertains to a single word, which is moved as specified.)
------------------------------------------------------------------
Latvian letters for pre-1921 orthography
I wonder whether this subheading of the range U+A7A0..U+A7A9 should rather be “Letters for Latvian pre-1921 orthography”.
------------------------------------------------------------------
CJK Unified Ideographs Extension blocks
The names of these blocks need a hyphen before the final letter:
@@	3400	CJK Unified Ideographs Extension-A	4DB5
@@	20000	CJK Unified Ideographs Extension-B	2A6D6
@@	2A700	CJK Unified Ideographs Extension-C	2B734
@@	2B740	CJK Unified Ideographs Extension-D	2B81D
@@	2B820	CJK Unified Ideographs Extension-E	2CEA1
@@	2CEB0	CJK Unified Ideographs Extension-F	2EBE0
I doubt that block-name stability would allow these corrections, though. In any case, I would appreciate hints about why the hyphen rule was applied to the Latin Extended blocks but not to the CJK Extensions. It would also be good to know whether this is a real mistake, and whether localized versions of the Code Charts should apply the hyphen rule throughout, follow the English version, or not apply it at all.
------------------------------------------------------------------
U+29A6 OBLIQUE ANGLE OPENING UP
U+29A7 OBLIQUE ANGLE OPENING DOWN
The correct term is OBTUSE ANGLE. Since character names cannot be changed, informative aliases are needed:
29A6	OBLIQUE ANGLE OPENING UP
	= obtuse angle opening up
29A7	OBLIQUE ANGLE OPENING DOWN
	= obtuse angle opening down
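Several property questions in this report can be checked locally with Python's standard `unicodedata` module. These include the General_Category of U+2044 and of U+276E/U+276F, and the Bidi_Mirrored value of U+2E08..U+2E0A. The sketch below is illustrative only. The `describe` helper and the code point list are assumptions made for this example. The values printed depend on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), so they may differ from the version under review. The module does not expose Bidi_Paired_Bracket_Type, so membership in BidiBrackets.txt (change request 2 above) has to be checked against that file directly.

```python
import unicodedata

# Code points discussed in the report (illustrative selection).
CODE_POINTS = [
    0x2043, 0x2044, 0x2045, 0x2046,  # fraction slash and its neighbours
    0x276E, 0x276F,                  # heavy angle quotation mark ornaments
    0x2E08, 0x2E09, 0x2E0A,          # transposition marker and brackets
]


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME gc=.. bidi_mirrored=..' for one code point (hypothetical helper)."""
    ch = chr(cp)
    name = unicodedata.name(ch, "<unnamed>")
    gc = unicodedata.category(ch)        # General_Category, e.g. 'Sm', 'Ps'
    mirrored = unicodedata.mirrored(ch)  # 1 if Bidi_Mirrored=Yes, else 0
    return f"U+{cp:04X} {name} gc={gc} bidi_mirrored={mirrored}"


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in CODE_POINTS:
        print(describe(cp))
```

Running the script lists one line per code point. Comparing the `gc=` and `bidi_mirrored=` fields across the rows shows the inconsistencies raised above.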
Date/Time: Sun Apr 29 19:37:14 CDT 2018
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: Unicode 11: Wrong Numeric Value of U+1ECA1
U+1ECA1 INDIC SIYAQ NUMBER KAROR was accidentally given the numeric value 1000000 (one million) instead of the correct 10000000 (ten million). The value for the related character U+1ECA2 INDIC SIYAQ NUMBER KARORAN is accurate, however.
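The arithmetic behind this report is as follows. In the Indian numbering system, a lakh is 100,000 and a karor (crore) is 100 lakh, that is, 10,000,000. The erroneous value 1,000,000 amounts to only ten lakh. The sketch below restates this arithmetic and compares the expected value with whatever the local Python `unicodedata` tables report. The `EXPECTED` table and the `check` helper are hypothetical, written only for this illustration. The lookup result depends on the interpreter's Unicode version, and the Indic Siyaq Numbers block requires Unicode 11.0 or later.

```python
import unicodedata

LAKH = 100_000       # 1 lakh = one hundred thousand
KAROR = 100 * LAKH   # 1 karor (crore) = 100 lakh = ten million

# Hypothetical table for this illustration: expected Numeric_Value per code point.
EXPECTED = {0x1ECA1: KAROR}  # INDIC SIYAQ NUMBER KAROR


def check(cp: int) -> str:
    """Compare the locally bundled numeric value with the expected one."""
    actual = unicodedata.numeric(chr(cp), None)  # None if not numeric in this version
    expected = EXPECTED[cp]
    if actual is None:
        return f"U+{cp:04X}: no numeric value in Unicode {unicodedata.unidata_version}"
    status = "ok" if actual == expected else "MISMATCH"
    return f"U+{cp:04X}: actual={actual:.0f} expected={expected} {status}"


if __name__ == "__main__":
    print(check(0x1ECA1))
```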