This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Wed Apr 24 12:13:30 CDT 2024
ReportID: ID20240424121330
Name: Ned Holbrook
Report Type: Error Report
Opt Subject: Unicode 16.0 Core Spec
[EDC]
Table 12-38 is missing a couple of space characters, namely in “0D310D31” and “0D2A0D31”. I would also note in passing that it is somewhat jarring to note just how many ways there are of formatting sequences in this chapter: Table 12-32 lists code points separated by commas in angle brackets, Table 12-35 lists code points separated by commas with no angle brackets, Table 12-37 lists code points interspersed with descriptions, Tables 12-38 and 12-39 list code points separated by spaces, and Table 12-40 has parallel lists of descriptions and code points separated by commas. While I would not assume a single format is best for every purpose, it does seem that there could be more consistency in this chapter at least.
Date/Time: Sat May 04 17:43:05 CDT 2024
ReportID: ID20240504174305
Name: Alexander Kunde
Report Type: Error Report
Opt Subject: Bamum Supplement Block
[SEW]
This concerns seemingly faulty character names in the Bamum Supplement block. Presuming that the phonetic readings (which served as the basis for the character names) given in the underlying proposal (N3597, L2/09-102 with N3523, L2/09-106) are correct, and following the conventions specified therein (on p. 3), there appear to be typos in the following character names:
1680B "MAEMBGBIEE" for MAEMGBIEE (məmgbie)
16881 "PUNGAAM" for PUNGGAAM (puŋgaam)
1688E "NGOM" for NGGOM (ŋgɔm)
168DC "SETFON" for SHETFON (ʃɛtfɔn)
16963 "MBAA SEVEN" for SAMBA (samba)
1697D "NGOP" for NGGOP (ŋgɔp)
For two further characters, 16839 FIRI ("firʼi") and 16A24 NI ("nʼi"), the phonetic source form contains an apostrophe, for which, however, no conversion is indicated. Might those not be either, respectively, FIR-I and N-I, or, if the apostrophe is a variant for ʔ, FIRQI and NQI? Note that, writing from Germany, I myself can neither read the script nor speak the underlying language, and I have no connections to the user community. I merely noticed discrepancies between the columns (phonetic vs. en vs. fr) in the indicated proposal (pp. 21 ff.).
Date/Time: Tue May 07 18:02:20 CDT 2024
ReportID: ID20240507180220
Name: Markus Scherer
Report Type: Error Report
Opt Subject: TUS table 4-5 Primary Numeric Ideographs
[EDC]
Eric Muller noticed that TUS table 4-5 shows U+5146 with the value 1,000,000,000,000 (10,000 × 10,000 × 10,000), which since Unicode 15.1 is no longer the Numeric_Value of that code point. See:
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-4/#G138783
https://www.unicode.org/Public/15.0.0/ucd/extracted/DerivedNumericValues.txt
https://www.unicode.org/Public/15.1.0/ucd/extracted/DerivedNumericValues.txt
The kPrimaryNumeric value is 1000000 1000000000000 (with two values separated by a space). The first one of these is the Numeric_Value. Also, kPrimaryNumeric has data for 20 code points, while table 4-5 shows only 17.
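As a minimal illustration of the rule described in this report (the first of the space-separated kPrimaryNumeric values is the Numeric_Value), the following Python sketch splits such a field. The function name is hypothetical and the input string is the value quoted above; this is not code from the UCD tooling.

```python
def numeric_value_from_primary_numeric(field: str) -> int:
    """Return the first space-separated value of a kPrimaryNumeric field.

    Assumption (taken from the report above): when the field carries
    several values, the first one is the Numeric_Value.
    """
    values = [int(v) for v in field.split()]
    if not values:
        raise ValueError("empty kPrimaryNumeric field")
    return values[0]


# Field value quoted in the report for U+5146.
U5146_FIELD = "1000000 1000000000000"
U5146_NUMERIC_VALUE = numeric_value_from_primary_numeric(U5146_FIELD)
```

Under that assumption, the value obtained for U+5146 is one million rather than the 10,000 × 10,000 × 10,000 shown in the table.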
Date/Time: Wed May 08 07:23:49 CDT 2024
ReportID: ID20240508072349
Name: Mikhail Merkuryev
Report Type: Error Report
Opt Subject: Supplemental Arrows-C
[SEW]
Change the Egyptological arrows U+1F8C0 and U+1F8C1 from hollow to simple. As L2/23-185 says, the arrows don’t need to be hollow.
Date/Time: Wed May 15 10:35:38 CDT 2024
ReportID: ID20240515103538
Name: Mikhail Merkuryev
Report Type: Public Review Issue
Opt Subject: DoNotEmit.txt [PAG]
DoNotEmit.txt: Add to “Discouraged” or “Preferred spelling” the decomposed forms of the Cyrillic letters I know of: Ёё Йй Ўў (Cyrillic capital/small letter Io, Cyrillic capital/small letter Short I, Cyrillic capital/small letter Short U), e.g.
0415 0308 → 0401 # Cyrillic capital letter Ie + combining diaeresis → Cyrillic capital letter Io
There may be others that I am not aware of. Її (Cyrillic capital/small letter Yi) is tricky and I do not know what to do with it: the decomposed form is discouraged in modern Ukrainian text, but may be allowed in Old Slavonic. Rationale: most Cyrillic fonts do not position combining marks properly, and the common breve has a different shape from the Cyrillic one. And these four letters in their modern shape are really distinct entities.
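For illustration only, the decomposed and precomposed spellings named in this report are canonically equivalent, so NFC normalization already maps one to the other. The sketch below checks this with Python's standard `unicodedata` module. Only the 0415 0308 → 0401 mapping is quoted from the report; the other five pairs are filled in from the canonical decompositions of the same letters, and the dictionary name is hypothetical.

```python
import unicodedata

# Decomposed spelling -> precomposed letter.
# Only the first pair is quoted in the report; the rest follow the same
# pattern for the other letters it names (Io, Short I, Short U).
DECOMPOSED_TO_PRECOMPOSED = {
    "\u0415\u0308": "\u0401",  # Ie + combining diaeresis -> capital Io
    "\u0435\u0308": "\u0451",  # ie + combining diaeresis -> small io
    "\u0418\u0306": "\u0419",  # I + combining breve -> capital Short I
    "\u0438\u0306": "\u0439",  # i + combining breve -> small short i
    "\u0423\u0306": "\u040E",  # U + combining breve -> capital Short U
    "\u0443\u0306": "\u045E",  # u + combining breve -> small short u
}

# NFC composes each decomposed spelling into the single precomposed letter.
NFC_RESULTS = {
    decomposed: unicodedata.normalize("NFC", decomposed)
    for decomposed in DECOMPOSED_TO_PRECOMPOSED
}
```

A process that emits NFC text would therefore never produce the decomposed spellings; the request concerns text that is not normalized before display.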
Date/Time: Mon May 20 12:39:57 CDT 2024
ReportID: ID20240520123957
Name: Ben Scarborough
Report Type: Public Review Issue
Opt Subject: 502 [PAG]
Note: This report duplicates report #ID20240112220043 filed against PRI #489, and will be handled there.
DoNotEmit.txt currently includes the following line:
0149; 02BC 006E; Deprecated # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE; MODIFIER LETTER APOSTROPHE, LATIN SMALL LETTER N
The character in question, U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE, has had the Deprecated property since Unicode 5.2.0. According to L2/08-287, the character was deprecated because its decomposition used the wrong apostrophe character: RIGHT SINGLE QUOTATION MARK is the preferred character for Afrikaans, not MODIFIER LETTER APOSTROPHE. The line in DoNotEmit.txt should use the preferred string instead of U+0149's compatibility decomposition. The line should be changed to:
0149; 2019 006E; Deprecated # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE; RIGHT SINGLE QUOTATION MARK, LATIN SMALL LETTER N
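The difference between the two candidate strings can be seen with Python's standard `unicodedata` module. This is a small sketch, not part of the report: it reads the compatibility decomposition of U+0149 and compares it with the string the report proposes. The variable names are hypothetical.

```python
import unicodedata

DEPRECATED = "\u0149"  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

# Compatibility decomposition as recorded in the character database.
DECOMPOSITION_FIELD = unicodedata.decomposition(DEPRECATED)

# What compatibility normalization produces from the deprecated character.
NFKD_FORM = unicodedata.normalize("NFKD", DEPRECATED)

# The replacement string proposed in the report:
# RIGHT SINGLE QUOTATION MARK followed by LATIN SMALL LETTER N.
PROPOSED = "\u2019\u006E"

# True when normalization alone would not yield the proposed string.
DIFFERS_FROM_PROPOSAL = NFKD_FORM != PROPOSED
```

Normalization yields U+02BC followed by n, so the proposed U+2019 form would have to come from the DoNotEmit.txt entry itself.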
Date/Time: Tue May 21 18:27:20 CDT 2024
ReportID: ID20240521182720
Name: Erik Carvalhal Miller
Report Type: Public Review Issue
Opt Subject: 502 [EDC]
Note: This has been fixed in a subsequent draft of the core spec.
Chapter 22, §22.7.4 [https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-22/#G78435], ¶5 (“A set of ASCII digits 0 through 9…”): “ASCI” → “ASCII”, “charcter” → “character” (in “Outlined uppercase Latin letters and ASCI digits from the European charcter set for the Sharp MZ-series machines…”)
Date/Time: Wed May 22 04:02:17 CDT 2024
ReportID: ID20240522040217
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 502 [PAG]
I propose adding Duployan (Dupl) to the Script_Extensions for the following code points based on annotations in the names list for the Duployan block, the contents of UTN #37, “Duployan Shorthand”, and the original encoding proposal for Duployan, L2/10-272r2:
U+00B7 MIDDLE DOT
U+0300 COMBINING GRAVE ACCENT
U+0301 COMBINING ACUTE ACCENT
U+0304 COMBINING MACRON
U+0306 COMBINING BREVE
U+0307 COMBINING DOT ABOVE
U+0308 COMBINING DIAERESIS
U+030A COMBINING RING ABOVE
U+0323 COMBINING DOT BELOW
U+0324 COMBINING DIAERESIS BELOW
U+0331 COMBINING MACRON BELOW
U+2E3C STENOGRAPHIC FULL STOP
Duployan for Romanian also makes use of U+00B0 DEGREE SIGN in numerical contexts, though as this character is in common use in a variety of writing systems and has no explicit Script_Extensions as of now, there would likely be little benefit to specifically listing just Duployan.
Date/Time: Thu May 23 09:34:28 CDT 2024
ReportID: ID20240523093428
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 502 [PAG]
DoNotEmit.txt currently includes the following line:
13217; 13216 13430 13216 13430 13216; Precomposed_Hieroglyph # EGYPTIAN HIEROGLYPH N035A; EGYPTIAN HIEROGLYPH N035, EGYPTIAN HIEROGLYPH VERTICAL JOINER, EGYPTIAN HIEROGLYPH N035, EGYPTIAN HIEROGLYPH VERTICAL JOINER, EGYPTIAN HIEROGLYPH N035
However, section 11.4.3 of the core spec specifically states:
»For example, U+13217 𓈗 EGYPTIAN HIEROGLYPH N035A apparently could be represented by the sequence <13216, 13430, 13216, 13430, 13216>. However, this compound sign is considered a single entity in Ancient Egyptian by Egyptologists, because the compound sign conveys a function that is not covered by the meaning of its individual parts. As a result, the atomic character U+13217 should be used.«
I do not know which representation is actually the preferred one, so either this DoNotEmit entry or this section of the core spec should be removed.
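To make the conflict concrete, the quoted entry can be split into its parts. The Python sketch below assumes the layout visible in the quoted line (three semicolon-separated fields followed by a `#` comment); the labels `do_not_emit`, `preferred` and `kind`, and the function name, are hypothetical and chosen from how the reports on this page read, not taken from the file's documentation.

```python
from typing import NamedTuple


class DoNotEmitEntry(NamedTuple):
    do_not_emit: str  # first field, decoded to a string (assumed meaning)
    preferred: str    # second field, decoded to a string (assumed meaning)
    kind: str         # third field, e.g. "Precomposed_Hieroglyph"


def hex_sequence_to_str(field: str) -> str:
    """Decode space-separated hexadecimal code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_do_not_emit_line(line: str) -> DoNotEmitEntry:
    """Parse one data line; the text after '#' is treated as a comment."""
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return DoNotEmitEntry(
        hex_sequence_to_str(fields[0]),
        hex_sequence_to_str(fields[1]),
        fields[2],
    )


# The entry quoted in the report (comment shortened).
ENTRY = parse_do_not_emit_line(
    "13217; 13216 13430 13216 13430 13216; Precomposed_Hieroglyph "
    "# EGYPTIAN HIEROGLYPH N035A; ..."
)
```

Read this way, the entry steers text away from the atomic U+13217 and toward the five-code-point sequence, which is the opposite of the recommendation quoted from section 11.4.3.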
Date/Time: Thu May 23 09:57:09 CDT 2024
ReportID: ID20240523095709
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 502 [PAG]
The two new CJK strokes, U+31E4 CJK STROKE HXG and U+31E5 CJK STROKE SZP, currently have no explicit Script_Extensions. They should be given the property value “Hani” like all the other CJK strokes (U+31C0..U+31E3).
Date/Time: Sat Jun 01 11:53:22 CDT 2024
ReportID: ID20240601115322
Name: Sridatta A
Report Type: Public Review Issue
Opt Subject: 502 [EDC]
Updating the Tirhuta chapter of the Core Specification: “and in the Narayani and Janakpur zones of Nepal.” Nepal has not used zones as administrative divisions since 2015. According to the current classification, Maithili is mainly spoken in Madhesh and Koshi provinces. https://en.m.wikipedia.org/wiki/Maithili_language
Date/Time: Thu Jun 06 17:22:34 CDT 2024
ReportID: ID20240606172234
Name: Debbie Anderson
Report Type: Public Review Issue
Opt Subject: 502 [SEW]
I checked with the Egyptologists and they confirmed the currently commented out Standardized Variants should remain commented out, but one additional sequence should ALSO be commented out:
1333B FE00; rotated 90 degrees; # EGYPTIAN HIEROGLYPH U007
Date/Time: Sat Jun 08 19:25:17 CDT 2024
ReportID: ID20240608192517
Name: Jules Bertholet
Report Type: Public Review Issue
Opt Subject: 502 [EDC]
From §5.8.2 of the core spec <https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G21129>:
> This is a paragraph with a line separator at this point,
>
> causing the word “causing” to appear on a different line, but not causing the typical
> paragraph indentation, sentence breaking, line spacing, or change in flush (right, center,
> or left paragraphs).
However, the paragraph in question actually uses a paragraph separator, not a line separator. `</p>` should be replaced with `</br>` in the HTML.
Date/Time: Wed Jun 19 07:31:03 CDT 2024
ReportID: ID20240619073103
Name: Vaishnavi Murthy Yerkadithaya
Report Type: Public Review Issue
Opt Subject: 502 [EDC/Charts]
Editorial Note: Please refer to https://www.unicode.org/cgi-bin/GetDocumentLink?L2/24-149 for detailed comments on https://www.unicode.org/charts/PDF/Unicode-16.0/U160-11380.pdf
Date/Time: Wed Jun 19 09:19:47 CDT 2024
ReportID: ID20240619091947
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 502 [PAG]
Currently, U+19DA NEW TAI LUE THAM DIGIT ONE has Line_Break=Complex_Context while all the other digit characters of the New Tai Lue script (U+19D0..U+19D9) have Line_Break=Numeric. For consistency, I propose changing U+19DA to Line_Break=Numeric as well.
Date/Time: Mon Jun 24 14:43:20 CDT 2024
ReportID: ID20240624144320
Name: Norbert Lindenberg
Report Type: Public Review Issue
Opt Subject: 502 [SEW]
Unicode 16.0 will have 5 characters with Indic syllabic category Consonant_Preceding_Repha. Such characters represent non-spacing marks, but are encoded in phonetic order before the consonant on top of which they’re rendered, and therefore have general category Lo. The representative glyphs for such characters in the code charts and, where shown, in the core specification have an enclosing dashed box to reflect their unusual properties. There’s an inconsistency in what’s shown inside that box: Most representative glyphs show the repha glyph by itself, but the one for Tulu-Tigalari shows the repha glyph on top of a dotted circle. I think showing the repha mark on top of a dotted circle actually makes sense.
Affected characters:
0D4E ; Consonant_Preceding_Repha # Lo MALAYALAM LETTER DOT REPH
113D1 ; Consonant_Preceding_Repha # Lo TULU-TIGALARI REPHA
11941 ; Consonant_Preceding_Repha # Lo DIVES AKURU INITIAL RA
11D46 ; Consonant_Preceding_Repha # Lo MASARAM GONDI REPHA
11F02 ; Consonant_Preceding_Repha # Lo KAWI SIGN REPHA
Sources:
https://www.unicode.org/Public/draft/UCD/charts/CodeCharts.pdf
https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-17/#G41865
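The property lines listed in this report can be read mechanically. The Python sketch below parses the five lines exactly as quoted and collects the general category noted in each comment. The regular expression and the names used are hypothetical, and the sketch assumes every line has the shape shown in the report; it does not consult the character database itself.

```python
import re

# The five lines as quoted in the report.
REPHA_LINES = """\
0D4E ; Consonant_Preceding_Repha # Lo MALAYALAM LETTER DOT REPH
113D1 ; Consonant_Preceding_Repha # Lo TULU-TIGALARI REPHA
11941 ; Consonant_Preceding_Repha # Lo DIVES AKURU INITIAL RA
11D46 ; Consonant_Preceding_Repha # Lo MASARAM GONDI REPHA
11F02 ; Consonant_Preceding_Repha # Lo KAWI SIGN REPHA
""".splitlines()

# code point ; property value # general category, then character name
LINE_PATTERN = re.compile(
    r"^([0-9A-F]{4,6})\s*;\s*(\w+)\s*#\s*([A-Z][a-z])\s+(.+)$"
)


def parse_property_line(line: str) -> tuple[int, str, str, str]:
    """Return (code point, property value, general category, name)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    code_point, value, category, name = match.groups()
    return int(code_point, 16), value, category, name


REPHA_ENTRIES = [parse_property_line(line) for line in REPHA_LINES]
GENERAL_CATEGORIES = {entry[2] for entry in REPHA_ENTRIES}
```

All five entries carry the single category Lo, which matches the report's point that these characters are letters by category even though they render as marks.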
Date/Time: Tue Jun 25 04:59:29 CDT 2024
ReportID: ID20240625045929
Name: Richard Ishida
Report Type: Public Review Issue
Opt Subject: 502 [PAG]
The Do Not Emit data file contains the following lines.
---
ب ٔ; ࢡ; Hamza_Form # ARABIC LETTER BEH, ARABIC HAMZA ABOVE; ARABIC LETTER BEH WITH HAMZA ABOVE
ح ٔ; ځ; Hamza_Form # ARABIC LETTER HAH, ARABIC HAMZA ABOVE; ARABIC LETTER HAH WITH HAMZA ABOVE
ر ٔ; ݬ; Hamza_Form # ARABIC LETTER REH, ARABIC HAMZA ABOVE; ARABIC LETTER REH WITH HAMZA ABOVE
---
These mappings are valid for orthographies that use the atomic character as a letter of the alphabet, but they are not appropriate for Kashmiri, which uses the hamza as a vowel diacritic, not as an ijam. See
https://r12a.github.io/scripts/arab/ks.html#non_canonical
https://r12a.github.io/scripts/arab/homographs.html#nehomographs
Although the hamza is not a tashkil, the distinction made here follows the logic in the standard related to ijam vs tashkil usage. See
https://r12a.github.io/scripts/arab/homographs.html#ijam_tashkil
Having special rules for just a few, arbitrary combinations of hamza and base in Kashmiri is likely not only to lead to inconsistency in encoding, leading to failures in searching and other operations, but it is also a recipe for confusion for users. Note that all other uses of the vowel hamza above a base character in Kashmiri have no corresponding ijam (and if there's a possibility that atomic characters for these pairings may be created for other languages in the future, this adds further complexity).
It seems to me that one solution to this would be to add some sort of qualification, by language, for these entries. Or perhaps it would be helpful to make these combinations canonically equivalent and remove them from Do Not Emit. Users would then be able to type the items either way, and end up with compatible text.
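The lack of canonical equivalence mentioned in this report can be checked with Python's standard `unicodedata` module. The sketch below compares the three pairs under NFD; the helper and list names are hypothetical. The ALEF pair is not from the report and is included only as a contrasting case where a canonical decomposition does exist.

```python
import unicodedata

HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE

# (sequence, atomic character) pairs from the entries quoted above.
HAMZA_FORM_PAIRS = [
    ("\u0628" + HAMZA_ABOVE, "\u08A1"),  # BEH + hamza vs BEH WITH HAMZA ABOVE
    ("\u062D" + HAMZA_ABOVE, "\u0681"),  # HAH + hamza vs HAH WITH HAMZA ABOVE
    ("\u0631" + HAMZA_ABOVE, "\u076C"),  # REH + hamza vs REH WITH HAMZA ABOVE
]


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


PAIR_EQUIVALENCE = [canonically_equivalent(seq, atom) for seq, atom in HAMZA_FORM_PAIRS]

# Contrast (not in the report): ALEF + hamza above is canonically
# equivalent to U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE.
ALEF_EQUIVALENT = canonically_equivalent("\u0627" + HAMZA_ABOVE, "\u0623")
```

Because none of the three pairs is equivalent, searching or comparing normalized text treats the two spellings as different strings, which is the inconsistency the report is concerned about.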
Date/Time: Wed Jun 26 12:31:03 CDT 2024
ReportID: ID20240626123103
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 502 [EDC]
Note: This has been fixed in a subsequent draft of the names list for 16.0.
I propose adding cross references between U+2BFA ⯺ UNITED SYMBOL and U+1CC88 TWO RINGS ALIGNED HORIZONTALLY because of their similar appearance.
Date/Time: Wed Jun 26 13:09:26 CDT 2024
ReportID: ID20240626130926
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: 489 [EDC]
Note: This has been fixed in a subsequent draft of the names list for 16.0.
In the code chart for Garay (https://www.unicode.org/charts/PDF/Unicode-16.0/U160-10D40.pdf), the names list has a subhead "Punctuation and reduplication mark" immediately before U+10D6D GARAY CONSONANT NASALIZATION MARK. That character would fit better within the scope of the preceding subhead, "Marks". Proposed change: move the "Punctuation..." subhead after U+10D6D.
Date/Time: Sat Jun 29 15:24:10 CDT 2024
ReportID: ID20240629152410
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 502 [SEW]
The following characters currently have Script=Common and Script_Extensions={Common}:
U+16EB RUNIC SINGLE PUNCTUATION
U+16EC RUNIC MULTIPLE PUNCTUATION
U+16ED RUNIC CROSS PUNCTUATION
I could not find any mention anywhere in the Unicode Standard of these characters being used in any script besides Runic, though it is possible they may be. At the very least Runic (Runr) should be added to their Script_Extensions.
Date/Time: Tue Jul 02 15:32:12 CDT 2024
ReportID: ID20240702153212
Name: Karl Pentzlin
Report Type: Public Review Issue
Opt Subject: 502 Unicode 16.0.0 Beta [EDC/Charts]
In a discussion of some symbol characters (L2/23-152) at the ongoing SC2/WG2 meeting in Prague, there were some misunderstandings because, looking at the Unicode code tables only, it was not obvious which characters are in fact emoji. Thus, it seems advisable to make it easy to see in the code chart whether
- a character is "emoji by default", i.e. listed in emoji-sequences.txt as Basic_Emoji, but without FE0F in the first column,
- or a character is "selectable as emoji" by the variation selector U+FE0F, i.e. listed in emoji-sequences.txt as Basic_Emoji, together with FE0F in the first column.
I mailed this to Asmus Freytag as the author of the Unibook software. In his answer, he recommended that I outline the problem in a response to the Unicode 16.0 beta review (however, I do not wish to rush anyone into discussing this issue before Unicode 17). As he wrote, this would focus on the use case of not being able to tell, from looking at the code charts, something that so fundamentally affects the identity of a character. In particular, for emoji the representative glyph in the code chart lacks the relevance that it has for other characters and may, in fact, be misleading. It can be noted that the code charts already indicate those characters for which there is a standardized variant, and for which, therefore, the sole representative glyph may not give the full information.
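The two cases distinguished in this report can be told apart mechanically. The Python sketch below classifies lines in the general shape of emoji-sequences.txt (semicolon-separated fields, `#` comments). The function name and the sample lines are illustrative assumptions rather than an excerpt of the real file, and the two labels are the ones used in the report.

```python
from typing import Optional


def classify_basic_emoji(line: str) -> Optional[str]:
    """Classify one data line using the report's two categories.

    Returns None for comments, blank lines, and any line whose second
    field is not Basic_Emoji.
    """
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 2 or fields[1] != "Basic_Emoji":
        return None
    code_points = fields[0].split()
    if code_points and code_points[-1].upper() == "FE0F":
        return "selectable as emoji"
    return "emoji by default"


# Illustrative sample lines in the assumed file layout.
SAMPLE_LINES = [
    "# comment line",
    "231A..231B ; Basic_Emoji ; watch..hourglass done",
    "00A9 FE0F ; Basic_Emoji ; copyright",
    "1F1E6 1F1E8 ; RGI_Emoji_Flag_Sequence ; flag",
]
CLASSIFICATIONS = [classify_basic_emoji(line) for line in SAMPLE_LINES]
```

A chart-production tool could use a lookup of this kind to decide which annotation, if any, to print next to a character.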