The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of October 24, 2024, since the previous cumulative document was issued prior to UTC #180 (July 2, 2024).
The links below go directly to open PRIs and to feedback documents for them, as of October 24, 2024.
Issue: 508
Name: Proposed Update UAX #38, Unicode Han Database (Unihan)
Feedback Link: (feedback)
The links below go to locations in this document for feedback.
Feedback routed to CJK & Unihan Working Group for evaluation [CJK]
Feedback routed to Script Encoding Working Group for evaluation [SAH]
Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]
Feedback routed to Emoji Standard & Research Working Group for evaluation [ESC]
Feedback routed to Editorial Working Group for evaluation [EDC]
Other Reports
Date/Time: Sun Jul 07 18:01:08 CDT 2024
ReportID: ID20240707180108
Name: Paul Masson
Report Type: Error Report
Opt Subject: Variants for U+6784 构
This character is listed as its own simplified and traditional variant. That is just simply wrong.
Date/Time: Sun Jul 07 21:44:35 CDT 2024
ReportID: ID20240707214435
Name: Paul Masson
Report Type: Error Report
Opt Subject: Variants for U+5978 奸
This character is listed as its own simplified and traditional variant. That is just simply wrong.
Date/Time: Mon Jul 08 19:14:05 CDT 2024
ReportID: ID20240708191405
Name: Paul Masson
Report Type: Error Report
Opt Subject: Variants for U+575B 坛
This character is listed as its own simplified and traditional variant. That is just simply wrong.
Date/Time: Thu Aug 08 23:14:55 CDT 2024
ReportID: ID20240808231455
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: On the addition of an extra kStrange entry
Character U+3106C has an entry in the kStrange Unihan property, but only for being "Stroke Heavy". The fact that its top and bottom rows do not stretch to fill the character cell makes it, in my opinion, also a good candidate for the "Unusual Arrangement or Structure" category.
Date/Time: Sat Aug 10 13:51:55 CDT 2024
ReportID: ID20240810135155
Name: M
Report Type: FAQ Suggestion
Opt Subject: Unihan Database Properties (kGB7)
I wonder why only 42 characters have the property kGB7
Date/Time: Mon Aug 26 14:59:02 CDT 2024
ReportID: ID20240826145902
Name: Ken Lunde
Report Type: Error Report
Opt Subject: Unihan database error
The kRSUnicode and kTotalStrokes property values for U+23D92 𣶒 are both incorrect. Instead of being 85.8 and 11 respectively, their property values should be 2.8 and 9, respectively.
Date/Time: Wed Sep 11 18:35:02 CDT 2024
ReportID: ID20240911183502
Name: Ryusei Yamaguchi
Report Type: Error Report
Opt Subject: the code charts for Unicode version 16.0.0
In the code charts for Unicode version 16.0.0, the glyphs in the J column for the following characters do not match the corresponding J-source codes.
"code point","kIRG_JSource","glyph in the chart"
"U+2D0B2","JMJ-059372","MJ068097"
"U+2D4F1","JMJ-059505","MJ068098"
"U+2EA41","JMJ-060341","MJ068100"
Date/Time: Sat Sep 21 06:42:45 CDT 2024
ReportID: ID20240921064245
Name: Andrew West
Report Type: Error Report
Opt Subject: Unihan_Variants.txt
In Unihan_Variants.txt there are these two entries:
U+8AE9 kSimplifiedVariant U+2C8F2
U+2C8F2 kTraditionalVariant U+8AE9
However, U+8AE9 諩 is a variant form of U+8B5C 譜, and does not simplify to U+2C8F2 𬣲. On the other hand, the correct traditional mapping for U+2C8F2 𬣲 is U+8A81 誁.
Therefore remove these two entries:
U+8AE9 kSimplifiedVariant U+2C8F2
U+2C8F2 kTraditionalVariant U+8AE9
And add these three entries:
U+8A81 kSimplifiedVariant U+2C8F2
U+8AE9 kSemanticVariant U+8B5C
U+2C8F2 kTraditionalVariant U+8A81
Date/Time: Sat Sep 21 07:07:19 CDT 2024
ReportID: ID20240921070719
Name: Andrew West
Report Type: Error Report
Opt Subject: Unihan_Variants.txt
In Unihan_Variants.txt there are these two entries:
U+292CC kSimplifiedVariant U+31071
U+31071 kTraditionalVariant U+292CC
However, U+292CC 𩋌 (⿰革易) does not simplify to U+31071 𱁱, which is the simplified form of U+292EC 𩋬 (⿰革昜).
Therefore remove these two entries:
U+292CC kSimplifiedVariant U+31071
U+31071 kTraditionalVariant U+292CC
And add these two entries:
U+292EC kSimplifiedVariant U+31071
U+31071 kTraditionalVariant U+292EC
Date/Time: Sun Oct 06 02:56:51 CDT 2024
ReportID: ID20241006025651
Name: Philippe Verdy
Report Type: Error Report
Opt Subject: /Public/UCD/latest/ucd/CJKRadicals.txt
Note: The editors have evaluated and responded to this report. No further UTC action is necessary.
There's a missing entry in CJKRadicals.txt (https://www.unicode.org/Public/UCD/latest/ucd/CJKRadicals.txt) for an 'unencoded' CJK radical present in the composition of unified ideographs for modern Chinese, and only represented by the CJK unified ideograph U+9FBA (龺). It is an additional variant of the Kangxi radical 159 U+2F9E (⾞), i.e. a narrowed form of Unified ideograph 𠦝 (U+2099D) used on the left side.
...
158; 2F9D; 8EAB
159; 2F9E; 8ECA
159'; 2ECB; 8F66
+ 159''; ; 9FBA
160; 2F9F; 8F9B
...
It should be listed, just like the three other unencoded non-Kangxi CJK radicals for variants of Kangxi radicals:
...
181'; 2EDA; 9875
182; 2FB5; 98A8
182'; 2EDB; 98CE
* 182''; ; 322C4
183; 2FB6; 98DB
...
207; 2FCE; 9F13
208; 2FCF; 9F20
* 208''; ; 9F21
209; 2FD0; 9F3B
...
211; 2FD2; 9F52
211'; 2EEE; 9F7F
211''; 2EED; 6B6F
212; 2FD3; 9F8D
212'; 2EF0; 9F99
212''; 2EEF; 7ADC
* 212'''; ; 31DE5
213; 2FD4; 9F9C
213'; 2EF3; 9F9F
213''; 2EF2; 4E80
...
Additional question: Shouldn't these four non-Kangxi CJK radicals (159'', 182'', 208'', 212''') be encoded? For example in the existing block 2E80-2EFF CJK Radicals Supplement (where U+2E9A and U+2EF3-2EFF are still unassigned)? And then shouldn't the existing IDS (for composite ideographs using them) be updated in Unihan to preferably use these 4 new radicals (where appropriate), rather than their associated unified ideographs (respectively U+9FBA, U+322C4, U+9F21, U+31DE5)? All this should be done within the existing framework for better radical-stroke indexes, which will use the new properties added in the recently released Unicode 16.0.
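To illustrate what "unencoded" means in the listing above, here is a minimal Python sketch that reads lines in the three-field layout quoted in the report (radical number; CJK radical code point; unified ideograph) and picks out the rows whose middle field is empty. The sample data is copied from the report, with its "+" and "*" highlight markers stripped; the 159'' row is the entry the report proposes, not one known to be in the file. The function names are hypothetical.

```python
# Sample rows copied from the report; "159''; ; 9FBA" is the proposed
# addition, the other empty-radical rows are quoted as already present.
SAMPLE = """\
159; 2F9E; 8ECA
159'; 2ECB; 8F66
159''; ; 9FBA
182''; ; 322C4
208''; ; 9F21
212'''; ; 31DE5
"""


def parse_radical_line(line):
    """Split 'number; radical; ideograph' into a tuple.

    An empty radical field (no encoded CJK radical character) becomes None.
    """
    number, radical, ideograph = (field.strip() for field in line.split(";"))
    return number, (radical or None), ideograph


def unencoded_radicals(text):
    """Return (radical number, ideograph) pairs that lack a radical code point."""
    rows = [
        parse_radical_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")  # skip blanks and comments
    ]
    return [(number, ideograph) for number, radical, ideograph in rows if radical is None]


if __name__ == "__main__":
    for number, ideograph in unencoded_radicals(SAMPLE):
        print(f"{number}: represented only by U+{ideograph}")
```

The same filter could be pointed at the real file to list which radical variants currently rely on a unified ideograph alone.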
Date/Time: Mon Jul 15 14:29:00 CDT 2024
ReportID: ID20240715142900
Name: David Corbett
Report Type: Error Report
Opt Subject: L2/24-182
The problem statement in L2/24-182 says a few things about U+20DD in fonts that are not true.
> The problem with using U+20DD is that it cannot adjust the advance width of the character that it encloses, with the consequence that without manual spacing or kerning it will overstrike a preceding character.
U+20DD, like any character, can adjust the advance widths of other characters using contextual positioning.
> The only non-manual solution would be the impractical one of creating a specialty font with a substitution character for every combination of IPA letter and ◌⃝,
A font does not need to use ligature substitutions as this sentence claims. Positioning the circle is akin to kerning, which is already common in fonts and can be automated.
> with internal anchor points for diacritics that would now need to be input after the circle.
Diacritics could be input before or after the circle. Contextual positioning of U+20DD can easily skip intervening marks.
I am not against the proposal itself, but the proposal should not use these reasons as motivation. The proposal seems to be saying it would be hard to implement a font with the proper rendering using U+20DD, so the solution is two new characters with which it would also be hard to implement the proper rendering. One good reason for the new characters is that they can encircle multiple bases, which U+20DD can’t (since U+034F was changed to not support this use case).
Date/Time: Mon Jul 01 11:45:31 CDT 2024
ReportID: ID20240701114531
Name: Guru Prasad
Report Type: Public Review Issue
Opt Subject: 502
Tulu Tigalari adopted for modern Tulu & manuscripts
Followup suggestion to L2/22-068 Apr 15, 2022 response to L2/22-075
Issue with changing consonant addition using halanth used in all Indic languages to a new symbol like a variance of wingding suggested in L2/22-031 and response L2/22-068.
1. Kindly consider using Nukta or other Unicode assignment for Visible Virama and leaving the current invisible virama as is, allowing legacy documents, typing, and transliteration to happen with ease to and from Tulu-Tigalari.
Date/Time: Thu Jul 18 15:26:37 CDT 2024
ReportID: ID20240718152637
Report Type: Public Review Issue [SEW]
Name: Philippe Verdy
Opt Subject: 502
Note: This has already been fixed in a subsequent draft.
Minor editorial issue: The following grouping is used in the current beta charts and name lists for the Garay Block (in Unicode 16.0 Draft Public Review):
; Marks
10D6A GARAY CONSONANT GEMINATION MARK
10D6B GARAY COMBINING DOT ABOVE
10D6C GARAY COMBINING DOUBLE DOT ABOVE
; Punctuation and reduplication mark
10D6D GARAY CONSONANT NASALIZATION MARK
10D6E GARAY HYPHEN
10D6F GARAY REDUPLICATION MARK
However, 10D6D GARAY CONSONANT NASALIZATION MARK should be under "Marks" (like 10D6A GARAY CONSONANT GEMINATION MARK), not under "Punctuation and reduplication mark".
Date/Time: Fri Jul 19 09:48:15 CDT 2024
ReportID: ID20240719094815
Name: Philippe Verdy
Report Type: Public Review Issue [SEW]
Opt Subject: 502
Representative glyph for 18CFF (KHITAN SMALL SCRIPT CHARACTER-18CFF)
The current draft chart indicates that this represents a missing or illegible character (it is then intended for long-term usage in encoded texts, for reference, rather than inserting "educated guesses"). However, the representative glyph for now just shows a basic square, which looks too much like "tofu" (used when there's no font available, and where an alternate representation using graphics could be used, e.g. on the web), or like a regular geometric shape. We have more than enough regular rectangular shapes in Unicode. Let's not abuse one for something intended to be unreadable/obscure. Older terminal protocols used black squares or checkerboards, or patterns, or some bordered or hollow question mark.
My opinion is that this glyph should rather be some irregular (not purely rectangular) shape (e.g. with some missing corners), like a partially burnt paper sheet, with dashed or dotted borders, possibly filled with an irregular checkerboard or pseudo-random dots or strokes (not near the damaged corners/borders, where they could be bolder or could simulate a shadowing effect). A question mark (possibly rotated or mirrored) may also be added on top of that shape. Another good glyph could be a backward slanted mirrored question mark, hollowed, or inverted inside a "warning triangle" or some irregular dotted rectangle (possibly not fully closed, with a missing corner at the bottom right). It should however adopt the ideographic metrics of other Khitan letters. We should be more imaginative, while avoiding visual confusion with other regular characters (from any script or set of symbols).
Date/Time: Fri Aug 09 15:44:45 CDT 2024
ReportID: ID20240809154445
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: On the newly approved Greek characters
Recently, new Greek letters and modifier letters were approved for phonetic notation, to be included in the Latin Extended G block (next to related IPA characters). I advise that these characters be reassigned to the Greek and Coptic block, as well as possibly the Greek Extended block, in the following way: the three letters with palatal hook can be placed in 0380-0382 and the two modifier letters can go in 1F7E-1F7F or alternatively in 0378-0379. While placing Greek letters along with Latin letters has been done before, that block was under the generic name of Phonetic Extensions, named that way precisely because letters of different scripts could occupy it. While the risk of confusion is minor, I don't believe it's worth breaking with precedent when a more elegant solution is available. The modifier letters in particular are bound to have a larger demand due to them being superscript versions of letters in the basic Greek alphabet. So it would be quite odd to find them in a Latin-specific block that is not even in the BMP.
Date/Time: Fri Aug 09 15:54:24 CDT 2024
ReportID: ID20240809155424
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: On the newly approved Hiragana ligature
Three Kana Ligatures have been approved and assigned into one of the Kana Extension blocks. I advise that the Hiragana Digraph Koto be reassigned to 3040 in the main Hiragana block. While I would suggest the same for the Katakana ligatures, unfortunately the Katakana block is fully occupied.
Date/Time: Wed Aug 14 06:16:38 CDT 2024
ReportID: ID20240814061638
Name: Charlotte Buff
Report Type: Error Report
Opt Subject: L2/24-080
U+1AE9 was accepted for a future version under the name COMBINING LEFT ANGLE CENTERED ABOVE (cf. 179-C58). For consistency with existing character names (which use British spelling), the name should be spelled COMBINING LEFT ANGLE *CENTRED* ABOVE instead.
Date/Time: Wed Jul 31 03:01:40 CDT 2024
ReportID: ID20240731030140
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #14: Unicode Line Breaking Algorithm
https://www.unicode.org/reports/tr14/#CJ
Version: Unicode 15.1.0
Date: 2023-08-15
Revision: 51
Location: 5.1 Description of Line Breaking Properties CJ: Conditional Japanese Starter
Problematic text: CSS Text Level 3 (which supports Japanese line layout) defines three distinct values for its line-break behavior:
• strict, typically used for long lines
• normal (CSS default), the behavior typically used for books and documents
• loose, typically used for short lines such as in newspapers
Possible correction: Delete "(CSS default)".
Explanation: In CSS, at least in the current CSS Text Level 3 Candidate Recommendation, and the latest CSS Text Level 4 Working Draft, the default line-break behavior is not "normal". It is "auto", which basically means the browser can do whatever it wants by default. Indeed, my Firefox by default does not break before small hiragana. It does when "line-break: normal" is explicitly specified.
https://www.w3.org/TR/css-text-3/#line-break-property
https://www.w3.org/TR/2024/WD-css-text-4-20240529/#line-break-property
Date/Time: Wed Jul 31 08:12:26 CDT 2024
ReportID: ID20240731081226
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #14: Unicode Line Breaking Algorithm
https://www.unicode.org/reports/tr14/#LB9
Version: Unicode 15.1.0
Date: 2023-08-15
Revision: 51
Location: 6.1 Non-tailorable Line Breaking Rules
[LB9] "Treat X (CM | ZWJ)* as if it were X (where X is any line break class except BK, CR, LF, NL, SP, or ZW)."
[LB12] "GL ×"
Problem: U+034F COMBINING GRAPHEME JOINER is in Mn, but its line breaking class is GL, not CM. This causes unexpected behavior when CGJ is used in the middle of a combining character sequence. Take the following two sequences:
(1) <u, COMBINING DIAERESIS, EM DASH>
(2) <u, CGJ, COMBINING DIAERESIS, EM DASH>
In (1), a line break is allowed before EM DASH (which has line breaking class B2). In (2), LB9 applies with CGJ taking the place of X, then LB12 kicks in to forbid a line break before the EM DASH.
How I came up with the example: Section 23.2 "Layout Controls" of the Unicode Standard explicitly mentions the use of CGJ in German text to make a distinction between u-umlaut (which is sorted like <u,e>) and u-diaeresis (which is sorted like “u” with a secondary weight). The distinction is purely for collation and it doesn't make sense for such CGJ to affect line breaking behavior after the umlaut/diaeresis.
This is impossible to solve without separating CGJ in a different line breaking class from NBSP (currently both are GL). To see this, observe that in sequence (2) above, if NBSP were used in place of CGJ, the suppression of the line break before EM DASH is exactly the expected behavior. This is also impossible to solve by tailoring, as CM and GL are non-tailorable classes, and LB9 and LB12 are non-tailorable rules.
While at it, I will also point out a typo: [LB10] "Treat any remaining CM or ZWJ as it if were AL." In this definition, the order of "it" and "if" should be reversed.
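The interaction described in this report can be traced with a small Python sketch. It models only the two rules quoted (LB9 and LB12) and treats every other pair as a permitted break, which is a simplification of the full algorithm. The class table is hand-written for the four code points in the example rather than read from LineBreak.txt, and the function names are hypothetical.

```python
# Line break classes for the code points used in the report's two sequences.
# Hand-written for this sketch; a real implementation would read LineBreak.txt.
LB_CLASS = {
    "u": "AL",
    "\u0308": "CM",  # COMBINING DIAERESIS
    "\u034F": "GL",  # COMBINING GRAPHEME JOINER (gc=Mn, but lb=GL)
    "\u2014": "B2",  # EM DASH
}

# LB9: classes that CM/ZWJ do NOT attach to.
LB9_EXCLUDED = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def resolve_lb9(text):
    """Apply LB9: treat X (CM | ZWJ)* as X by dropping the absorbed marks."""
    resolved = []
    for ch in text:
        cls = LB_CLASS[ch]
        if cls in ("CM", "ZWJ") and resolved and resolved[-1] not in LB9_EXCLUDED:
            continue  # absorbed into the preceding class
        resolved.append(cls)  # unattached CM would be handled by LB10 (not modeled)
    return resolved


def break_allowed_before_last(text):
    """Check LB12 (GL x) at the final position; all other pairs count as breakable."""
    classes = resolve_lb9(text)
    return classes[-2] != "GL"


if __name__ == "__main__":
    seq1 = "u\u0308\u2014"        # (1) <u, COMBINING DIAERESIS, EM DASH>
    seq2 = "u\u034F\u0308\u2014"  # (2) <u, CGJ, COMBINING DIAERESIS, EM DASH>
    print(resolve_lb9(seq1), break_allowed_before_last(seq1))
    print(resolve_lb9(seq2), break_allowed_before_last(seq2))
```

In sequence (2) the diaeresis is absorbed by the CGJ, so the EM DASH ends up directly after a GL and the break is suppressed, matching the behavior the report describes.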
Date/Time: Thu Aug 01 09:18:31 CDT 2024
ReportID: ID20240801091831
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #14: Unicode Line Breaking Algorithm
https://www.unicode.org/reports/tr14/#LB15b Version: Unicode 15.1.0 Date: 2023-08-15 Revision: 51 Location: LB15a, LB15b I found the following document which describes these new rules: https://www.unicode.org/L2/L2023/23063-break-quot-mark.pdf Reading through it, it seems that the inclusion of WJ and SY in LB15b (but not in LB15a) might have been accidental, and not really intended by the author. Perhaps it is an artifact of importing the rules from another representation. Regarding WJ, it seems strange that SP×Pf×WJ, i.e. that WJ should act-at-a-distance across the quotation mark. If somebody actually used WJ after Pf, they probably intended to prevent a line break to the right of Pf, not to the left. Yes, such WJ is redundant in the current version of the algorithm, but implementations deviate (especially Far Eastern implementations tend to allow line breaks much more often), so the WJ might be there in the text for a valid real-world reason. Given that SP×Pf×WJ doesn't seem to have any merit for French (somebody able to type WJ in French could just type <SP,WJ,Pf>, after all), I believe WJ should not be included in LB15b. Including it in LB15b penalizes a user who is mindful about their line breaks (explicitly using WJ), for the sake of somebody who is not careful enough to put the WJ at the correct place. Regarding SY, the slash »/« is often used in Unix paths, such as »/usr/bin«. I am not familiar with the particulars of French usage, but does it occur « comme ça »/ frequently enough (without a space before the slash) to merit inclusion in LB15b? If it does, then it probably also occurs with the same frequency /« comme ça », so it doesn't make sense to include it in LB15b but not in LB15a. If WJ and SY are included in LB15b purely for a technical reason (to ease implementations using a particular kind of software), and that reason is important enough to merit complicating the user-facing semantics of WJ, then this should probably be stated in the text.
Date/Time: Mon Aug 05 05:53:22 CDT 2024
ReportID: ID20240805055322
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #14: Unicode Line Breaking Algorithm
https://www.unicode.org/reports/tr14/#Examples Version: Unicode 15.1.0 Date: 2023-08-15 Revision: 51 Location: 8.2 Examples of Customization, Example 7 Problematic text: The tailoring can be accomplished by first segmenting the text into grapheme clusters according to the rules defined in UAX #29, and then finding line breaks according to the default line break rules, as follows: After applying the mandatory line break rules, give each grapheme cluster the line breaking class of its first code point. Explanation: This text was changed recently to avoid recommending a non-conforming tailoring: https://www.unicode.org/L2/L2022/22244-utc173-properties-recs.pdf I agree that with this change the UAX no longer formally contradicts itself, but it still doesn't mean the approach gives sensible results. Here is an example of misbehavior if the wording of the problematic text is taken at face value: <U+1112,U+1161,U+11AB, U+1100,U+1173,U+11AF> (literally: 한글) These are two Korean syllables, each composed of three code points: a leading consonant, a vowel, and a trailing consonant. Segmenting into grapheme clusters will produce two clusters, one for each syllable. If, as the text suggests, we give each cluster the line breaking class of its first code point, this would give each cluster the incorrect line breaking class JL (the class for leading consonants) instead of the correct H3 (the class for three-component syllables). Since the line breaking algorithm does not allow line breaks between leading consonants, there will be no line breaks in the entire sequence. Now these are just two Korean syllables, so the missed line breaking opportunity between them may not matter, but the same logic holds for an arbitrary long sequence of Korean syllables, potentially forbidding any line breaks in a long run of Korean text. Another possible example of misbehavior is a sequence of several Emoji flags, e.g. <RI,RI, RI,RI>. 
Segmenting into grapheme clusters will group together pairs of Regional Indicators, then giving each pair the line breaking class RI will result in prohibition of line breaks between pairs-of-pairs. This is probably not what was intended. I have not worked out the details for cases of Grapheme_Cluster_Break=Prepend, but they should probably be verified, and then again for each new update of UAX #29, because the segmentation logic tends to get more and more complicated over the years. In summary, I think it is better not to mislead the reader that it is a simple matter to tailor the line breaking algorithm to work sensibly on grapheme cluster boundaries. Either a complete working solution should be offered, or the reader should be warned of the existence of potential problems.
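The Korean example in this report can be reproduced with a short Python sketch. It assumes the grapheme cluster segmentation is already done (the two clusters are supplied by hand), uses hard-coded class ranges for the basic Hangul Jamo block only, and models only rule LB26, treating every other pair as a permitted break. All names are hypothetical.

```python
def jamo_class(ch):
    """Line break class for conjoining jamo in the basic Hangul Jamo block."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "JL"  # leading consonant
    if 0x1160 <= cp <= 0x11A7:
        return "JV"  # vowel
    if 0x11A8 <= cp <= 0x11FF:
        return "JT"  # trailing consonant
    raise ValueError(f"not a basic conjoining jamo: U+{cp:04X}")


# LB26: pairs between which a break is prohibited.
LB26_NO_BREAK = {
    ("JL", "JL"), ("JL", "JV"), ("JL", "H2"), ("JL", "H3"),
    ("JV", "JV"), ("JV", "JT"), ("H2", "JV"), ("H2", "JT"),
    ("JT", "JT"), ("H3", "JT"),
}


def break_allowed(left, right):
    """Only LB26 is modeled; anything it does not prohibit counts as breakable."""
    return (left, right) not in LB26_NO_BREAK


# The two syllables from the report, pre-segmented into grapheme clusters.
CLUSTERS = ["\u1112\u1161\u11AB", "\u1100\u1173\u11AF"]

# Tailoring as worded in Example 7: each cluster gets the class of its first code point.
first_classes = [jamo_class(cluster[0]) for cluster in CLUSTERS]
tailored_break = break_allowed(first_classes[0], first_classes[1])

# Untailored: compare the classes on either side of the cluster boundary.
untailored_break = break_allowed(jamo_class(CLUSTERS[0][-1]), jamo_class(CLUSTERS[1][0]))

if __name__ == "__main__":
    print("first-code-point classes:", first_classes)
    print("break between clusters (tailored):  ", tailored_break)
    print("break between clusters (untailored):", untailored_break)
```

Under these assumptions the untailored rules permit a break at the syllable boundary (JT followed by JL), while the first-code-point tailoring sees JL followed by JL and prohibits it.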
Date/Time: Mon Aug 05 06:23:35 CDT 2024
ReportID: ID20240805062335
Name: Rossen Mikhov
Report Type: Error Report
Opt Subject: UAX #14: Unicode Line Breaking Algorithm
https://www.unicode.org/reports/tr14/#Examples Version: Unicode 15.1.0 Date: 2023-08-15 Revision: 51 Location: 8.2 Examples of Customization, Example 7 I would like to add to the feedback that I submitted on this topic a few minutes ago. Maybe a workable approach would be: 1. Run both the segmentation algorithm and the line breaking algorithm in parallel, unmodified. 2. Delete the line breaking opportunities that happen to fall within grapheme clusters. If 2. deletes a non-tailorable line breaking opportunity (produced by rules LB2-LB12), then this means the problem is impossible to solve in the first place. It would be nice to also verify that it is impossible for 2. to delete too many line breaking opportunities, producing long runs of legitimate text without line breaks.
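The two-step approach suggested here amounts to intersecting two sets of offsets. The following Python sketch assumes both algorithms have already been run elsewhere and that their results are available as lists of code point offsets; the function name, parameter names, and sample offsets are all hypothetical.

```python
def keep_cluster_aligned_breaks(line_breaks, cluster_boundaries, non_tailorable=()):
    """Step 2 of the suggestion: drop break opportunities inside grapheme clusters.

    line_breaks        -- offsets produced by the unmodified line breaking algorithm
    cluster_boundaries -- offsets produced by the unmodified segmentation algorithm
    non_tailorable     -- subset of line_breaks produced by rules LB2-LB12

    Returns (kept, conflicts). A non-empty `conflicts` list means a non-tailorable
    opportunity would be deleted, i.e. the case the report calls impossible to solve.
    """
    boundaries = set(cluster_boundaries)
    kept = sorted(b for b in line_breaks if b in boundaries)
    conflicts = sorted(b for b in non_tailorable if b not in boundaries)
    return kept, conflicts


if __name__ == "__main__":
    # Made-up offsets: a break at 4 falls inside the cluster spanning 3..6.
    kept, conflicts = keep_cluster_aligned_breaks(
        line_breaks=[3, 4, 6],
        cluster_boundaries=[0, 3, 6],
        non_tailorable=[4],
    )
    print("kept:", kept)
    print("conflicts:", conflicts)
```

Checking the length of the gaps between consecutive kept offsets would be one way to look for the overly long unbreakable runs the report is concerned about.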
Date/Time: Thu Aug 08 21:51:58 CDT 2024
ReportID: ID20240808215158
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: TUS
Hello, The Unicode Standard misadvises about composing custom vulgar fractions, as it recommends breaking spaces to separate integers and vulgar fractions. It even recommends U+200B: “If the fraction is to be separated from a previous number, then a space can be used, choosing the appropriate width (normal, thin, zero width, and so on). For example, 1 + thin space + 3 + fraction slash + 4 is displayed as 1¾.” https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf#page=302&zoom=100,0,400 Although it was intended to be no-break, the Unicode THIN SPACE U+2009 is breaking. So is the ZERO-WIDTH SPACE U+200B, but by design. The text of TUS is the more inadequate as there is no space between the integer and the precomposed fraction. I’d suggest changing this to: A preceding integer part must be separated from the digits composing the fraction. This can be achieved using any of U+200C ZERO WIDTH NON-JOINER, U+2060 WORD JOINER, U+202F NARROW NO-BREAK SPACE, or another no-break character of the appropriate width. I noted this already on 2023-08-31T0736+0200 and came across it again now while documenting source code and keyboard layouts. Best regards, Marcel Schneider
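As an illustration of the sequences discussed in this report, the Python sketch below builds the mixed number once with U+2009 THIN SPACE (as in the quoted text) and once with U+202F NARROW NO-BREAK SPACE (one of the suggested replacements). Python's standard library does not expose the Line_Break property, so the sketch prints the compatibility decomposition tag instead; that tag is only a rough proxy for line breaking behavior, and the helper name is hypothetical.

```python
import unicodedata

FRACTION_SLASH = "\u2044"
THIN_SPACE = "\u2009"
NARROW_NO_BREAK_SPACE = "\u202F"


def mixed_number(integer, numerator, denominator, separator):
    """Build integer + separator + numerator + FRACTION SLASH + denominator."""
    return f"{integer}{separator}{numerator}{FRACTION_SLASH}{denominator}"


if __name__ == "__main__":
    for separator in (THIN_SPACE, NARROW_NO_BREAK_SPACE):
        text = mixed_number(1, 3, 4, separator)
        print(
            f"U+{ord(separator):04X}",
            unicodedata.name(separator),
            unicodedata.decomposition(separator),  # decomposition tag, a proxy only
            [f"U+{ord(ch):04X}" for ch in text],
        )
```

The actual breaking behavior depends on the Line_Break values in LineBreak.txt, which would need to be consulted directly.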
Date/Time: Fri Aug 09 21:37:02 CDT 2024
ReportID: ID20240809213702
Name: Robert Thomson
Report Type: Error Report
Opt Subject: Unicode Standard Annex #42
With respect to UAX #42 for Unicode version 15.1.0 at https://www.unicode.org/reports/tr42/#d1e3008 viewed 2024-08-10, I believe there are a couple of minor errors:
In section 4.4.2 Name properties, the character name has a pattern option of <control>. None of the codepoints have that pattern, and I believe that with revision 9 and the introduction of the name alias pattern there is no longer the requirement to include "|(<control>)" in the character name pattern.
[name pattern, 12] = character-name = xsd:string { pattern="([A-Z0-9 #\-\(\)]*)|(<control>)" }
If you should agree with the previous conclusion, then Section 12 contains an example fragment that is also in error:
<char cp="001F" age="1.1" na="<control>" na1="UNIT SEPARATOR" gc="Cc" bc="S" lb="CM"/>
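The effect of dropping the "|(<control>)" alternative can be checked with a short Python sketch. XSD patterns are implicitly anchored, so `re.fullmatch` is used here; treating Python's `re` syntax as equivalent to the XSD regular expression dialect is an assumption that happens to hold for this simple pattern. The constant and function names are hypothetical.

```python
import re

# Pattern as quoted in the report, and the same pattern without the
# "<control>" alternative that the report suggests removing.
WITH_CONTROL = r"([A-Z0-9 #\-\(\)]*)|(<control>)"
WITHOUT_CONTROL = r"([A-Z0-9 #\-\(\)]*)"


def matches(pattern, name):
    """Whole-string match, mirroring the implicit anchoring of XSD patterns."""
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    for name in ("UNIT SEPARATOR", "<control>", ""):
        print(repr(name), matches(WITH_CONTROL, name), matches(WITHOUT_CONTROL, name))
```

With the alternative removed, ordinary names and the empty string still validate, while the literal value "<control>" no longer does.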
Date/Time: Thu Sep 19 09:19:51 CDT 2024
ReportID: ID20240919091951
Name: Malo
Report Type: Error Report
Opt Subject: MathClass
As of Unicode 15, in MathClass documents (https://www.unicode.org/Public/math/revision-15/*), the character U+22A5 ⊥ UP TACK is classified as a Relation (R). This is contradictory with its use as a value (class N for Normal) in many fields such as logic and type theory (where it is often referred to as "bot" or "bottom"). In fact, U+22A4 ⊤ DOWN TACK ("top"), which is used along with bot in those fields, is classified as Normal (N). This is likely due to a confusion with the homoglyphic perpendicular symbol (U+27C2 ⟂ PERPENDICULAR), which is correctly classified as a Relation (R). It is this exact difference between bot being used as a value and the perpendicular sign being used as a relation that led to the introduction of those two distinct characters in Unicode, according to this 2003 draft: https://www.unicode.org/L2/L2003/03194-math-letterlike.pdf. As a final note, bot was initially properly classified as Normal (N) in Unicode 9 (https://www.unicode.org/Public/math/revision-09/MathClass-9.txt), but this changed with Unicode 11. If this change was intentional, I think this oddity deserves a comment in the MathClass files to inform the reader that this is not a mistake, and a short explanation.
Date/Time: Mon Oct 21 14:42:36 CDT 2024
ReportID: ID20241021144236
Name: Huáng Jùnliàng
Report Type: Error Report
Opt Subject: UTS #18
In section 1.2.5, there is a table containing General Category Property values and three starred entries: Any, Assigned and ASCII. Although there is a note that starred entries in the table are not part of the enumeration of General_Category values, it may still be a little confusing, as one browser engine maintainer interprets[1] that ASCII belongs to General Category:
> Yes, but that means that they are not part of the enumeration of values and not that they don't belong to that category. I.e. they are not listed as being part of that categories in UnicodeData.txt.
Can we improve the text and/or the table layout to clarify that Any, Assigned and ASCII are not General_Category property values?
[1]: https://issues.chromium.org/u/0/issues/373759990#comment5
Date/Time: Thu Jul 18 18:33:41 CDT 2024
ReportID: ID20240718183341
Name: Peter G Constable
Report Type: Public Review Issue
Opt Subject: 496
Note: This report is about a proposed update and the error has been fixed in the released version.
I recognize this is a late report, but I just noticed this typo in PU UTS #51, in section 2.6. In revision 26 (2024-6-26), the first sentence of section 2.6 has the following (revised) wording: "There are several emoji that depict more than one person interacting. When implemented with a choice or genders or skin tones, special handling is required on a case-by-case basis." The phrase "choice or genders or skin tones" appears to have a typo: I assume what is intended is "choice of genders or skin tones".
Date/Time: Fri Jul 26 02:41:07 CDT 2024
ReportID: ID20240726024107
Name: Werner Lemberg
Report Type: Error Report [EDC]
Opt Subject: NamesList.txt
As discussed in the thread starting at https://corp.unicode.org/pipermail/unicode/2024-July/010976.html it turned out that the two characters
1D132 MUSICAL SYMBOL QUARTER TONE SHARP
1D133 MUSICAL SYMBOL QUARTER TONE FLAT
are not accidentals but *pitch modifiers*, to be added to the left of an accidental (or a note without an accidental) and indicating that the pitch of the given note has to be raised or lowered by a quarter tone, respectively. The provided scans in the discussion confirm this usage. In other words, these two characters should be put into a separate section `@ Pitch modifiers` or something like that.
Date/Time: Thu Aug 08 09:06:48 CDT 2024
ReportID: ID20240808090648
Name: Lucas
Report Type: Error Report
Opt Subject: Multiple
The Latin Letters D, K, L, N and R as used in Livonian, Old-Prussian, Latvian and Romanian (all around the Baltic area) are supposed to have a comma underneath, and not a cedilla. I have not found a single source that needs these letters with an actual cedilla, other than errors caused by you, Unicode. According to Wikipedia these letters were mistakenly encoded with a cedilla by Unicode in the early nineties, and Unicode claims these errors cannot be fixed (even though, in general, the computer world is all about bugfixing). These letters should not combine with 0327, but with 0326, as you probably know, since the font used in your charts shows a proper comma accent. The Calibri fonts I designed also use comma accents. Your Unicode bugs are the cause of many fonts actually using cedillas instead of comma accents. Your bug has also caused the recent DIN 91379 Norm to include sequences for these letters combined with 0326 comma accent, instead of using the existing Unicodes of the precomposed letters. If you, for whatever reason, refuse to fix the bugs introduced by your predecessors, then at least add notes to ALL of these 10 codepoints, in your charts, that this was a historic mistake, and that the accents should actually look like free-floating comma accents (0326) and not cedillas (0327).
1E10 Ḑ LATIN CAPITAL LETTER D WITH CEDILLA (0044 + 0327)
1E11 ḑ LATIN SMALL LETTER D WITH CEDILLA (0064 + 0327)
0136 Ķ LATIN CAPITAL LETTER K WITH CEDILLA (004B + 0327)
0137 ķ LATIN SMALL LETTER K WITH CEDILLA (006B + 0327)
013B Ļ LATIN CAPITAL LETTER L WITH CEDILLA (004C + 0327)
013C ļ LATIN SMALL LETTER L WITH CEDILLA (006C + 0327)
0145 Ņ LATIN CAPITAL LETTER N WITH CEDILLA (004E + 0327)
0146 ņ LATIN SMALL LETTER N WITH CEDILLA (006E + 0327)
0156 Ŗ LATIN CAPITAL LETTER R WITH CEDILLA (0052 + 0327)
0157 ŗ LATIN SMALL LETTER R WITH CEDILLA (0072 + 0327)
ASAP please, thank you.
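The decompositions listed in this report can be checked against the character database bundled with Python. This minimal sketch uses the standard `unicodedata` module to print each letter's canonical decomposition and the name of the combining mark it ends in; results reflect whatever Unicode version the running Python ships with, and the helper name is hypothetical.

```python
import unicodedata

# The ten code points listed in the report.
CODE_POINTS = [
    0x1E10, 0x1E11, 0x0136, 0x0137, 0x013B,
    0x013C, 0x0145, 0x0146, 0x0156, 0x0157,
]


def combining_mark(cp):
    """Return the last code point (as hex text) of the canonical decomposition."""
    return unicodedata.decomposition(chr(cp)).split()[-1]


if __name__ == "__main__":
    for cp in CODE_POINTS:
        mark = combining_mark(cp)
        print(
            f"{cp:04X}",
            unicodedata.name(chr(cp)),
            "->",
            unicodedata.decomposition(chr(cp)),
            f"({unicodedata.name(chr(int(mark, 16)))})",
        )
```

Because canonical decompositions are covered by the normalization stability policy, these mappings to 0327 are what normalized text will contain regardless of how a font draws the accent.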
Date/Time: Sat Aug 31 22:24:11 CDT 2024
ReportID: ID20240831222411
Name: Guillaume Fortin-Debigaré
Report Type: Error Report
Opt Subject: Unicode 15.1.0 Core Specifications - Chapter 22 Symbols
Note: This error has been fixed in the Unicode 16.0 core spec.
Table 22-5 "Mathematical Operators Disunified from Punctuation" lists the incorrect Unicode code point for the SOLIDUS character in the second row of the left column. It should be 002F instead of 003F.
Date/Time: Sat Sep 07 05:14:42 CDT 2024
ReportID: ID20240907051442
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: U0000.pdf
A minor slip: In U0000.pdf, the following is shown with two right single quotation marks (they are not ASCII apostrophes!) instead of a left and a right one: for ’Greek question mark’
Date/Time: Wed Sep 11 04:07:14 CDT 2024
ReportID: ID20240911040714
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: U2100.pdf
There are two issues with the informative aliases “first transfinite cardinal (countable)”, “second transfinite cardinal (the continuum)”, “third transfinite cardinal (functions of a real variable)” and “fourth transfinite cardinal” for the characters U+2135 (ALEF SYMBOL), U+2136 (BET SYMBOL), U+2137 (GIMEL SYMBOL) and U+2138 (DALET SYMBOL), respectively.
1) Aleph is used together (!) with 0, 1, … as an index to indicate cardinalities of well-ordered infinite sets (in ascending order). (Without an index, it is apparently sometimes used for the cardinality of the continuum, not the first transfinite cardinal!) Beth and gimel are also used with an index (you can look up the definition), while daleth does not have an established meaning and was apparently just included in LaTeX so that it can be used in an ad-hoc manner. (Even if there is someone out there who uses the characters as the aliases indicate, that would be an idiosyncrasy that does not deserve mention in the only alias.)
2) That the cardinality of the continuum is the second transfinite cardinal amounts to the continuum hypothesis, which is known to be independent of the set theory ZFC, and among those set theorists who have a belief either way, it seems like most believe it to be false.
Date/Time: Wed Sep 11 05:14:22 CDT 2024
ReportID: ID20240911051422
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject:
Two further remarks: 1) The reference glyph for U+3388 and that for U+3389 have an italicized “cal” for the calorie. This unit symbol should not be italicized. While the glyphs are not normative, it would be great if this could be corrected; an italic mu (in glyphs of the chart) has already been corrected to an upright one. 2) The character U+2263 (≣ STRICTLY EQUIVALENT TO) is found under the subhead “Relations”. I think it would be more appropriate to put it under “Logical operator” (for comparison: U+2227) because it stands for a connective in modal logic. See here: https://corp.unicode.org/pipermail/unicode/2022-July/010231.html
Date/Time: Tue Sep 24 06:11:31 CDT 2024
ReportID: ID20240924061131
Name: Ben Harris
Report Type: Error Report
Opt Subject: The Unicode® Standard Version 16.0 – Core Specification
A piece of text has been lost in the translation to HTML for Unicode 16. In Unicode 15.1.0, this text appears: "So for example, the representation of the number 12,346 in the traditional system would be by a sequence of CJK ideographs with numeric values as follows: <one, ten-thousand, two, thousand, three, hundred, four, ten, six>." That is, the example is "one, ten-thousand, two, thousand, three, hundred, four, ten, six", surrounded by less-than and greater-than signs. In Unicode 16.0.0, at https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-22/#G46185, the same sentence reads: "So for example, the representation of the number 12,346 in the traditional system would be by a sequence of CJK ideographs with numeric values as follows: ." That is, the entire text within and including the less-than and greater-than signs has vanished. The HTML source shows that the text does actually appear in the source, but the less-than sign has not been properly escaped and so is interpreted as markup by browsers. This makes me suspect that there may be other similar problems elsewhere in the standard. I haven't (yet) made any attempt at looking for them.
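The escaping problem described above can be illustrated with a minimal sketch. It assumes nothing about the toolchain actually used to produce the HTML core specification; it only shows how the example string would look once the angle brackets are escaped, using `html.escape` from the Python standard library. The variable names are illustrative.

```python
import html

# The example text from Unicode 15.1.0, including its angle brackets.
example = "<one, ten-thousand, two, thousand, three, hundred, four, ten, six>"

# Unescaped, a browser treats "<one, ...>" as an unknown tag and renders nothing.
# Escaped, the angle brackets become character references and display as text.
escaped = html.escape(example)
print(escaped)
# &lt;one, ten-thousand, two, thousand, three, hundred, four, ten, six&gt;
```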
Date/Time: Fri Oct 04 11:39:10 CDT 2024
ReportID: ID20241004113910
Name: Malo
Report Type: Error Report
Opt Subject: The Unicode® Standard Version 16.0 Core Specification
Section 24.1.9 of the Unicode® Standard Version 16.0 Core Specification (https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-24/#G3725) includes a sample character list which contains a mistake: 212B Å ANGSTROM SIGN is incorrectly marked as having the canonical mapping 00C5 Å angstrom sign, instead of 00C5 Å latin capital letter a with ring above. Note that this error is not present in the corresponding chart (https://www.unicode.org/charts/PDF/U2100.pdf).
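The canonical mapping in question can be checked with a minimal sketch using Python's standard `unicodedata` module. The variable names are illustrative, and the result reflects the Unicode data bundled with the Python interpreter in use rather than the sample list in the core specification.

```python
import unicodedata

angstrom = "\u212B"  # ANGSTROM SIGN

# A singleton canonical decomposition is reported as a bare hex code point.
decomposition = unicodedata.decomposition(angstrom)
target = chr(int(decomposition, 16))
target_name = unicodedata.name(target)

print(f"U+212B {unicodedata.name(angstrom)} -> U+{decomposition} {target_name}")
# U+212B ANGSTROM SIGN -> U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
```

Normalizing U+212B with `unicodedata.normalize("NFC", ...)` likewise yields U+00C5.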
Date/Time: Sun Oct 06 16:19:03 CDT 2024
ReportID: ID20241006161903
Name: Jim DeLaHunt
Report Type: Error Report
Opt Subject: www.unicode.org/versions/latest/
Passing on a social media comment about the page at https://www.unicode.org/versions/latest/ . A reader visits the page wanting to find the Core Spec (this generalises to other parts of the Unicode Standard, such as UTRs). The reader expects that the page will contain links to the parts of the core spec which they seek. Instead, the page describes the differences between the latest version of TUS and the previous version. I suggest adding a section to the top of this page, saying "The current version of The Unicode Standard is 16.0.0. It consists of a Core Specification (link), some Code Charts (link), etc." Then put the current content under a heading like "Differences from previous version of the Standard". The present set of links, especially the unnumbered list of links under "B. Technical Overview", might make the reader hope they link to the parts of the Standard, but in fact they link to subheadings below which describe changes. It would be better for the list of links at the top of the page to point to the parts of the latest version of The Unicode Standard, as implied by the URL. Original social media post: https://cosocial.ca/@timbray/113170595870924709 , by Tim Bray of XML fame. Relayed by Jim DeLaHunt. The explanation above is mine, not Tim's. He may submit his own Error Report in his own words.
Date/Time: Thu Oct 24 10:04:37 CDT 2024
ReportID: ID20241024100437
Name: Sridatta A
Report Type: Error Report
Opt Subject: Corrections to Unicode chapter of Tulu-Tigalari
In chapter 15 (https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-15/#G71814), the text “Tulu-Tigalari is a historic script attested in a large number of manuscripts from Karnataka and northern Kerala dating to as early as 1300 CE. It was used to write Sanskrit, Tulu, and Malayalam,” should be corrected to have Kannada instead of Malayalam. In Figure 15-5, the glyph is that of ju rather than chu.