The sections below contain links to permanent feedback documents for the open Public Review Issues, as well as other public feedback received from January 8, 2021 through April 22, 2021, since the previous cumulative document was issued prior to UTC #167 (April 27-29, 2021).
The links below go directly to open PRIs and to feedback documents for them, as of April 22, 2021.
The links below go to locations in this document for feedback.
Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to Properties & Algorithms ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports
Date/Time: Mon Feb 8 09:16:42 CST 2021
Name: Jaemin Chung
Report Type: Other Question, Problem, or Feedback
Opt Subject: kMandarin values for some traditional characters
I suggest that these kMandarin values be added.
U+255FD kMandarin lán # 𥗽; from U+2C497 𬒗
U+289C0 kMandarin dù # 𨧀; from U+2CB4A 𬭊
U+28A0F kMandarin bō # 𨨏; from U+2CB5B 𬭛
U+28B4E kMandarin xǐ # 𨭎; from U+2CB73 𬭳
Adding these would completely cover the traditional equivalents of the characters in kTGH (通用规范汉字表).
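As an illustration only, the proposed additions can be written out as tab-separated records. This is a minimal sketch: the helper name `unihan_line` and the three-field layout (code point, property name, value) are assumptions made for the example, not something stated in the report.

```python
# Proposed kMandarin readings from the report: (code point, reading).
PROPOSED = [
    (0x255FD, "lán"),
    (0x289C0, "dù"),
    (0x28A0F, "bō"),
    (0x28B4E, "xǐ"),
]


def unihan_line(code_point, field, value):
    """Format one record as 'U+XXXX<TAB>field<TAB>value' (assumed layout)."""
    return f"U+{code_point:04X}\t{field}\t{value}"


# One line per proposed reading.
lines = [unihan_line(cp, "kMandarin", reading) for cp, reading in PROPOSED]

if __name__ == "__main__":
    print("\n".join(lines))
```

The provenance comments in the report (the simplified counterparts such as U+2C497) are left out of the generated lines on purpose, since they are annotations rather than property values.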
Date/Time: Tue Feb 16 01:26:03 CST 2021
Name: William He
Report Type: Error Report
Opt Subject: Minor kDefinition Error
The kDefinition for 穸 (U+7A78) appears incorrect. It says, "the gloom of the grave a tomb or grave; death" which may be missing a semicolon after the first instance of "grave". That said, "the gloom of the grave" is unclear regardless.
Date/Time: Mon Mar 22 14:18:10 CDT 2021
Name: Ryusei Yamaguchi
Report Type: Public Review Issue
Opt Subject: PRI #421 UNIHAN proposed update feedback
In the description of the kPhonetic property, the kPhonetic value of U+8753 is mistyped:
> An asterisk is appended when a character has the given phonetic class but is not explicitly included in the character list for that class. For example, 蝓 (U+8753) belongs to the class 1161 but is not explicitly listed in that class. Its kPhonetic value is therefore "1161*".
The correct kPhonetic value of U+8753 is "1611*".
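The asterisk convention quoted above can be sketched as a small parser. This is a hedged illustration: the function name `parse_kphonetic` is hypothetical, and the assumed value shape (digits, an optional single letter, an optional trailing asterisk) is a guess for the example rather than the normative syntax of the property.

```python
import re

# Assumed shape of one kPhonetic value: class number, optional letter,
# optional trailing asterisk. This pattern is an assumption for the sketch.
_KPHONETIC = re.compile(r"^(\d+[A-Za-z]?)(\*?)$")


def parse_kphonetic(value):
    """Return (phonetic_class, not_explicitly_listed) for one value."""
    match = _KPHONETIC.match(value)
    if match is None:
        raise ValueError(f"unexpected kPhonetic value: {value!r}")
    # The asterisk marks a character that has the class but is not
    # explicitly included in the character list for that class.
    return match.group(1), match.group(2) == "*"
```

With the corrected value from the report, `parse_kphonetic("1611*")` yields the class `"1611"` together with a flag saying the character is not explicitly listed.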
Date/Time: Wed Apr 14 06:17:42 CDT 2021
Name: Štěpán Zídek
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: KP0-E5A9 mapping
Mr. Jaemin Jung proposed in document L2/21-059 to change the mapping of KP0-E5A9 from the current U+676E (杮) to U+67FF (柿). KP0-E5A9 should be mapped to U+67BE (枾, also read as 시) rather than to U+67FF (柿). This mapping would be more accurate, since the character '枾', coded as E5A9, is used in the SamHung 3.0 multilingual dictionary, which originates from North Korea and uses KPS9566 coding. I can provide font bitmaps from SamHung 3.0 to support my claim.
Date/Time: Tue Jan 19 19:39:21 CST 2021
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: On the Kawi space filler and the names of punctuation characters
This is a response to https://www.unicode.org/L2/L2020/20284r-kawi.pdf
I would like to point out that the character PUNCTUATION SPACE FILLER has a glyph identical to that of the DIGIT FOUR. Considering that the letter RO was unified with the DIGIT TWO for the same reason, I recommend removing the SPACE FILLER and annotating the DIGIT FOUR with its function. This argument isn't valid if a consistent glyph difference is attested between them.
Furthermore, I also recommend some other names for other punctuation characters:
KAWI PUNCTUATION ALTERNATE SECTION MARK -> KAWI PUNCTUATION SECTION MARK WITH REPHA
KAWI PUNCTUATION FILLED CIRCLE -> KAWI PUNCTUATION CIRCLE WITH DOT
KAWI PUNCTUATION CLOSING SPIRAL -> KAWI PUNCTUATION SPIRAL WITH WAVY TAIL
I also recommend annotating the SPIRAL character with the alias "siddham".
Date/Time: Tue Feb 23 20:12:22 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Kannada <ra, ZWNJ, virama, consonant>
Section 5.21 says that “a format character may have no visible effect on display at all”, with the example of <x, ZWJ, x>. There is a case where it is not clear whether a format character is supposed to have a visible effect. In Kannada, how should <ra, ZWNJ, virama, consonant> be rendered? Chapter 12 “Kannada” does not define what ZWNJ does in that context. One interpretation is that, since that use of ZWNJ is not defined, it is ignored, i.e. the sequence is rendered the same as <ra, virama, consonant>. Another interpretation is that the sequence should be rendered the same as <ra, ZWJ, virama, consonant>. In Indic scripts where <ZWNJ, virama> is defined, it generally has the effect of blocking special behaviors, such as this special initial form of ra, and inducing subjoined C2 forms. So which is it? See https://github.com/harfbuzz/harfbuzz/issues/2018 for more information.
Date/Time: Tue Feb 23 20:40:37 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Edge case for ZWJ and ZWNJ in Malayalam
The general rule for rendering ZWNJ and ZWJ when they appear unexpectedly is to ignore them. That is, the string should be rendered exactly as if the unexpected join controls weren’t there. What happens if <ZWJ, ZWNJ> or <ZWNJ, ZWJ> appears in a position where either ZWNJ or ZWJ would be expected, but not both? Specifically, in Malayalam, how are <consonant, ZWJ, ZWNJ, virama, consonant> and <consonant, ZWNJ, ZWJ, virama, consonant> rendered? (See table 12-38.)
Date/Time: Thu Feb 25 00:46:04 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect Indic Syllabic Category for Myanmar Sign Asat
U+103A MYANMAR SIGN ASAT currently has Indic_Syllabic_Category=Pure_Killer. This seems incorrect. As the Unicode Standard, section 16.3, describes, this character is used as part of the three-character sequence used to encode the kinzi, a repha-like conjunct form. It seems Indic_Syllabic_Category=Virama would be more appropriate. The situation is similar to U+0BCD TAMIL SIGN VIRAMA, which also doesn’t participate in conjunct formation, except when it does.
Date/Time: Fri Mar 5 20:39:31 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Representing hamza in lam–alef ligature
Chapter 9, section “Arabic”, subsection “Quranic Texts” says that “words spelled with the medial form of U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE in modern Arabic orthographies may appear in Quranic texts without the tooth typical of the letter. There is usually an elongation under the hamza, and the hamza may carry other diacritical marks, such as a fatha. This convention can be thought of as a modified version of yeh-hamza, and is represented with the sequence <U+0640 ARABIC TATWEEL, U+0654 ARABIC HAMZA ABOVE>.” There is another case of a carrier-less hamza: between a lam and an alef in a lam–alef ligature. How should such a hamza be encoded? In https://github.com/googlefonts/noto-fonts/issues/2017, Roozbeh says the recommended sequence is <lam, tatweel, hamza above, alef>. If this is Unicode’s recommendation, it should be made explicit in the standard. The current wording, describing tatweel graphically as like a toothless, dotless yeh, does not apply to any graphical component of a lam–alef ligature, so the subsection might be interpreted as saying nothing about hamzas in lam–alef ligatures.
Date/Time: Mon Feb 1 10:58:28 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Contradictory requirements for U+2044 and default ignorable code points
Chapter 6 says that the fraction slash creates fractions only in the environment `\p{Nd}+\u2044\p{Nd}+`. However, chapter 5 says that default ignorable code points should sometimes be ignored for display, with the example that “U+200B ZERO WIDTH SPACE affects word segmentation, but has no visible display”, and chapter 23 says that outside of a defined variation sequence, “use of a variation selector character does not change the visual appearance of the preceding base character from what it would have had in the absence of the variation selector.” How should these contradictory requirements be resolved? For example, should <digit, variation selector, slash, digit> and <digit, ZWSP, slash, digit> be displayed as fractions or not?
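The two readings in question can be made concrete with a short sketch. It assumes that Python's `\d` (which matches category Nd in `str` patterns) stands in for `\p{Nd}`, and it uses a deliberately tiny, hypothetical set of ignorable code points (U+200B and VS1 through VS16) rather than the full Default_Ignorable_Code_Point property. Neither function is presented as the behavior the standard requires.

```python
import re

# In Python 3 str patterns, \d matches any character of category Nd,
# so this mirrors the environment \p{Nd}+\u2044\p{Nd}+ from chapter 6.
FRACTION = re.compile(r"\d+\u2044\d+")

# Hypothetical, incomplete stand-in for default ignorable code points:
# ZERO WIDTH SPACE plus VARIATION SELECTOR-1..16.
ASSUMED_IGNORABLES = {0x200B} | set(range(0xFE00, 0xFE10))


def is_fraction_strict(text):
    """Reading 1: the environment is matched on the raw code points."""
    return FRACTION.fullmatch(text) is not None


def is_fraction_lenient(text):
    """Reading 2: the assumed ignorables are skipped before matching."""
    stripped = "".join(ch for ch in text if ord(ch) not in ASSUMED_IGNORABLES)
    return FRACTION.fullmatch(stripped) is not None
```

Under the strict reading, `"1\uFE00\u20442"` and `"1\u200B\u20442"` are not fractions; under the lenient reading both are. The report asks which of these outcomes is intended.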
Date/Time: Sat Apr 24 13:03:04 CDT 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Response to L2/21-069
> David does not include a use case for combinations of fractions with default ignorable code points in his submission.
The use case is the slashed zero in fractions. L2/21-069’s recommendation in F1 implies that <zero, VS1, fraction slash, one> should be rendered as <full-sized slashed zero, slash, full-sized one>, but that <one, fraction slash, zero, VS1> may be rendered <numerator one, fraction slash, denominator slashed zero>.
Date/Time: Wed Jan 13 10:32:04 CST 2021
Name: William Overington
Report Type: Other Question, Problem, or Feedback
Opt Subject: Abstract emoji
Could Unicode, Inc. please consider allowing abstract emoji to become in scope for being encoded in regular Unicode? Abstract emoji could be very helpful for communicating through the language barrier. I have recently published a colour font for sixteen abstract emoji for personal pronouns and it would be helpful if abstract emoji were to become in scope for The Unicode Standard. http://www.users.globalnet.co.uk/~ngo/mariposa_novel.htm William Overington Tuesday 12 January 2021
Date/Time: Sun Feb 14 11:23:49 CST 2021
Name: Charlotte Buff
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/20-064: Glyph for U+1F979 FACE HOLDING BACK TEARS
The glyphic appearance of proposed character U+1F979 FACE HOLDING BACK TEARS seems underspecified. The original proposal (L2/20-064) uses a glyph with a smiling mouth as its main artwork, but throughout the document a different glyph that looks much more distraught and emotionally unstable is used to illustrate usage examples. In particular, the screencaps of cartoon characters in section C (“Image distinctiveness”) all depict faces that are distinctly not smiling. The emoji candidates page (https://www.unicode.org/emoji/future/emoji-candidates.html) uses the distraught glyph, while the draft code chart for the Supplemental Symbols and Pictographs block shows the smiling variant. While both variants can be said to be “holding back tears”, the UTC should investigate whether such a wide range of possible glyphic interpretations could lead to communication issues between end users. The keywords associated with the emoji such as “angry” and “sad” would certainly suggest that the smiling variant is somewhat inappropriate, while the distraught variant is (at least in my opinion) not inherently unsuited for representing emotions such as being proud of another person.
Date/Time: Fri Jan 15 06:59:44 CST 2021
Name: Charlotte Buff
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/19-053: Duplicate Character Name (Znamenny)
This report was reviewed already and the name duplication has been fixed. Please see document L2/21-013, Section F2.
While working on a document concerning Znamenny notation, I discovered an unrelated flaw in the original proposal (L2/19-053): The proposed characters U+1CF2D and U+1CF40 were both given the exact same name – ZNAMENNY COMBINING MARK KRYZH.
Date/Time: Sun Feb 7 00:06:15 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: NamesList documentation refers to “LIGHT SCREEN”
This has already been fixed in draft data files for version Unicode 14.0.
The documentation for the Unicode names list file format at http://ftp.unicode.org/Public/UNIDATA/NamesList.html (revision 13.0.0) refers to a “glyph for LIGHT SCREEN”, to be used instead of an unavailable variant glyph. There’s no explanation what “LIGHT SCREEN” refers to. From the context it appears that it might refer to a Unicode character, but Unicode 13 doesn’t include such a character.
Date/Time: Tue Feb 23 12:03:14 CST 2021
Name: Jungshik Shin
Report Type: Error Report
Opt Subject: Hangul collation and Hangul tone marks
Note: Changes have been made in the draft text for version 14.0 in response to [the first part of] this report.
Hello, I'm writing to give my feedback on TUS 13 section 18.6 Hangul. On pages 746-747, I found the following regarding the collation of Hangul syllables: "Because the order of the syllables in the Hangul Syllables block reflects the preferred ordering, sequences of Hangul syllables for modern Korean may be collated with a simple binary comparison" Although the above is certainly the case for the South Korean collation order since 1988 [1], it does not hold true for North Korean sorting rules. Therefore, the locale data for ko-KP needs to be tailored for Hangul collation. In addition, section 18.6 does not mention the two Hangul tone marks, U+302E and U+302F. To faithfully represent old Korean text, Hangul tone marks are required and should be mentioned along with the Hangul conjoining jamo. It'd be great if the two points above could be reflected in TUS 14 or later. Thank you for your consideration, Jungshik Shin
[1] Before 1988, there were a couple of 'competing' collation orders even in South Korea and different dictionaries used different sorting rules. It was only in 1988 that the South Korean orthographic standard explicitly specified how to sort Hangul.
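A minimal sketch of the "simple binary comparison" mentioned in the quoted text, together with the two tone marks the report refers to. The helper name `binary_sorted` is hypothetical. The sketch only shows code point order for precomposed syllables; it makes no claim about North Korean ordering, which is the case the report says needs tailoring.

```python
import unicodedata


def binary_sorted(words):
    """Sort by code point; Python compares str values this way by default."""
    return sorted(words)


# The two Hangul tone marks named in the report.
TONE_MARKS = ["\u302E", "\u302F"]
tone_mark_names = [unicodedata.name(ch) for ch in TONE_MARKS]

if __name__ == "__main__":
    print(binary_sorted(["나", "힣", "가", "다"]))
    print(tone_mark_names)
```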
Date/Time: Tue Feb 23 19:56:30 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: U+034F COMBINING GRAPHEME JOINER is not always ignored for display
Section 5.21 says “U+034F COMBINING GRAPHEME JOINER is likewise always ignored for display.” This is not true: it has no visible glyph of its own, but it may have a visible effect on other glyphs. For example, see Figure 7-11 and UTR #53. As section 5.21 says earlier on the same page, “In such cases, even though the format character or variation selector has no visible glyph of its own, it would be inappropriate to say that it is ignored for display, because the intent of its use is to change the display in some visible way.”
Date/Time: Wed Feb 24 23:36:32 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Typo: Meetei Mayak Extensions
This has already been updated in the draft for the next version.
Section 13.7 of the Unicode Standard, and the corresponding entry in the table of contents, repeatedly refer to a “Meetei Mayak Extensions” block. The correct name of the block is “Meetei Mayek Extensions”.
Date/Time: Fri Feb 26 03:19:19 CST 2021
Name: huang xin
Report Type: Error Report
Opt Subject: What is the exact definition of assigned character?
The term assigned character seems to have conflicting meanings in the Unicode Standard, Version 13.0. Quoted from chapter 2.1: "In contrast, a character encoding standard provides a single set of fundamental units of encoding, to which it uniquely assigns numerical code points. These units, called assigned characters, are the smallest interpretable units of stored text." This suggests that the "units" are called "assigned characters", and that "numerical code points" are assigned to "assigned characters". Quoted from chapter 3.5 D49: "Private-use code points are considered to be assigned characters" This suggests that an assigned character is a kind of code point. So there is a conflict between the two quotes: if an assigned character is some kind of code point, how can a "numerical code point" be assigned to some kind of code point?
Date/Time: Sat Feb 27 21:03:22 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Chapter 17 intro miscounts Indonesian scripts
The introduction to chapter 17 in TUS 13.0 says "Indonesia has many local, traditional scripts, most of which are ultimately derived from Brahmi. Six of these scripts are documented in this chapter." The actual number of Indonesian scripts documented in the chapter is seven; Makasar is one of them. Maybe get rid of the number, as several more scripts are to come? It’s also not quite clear why Makasar gets its own paragraph; the paragraph suggests that it belongs between Rejang and Buginese.
Date/Time: Tue Mar 2 13:41:48 CST 2021
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: Errors in the 13.0.0 Core Specification
Note: These errors have now been fixed in the draft text for version 14.0.
The text of the Unicode Standard contains six minor mistakes: “circumflext” (instead of “circumflex”), “fith century” (instead of “fifth century”), “Non_Joining_Group” (instead of “No_Joining_Group”), “manuscriptof” (instead of “manuscript of”), “Devangari” (instead of “Devanagari”) and “analoguous” (instead of “analogous”). Maybe you could correct this in the next version.
Date/Time: Wed Mar 3 13:11:06 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Ligatures in Old Hungarian
Note: Changes have been made in the draft text for version 14.0 in response to this report.
Chapter 13 says that Old Hungarian “often uses a large set of ligatures and consonant clusters.” Why mention consonant clusters? Ligatures may include all vowels, all consonants, or some of both. Is the intent that these ligatures be enabled in plain text by ZWJ? Is an uppercase ligature meant to be formed from all uppercase letters, or from one uppercase letter followed by lowercase letters? Or can it be either depending on the context? What, if anything, should <lowercase, ZWJ, uppercase> ligate to?
Date/Time: Fri Mar 12 19:45:54 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Bidi format characters do affect characters’ glyphs
Chapter 5 says “Bidirectional format characters do not affect the glyph forms of displayed characters”, but that is not true. The main point of that sentence (that bidi format characters have no glyphs) is still true, but it needs a better explanation. For example, U+0028 LEFT PARENTHESIS has different glyphs depending on the bidi level. In general, overriding a character’s directionality may have an arbitrary effect on its glyph form.
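The properties behind this example can be inspected with Python's standard `unicodedata` module. This is only a sketch: the helper name `bidi_summary` is hypothetical, and the code reports character properties; it does not run the bidirectional algorithm or select glyphs.

```python
import unicodedata


def bidi_summary(ch):
    """Return (Bidi_Class, Bidi_Mirrored) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


if __name__ == "__main__":
    # U+0028 is a mirrored character, so its glyph depends on direction.
    print(bidi_summary("("))
    # U+202E RIGHT-TO-LEFT OVERRIDE has no glyph of its own.
    print(bidi_summary("\u202E"))
```

A mirrored character such as U+0028 following an override like U+202E is the kind of case the report has in mind: the format character is invisible, yet the parenthesis may be displayed with its mirrored glyph.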
Date/Time: Fri Mar 12 19:56:45 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Unexpected variation sequences do affect display
Chapter 5 says “In other contexts, a format character may have no visible effect on display at all. [...] Another example is a variation selector following a base character for which no standardized or registered variation sequence exists. In that case, the variation selector has no effect on the display of the text.” However, that is an oversimplification. The presence of an unexpected variation selector may block another variation sequence, may block canonical reordering, and may block AMTRA reordering, all of which have effects on the display of the text.
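One of the effects listed above, blocking canonical reordering, can be observed with the standard `unicodedata` module. This is a sketch under one assumption: the base letter and marks are chosen only for illustration, and no variation sequence is defined for <U+0301, VS1>.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, ccc=230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, ccc=220
VS1 = "\uFE00"        # VARIATION SELECTOR-1, ccc=0

# Without the variation selector, NFD reorders the two marks by
# canonical combining class (220 sorts before 230).
plain = unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW)

# With the variation selector between them, the marks are no longer
# adjacent non-starters, so NFD leaves the order unchanged.
blocked = unicodedata.normalize("NFD", "a" + ACUTE + VS1 + DOT_BELOW)

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in plain])
    print([f"U+{ord(c):04X}" for c in blocked])
```

After normalization the two strings differ in mark order, so a renderer sees different input even though the variation selector itself has no glyph.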
Date/Time: Fri Mar 12 20:06:54 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Does <ZWJ, ZWJ> equal ZWJ?
UTS #51 defines various sequences with ZWJ, such as <1F415, 200D, 1F9BA>. How should they be rendered when there are multiple ZWJs, as in <1F415, 200D, 200D, 1F9BA>? According to chapter 5 of the core specification, “a sequence of two adjacent joiners, <..., ZWJ, ZWJ, ...>, is a case where the extra ZWJ should have no effect.” On the other hand, I get the impression that extraneous ZWJs go against the spirit of UTS #51. Is that sentence in the core specification meant to be taken literally? What effects should other default ignorable code points have within emoji?
Date/Time: Fri Mar 12 20:37:10 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: When does ZWJ act like <ZWJ, ZWNJ, ZWJ>?
Chapter 23 says that “between Arabic characters a ZWJ acts just like the sequence <ZWJ, ZWNJ, ZWJ>, preventing a ligature from forming instead of requesting the use of a ligature that would not normally be used.” What is an Arabic character, and which characters are relevant for the purpose of “between”? Consider the sequence <meem, ZWJ, U+17B4 KHMER VOWEL INHERENT AQ, jeem>. The ZWJ is between an Arabic character and a Khmer character. Is it right to conclude that the ZWJ therefore does not act just like <ZWJ, ZWNJ, ZWJ>, leaving it free to ligate the meem and jeem?
Date/Time: Mon Mar 29 23:44:43 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Confusion between nonspacing marks and nonspacing marks
The Unicode Standard has a general category Mn “nonspacing mark”. The Unicode Standard also has a definition D53: “Nonspacing mark: A combining character with the General Category of Nonspacing Mark (Mn) or Enclosing Mark (Me).” This definition seems misguided for two reasons: ① Enclosing marks are almost always spacing, contradicting the statement that supports D53: “It generally does not consume space along the visual baseline in and of itself.” Adding an enclosure to a glyph requires space – otherwise it results in a smudge. Of the 25 font families I found on my Mac that contain U+20DD combining enclosing circle, only one monospaced font uses an enclosing circle glyph with the same width as any other glyph, predictably resulting in smudges. All 24 others use a glyph that’s large enough to accommodate the glyphs of most base characters with some padding, which means it’s substantially wider than most base glyphs. This is very different from the exceptional and context-dependent widening described for the real nonspacing mark U+0302 combining circumflex accent in “î”. ② Using the same term for two related but different concepts results in confusion. This is most obvious in an example for a regular expression character class in TUS appendix A Notational Conventions, page 941, which describes [\p{gc=Nonspacing_Mark}] as “nonspacing marks” – clearly correct based on the general category and clearly wrong based on definition D53. TUS section 5.12 Strategies for Handling Nonspacing Marks, page 217, claims “Properly speaking, a nonspacing mark is any combining character that does not add space along the writing direction.” and again “Composite character sequences can be rendered effectively by means of a fairly simple mechanism. In simple character rendering, a nonspacing combining mark has a zero advance width, and a composite character sequence will have the same width as the base character.” Both statements are incorrect for enclosing marks in most fonts. 
This leads to an inappropriate truncation strategy on page 219: “In simple systems, it is easiest to truncate by width, starting from the end and working backward by subtracting character widths as one goes. Because a trailing nonspacing mark does not contribute to the measurement of the string, the result will not separate nonspacing marks from their base characters.” Page 222 discusses letterspacing: “This process needs to be modified if zero-width nonspacing marks are present in the text. Otherwise, if extra justifying space is added after the base character, it can have the effect of visually separating the nonspacing mark from its base.” This issue would affect non-zero-width nonspacing marks as well, which D53 creates. And so on... I suggest changing D53 to define “nonspacing mark” based only on general category Mn, and discussing enclosing marks either together with nonspacing marks or separately, as appropriate in each context.
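The two competing meanings of "nonspacing mark" can be written as two predicates. This is a hedged sketch: the function names are hypothetical, and the second predicate encodes definition D53 exactly as quoted in this report.

```python
import unicodedata


def is_nonspacing_by_gc(ch):
    """General category reading: only gc=Mn counts."""
    return unicodedata.category(ch) == "Mn"


def is_nonspacing_by_d53(ch):
    """D53 reading as quoted in the report: gc=Mn or gc=Me."""
    return unicodedata.category(ch) in ("Mn", "Me")


if __name__ == "__main__":
    samples = {
        "U+0302 COMBINING CIRCUMFLEX ACCENT": "\u0302",
        "U+20DD COMBINING ENCLOSING CIRCLE": "\u20DD",
        "U+0903 DEVANAGARI SIGN VISARGA": "\u0903",
    }
    for label, ch in samples.items():
        print(label, is_nonspacing_by_gc(ch), is_nonspacing_by_d53(ch))
```

The predicates agree on U+0302 and on the spacing mark U+0903, and disagree only on the enclosing mark U+20DD, which is the ambiguity the report describes.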
Date/Time: Tue Mar 30 00:11:37 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incomplete discussion of combining marks
The Unicode Standard has two sections with guidelines on nonspacing marks: 5.12 Strategies for Handling Nonspacing Marks and 5.13 Rendering Nonspacing Marks. The second paragraph of the first of these sections says: “In this section and the following section, the terms nonspacing mark and combining character are used interchangeably.” This sentence is confusing because the terms are not interchangeable at all: Combining characters, according to definition D52, include nonspacing (general category Mn), spacing (Mc), and enclosing (Me) marks. Even when applying the dubious definition D53, nonspacing marks do not include spacing marks. Most of the issues described in the two sections affect spacing and enclosing marks as well, so the sections are incomplete if they don’t cover them. The solutions, however, often need to be modified for them.
Date/Time: Tue Mar 30 00:15:01 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect statement about grapheme clusters
The last paragraph of TUS section 2.11 Combining Characters contains this statement: “This core concept is known as a *grapheme cluster*, and it consists of any combining character sequence that contains only *nonspacing* combining marks or any sequence of characters that constitutes a Hangul syllable (possibly followed by one or more nonspacing marks).” This statement is incorrect. Both kinds of grapheme clusters defined in UAX 29, legacy grapheme clusters and extended grapheme clusters, can contain *spacing* combining marks.
Date/Time: Tue Mar 30 00:19:43 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect statements about combining characters
The first paragraph of TUS section 2.11 Combining Characters has two incorrect statements: ① “Characters intended to be positioned relative to an associated base character are depicted in the character code charts above, below, or through a dotted circle.”: In reality, combining characters can be depicted on any side of a dotted circle, on multiple sides, crossing it, or enclosing it. ② “The Unicode Standard distinguishes two types of combining characters: spacing and nonspacing.” The standard, at least in its definition of general categories, distinguishes three types of combining characters: spacing, nonspacing, and enclosing, although definition D53 then adds ambiguity.
Date/Time: Fri Apr 2 19:05:22 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Unclear reference to “dashes” in TUS section 12.9 Malayalam
TUS section 12.9 Malayalam, page 512 says “... rendering engines should be prepared to handle Malayalam letters (including vowel letters), digits (both European and Malayalam), dashes, U+00A0 NO-BREAK SPACE and U+25CC DOTTED CIRCLE as base characters for the Malayalam vowel signs, U+0D4D MALAYALAM SIGN VIRAMA, U+0D02 MALAYALAM SIGN ANUSVARA, and U+0D03 MALAYALAM SIGN VISARGA. They should also be prepared to handle multiple combining marks on those bases.” It’s not clear which “dashes” this refers to. The Unicode Standard, in table 6-3 and in PropList.txt, defines two overlapping sets of dashes that together contain 30 dash characters. It is very unlikely that all of them are relevant to Malayalam, and OpenType in particular is not good at handling mixed-script clusters, such as a combination of U+1806 MONGOLIAN TODO SOFT HYPHEN with U+0D02 MALAYALAM SIGN ANUSVARA.
Date/Time: Fri Apr 2 18:21:40 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Dash definitions out of sync
The lists of dash characters in TUS table 6-3 and in PropList.txt are out of sync. Table 6-3 includes 007E TILDE, which is not listed as a Dash in PropList.txt. In turn, PropList.txt lists 2E1A HYPHEN WITH DIAERESIS, 2E3A..2E3B TWO-EM DASH..THREE-EM DASH, 2E40 DOUBLE HYPHEN, 10EAD YEZIDI HYPHENATION MARK, which are absent from TUS table 6-3. It’s not clear to me what qualifies 10EAD YEZIDI HYPHENATION MARK as a dash.
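The mismatch can be expressed as a pair of set differences. The sets below are small excerpts, not the full lists: they contain only the code points named in this report, plus U+002D and U+2014, which are assumed here to appear in both lists. The function name `out_of_sync` is hypothetical.

```python
# Excerpt of TUS table 6-3 (assumption: U+002D and U+2014 are in both lists).
TABLE_6_3_EXCERPT = {0x002D, 0x007E, 0x2014}

# Excerpt of the Dash property in PropList.txt, per the report.
PROPLIST_DASH_EXCERPT = {0x002D, 0x2014, 0x2E1A, 0x2E3A, 0x2E3B, 0x2E40, 0x10EAD}


def out_of_sync(table, proplist):
    """Return (only in table, only in PropList), each sorted by code point."""
    return sorted(table - proplist), sorted(proplist - table)


if __name__ == "__main__":
    only_table, only_proplist = out_of_sync(TABLE_6_3_EXCERPT, PROPLIST_DASH_EXCERPT)
    print([f"U+{cp:04X}" for cp in only_table])
    print([f"U+{cp:04X}" for cp in only_proplist])
```

Running the same comparison over the complete table and the complete PropList.txt Dash entries would give the full list of discrepancies.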