L2/21-068

Comments on Public Review Issues
(January 8, 2021 - April 22, 2021)

The sections below contain links to permanent feedback documents for the open Public Review Issues, as well as other public feedback received between January 8, 2021 and April 22, 2021, the period since the previous cumulative document was issued, ahead of UTC #167 (April 27-29, 2021).

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of April 22, 2021.

Issue  Name                                                            Feedback Link
428    Unicode 14.0.0 Alpha Review                                     (feedback)
427    Proposed Update UTS #18, Unicode Regular Expressions            (feedback)
426    Proposed Update UTR #53, Unicode Arabic Mark Rendering          (feedback)  No new feedback for UTC #167
425    Proposed Update UTS #10, Unicode Collation Algorithm            (feedback)  No feedback at this time
424    Proposed Update UAX #31, Unicode Identifier and Pattern Syntax  (feedback)  No feedback at this time
423    Proposed Update UTS #39, Unicode Security Mechanisms            (feedback)  No new feedback for UTC #167
422    Proposed Update UAX #9, Unicode Bidirectional Algorithm         (feedback)  No feedback at this time
421    Proposed Update UAX #38, Unicode Han Database (Unihan)          (feedback)  No new feedback for UTC #167
420    Proposed Update UAX #45, U-source Ideographs                    (feedback)
419    Proposed Update UAX #44, Unicode Character Database             (feedback)  No new feedback for UTC #167
417    Proposed Update UAX #29, Unicode Text Segmentation              (feedback)
416    Proposed Update UAX #14, Unicode Line Breaking Algorithm        (feedback)  No feedback at this time
415    Proposed Update UTR #23, The Unicode Character Property Model   (feedback)  No feedback at this time
408    QID Emoji                                                       (feedback)

The links below go to locations in this document for feedback.

Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to Properties & Algorithms ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports

 


Feedback routed to Unihan ad hoc for evaluation


Date/Time: Mon Feb 8 09:16:42 CST 2021
Name: Jaemin Chung
Report Type: Other Question, Problem, or Feedback
Opt Subject: kMandarin values for some traditional characters

I suggest that these kMandarin values be added.

U+255FD	kMandarin	lán	# 𥗽; from U+2C497 𬒗
U+289C0	kMandarin	dù	# 𨧀; from U+2CB4A 𬭊
U+28A0F	kMandarin	bō	# 𨨏; from U+2CB5B 𬭛
U+28B4E	kMandarin	xǐ	# 𨭎; from U+2CB73 𬭳

Adding these would completely cover the traditional equivalents of the characters 
in kTGH (通用规范汉字表).
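
The suggested entries above can be checked mechanically. The following is a minimal Python sketch that parses lines in the layout shown (code point, field name, value, optional "#" comment). The function name parse_unihan_line is hypothetical, and the whitespace-splitting rule is an assumption about this layout rather than a description of any official Unihan tool.

```python
# Sketch only: parse Unihan-style lines as they appear in the suggestion above.
# The splitting rule (whitespace-separated, '#' starts a comment) is an assumption.

SUGGESTED = [
    "U+255FD\tkMandarin\tlán\t# from U+2C497",
    "U+289C0\tkMandarin\tdù\t# from U+2CB4A",
    "U+28A0F\tkMandarin\tbō\t# from U+2CB5B",
    "U+28B4E\tkMandarin\txǐ\t# from U+2CB73",
]


def parse_unihan_line(line):
    """Return (code_point, field, value) for one line, ignoring the comment."""
    data, _, _comment = line.partition("#")
    code, field, value = data.strip().split(None, 2)
    if not code.startswith("U+"):
        raise ValueError(f"not a code point: {code!r}")
    return int(code[2:], 16), field, value.strip()


entries = [parse_unihan_line(line) for line in SUGGESTED]
for code_point, field, value in entries:
    print(f"U+{code_point:04X} {field} = {value}")
```

Splitting with a maximum of two splits keeps any spaces inside the value intact, which would matter for fields whose values contain spaces.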

Date/Time: Tue Feb 16 01:26:03 CST 2021
Name: William He
Report Type: Error Report
Opt Subject: Minor kDefinition Error

The kDefinition for 穸 (U+7A78) appears incorrect. It says, "the gloom of the 
grave a tomb or grave; death" which may be missing a semicolon after the first 
instance of "grave". That said, "the gloom of the grave" is unclear regardless.

Date/Time: Mon Mar 22 14:18:10 CDT 2021
Name: Ryusei Yamaguchi
Report Type: Public Review Issue
Opt Subject: PRI #421 UNIHAN proposed update feedback

In the description of kPhonetic property, kPhonetic value of 
U+8753 is mistyped:

> An asterisk is appended when a character has the given phonetic class 
> but is not explicitly included in the character list for that class. For 
> example, 蝓 (U+8753) belongs to the class 1161 but is not explicitly listed 
> in that class. Its kPhonetic value is therefore "1161*".

Correct kPhonetic value of U+8753 is "1611*".

Date/Time: Wed Apr 14 06:17:42 CDT 2021
Name: Štěpán Zídek
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: KP0-E5A9 mapping

Mr. Jaemin Chung proposed to change the mapping of KP0-E5A9 to U+67FF (柿)
instead of the current U+676E (杮) in document L2/21-059. KP0-E5A9 should be
mapped to U+67BE (枾, read as 시 too) rather than to U+67FF (柿). This mapping
would be more accurate, since the character '枾', coded as E5A9, is used in
SamHung 3.0 multilingual dictionary, which originates from North Korea and
uses KPS9566 coding. I can provide font bitmaps from SamHung 3.0 to support
my claim.

Feedback routed to Script ad hoc for evaluation

Date/Time: Tue Jan 19 19:39:21 CST 2021
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: On the Kawi space filler and the names of punctuation characters


This is a response to https://www.unicode.org/L2/L2020/20284r-kawi.pdf 

I would like to point out that the character PUNCTUATION SPACE FILLER has a 
glyph identical to that of DIGIT FOUR. Considering that the letter RO was 
unified with DIGIT TWO for the same reason, I recommend removing the 
SPACE FILLER and annotating DIGIT FOUR with its function. This argument 
does not hold if a consistent glyph difference is attested between them.

Furthermore I also recommend some other names for other punctuation characters:

  KAWI PUNCTUATION ALTERNATE SECTION MARK -> KAWI PUNCTUATION SECTION MARK WITH REPHA
  KAWI PUNCTUATION FILLED CIRCLE -> KAWI PUNCTUATION CIRCLE WITH DOT
  KAWI PUNCTUATION CLOSING SPIRAL -> KAWI PUNCTUATION SPIRAL WITH WAVY TAIL

I also recommend annotating the SPIRAL character with the alias "siddham".

Date/Time: Tue Feb 23 20:12:22 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Kannada <ra, ZWNJ, virama, consonant>

Section 5.21 says that “a format character may have no visible effect on
display at all”, with the example of <x, ZWJ, x>. There is a case
where it is not clear whether a format character is supposed to have a
visible effect. In Kannada, how should <ra, ZWNJ, virama, consonant>
be rendered? Chapter 12 “Kannada” does not define what ZWNJ does in that
context.

One interpretation is that, since that use of ZWNJ is not defined, it is
ignored, i.e. the sequence is rendered the same as <ra, virama,
consonant>.

Another interpretation is that the sequence should be rendered the same as
<ra, ZWJ, virama, consonant>. In Indic scripts where <ZWNJ,
virama> is defined, it generally has the effect of blocking special
behaviors, such as this special initial form of ra, and inducing subjoined
C2 forms.

So which is it? See https://github.com/harfbuzz/harfbuzz/issues/2018 for more information.

Date/Time: Tue Feb 23 20:40:37 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Edge case for ZWJ and ZWNJ in Malayalam

The general rule for rendering ZWNJ and ZWJ when they appear unexpectedly is
to ignore them. That is, the string should be rendered exactly as if the
unexpected join controls weren’t there. What happens if <ZWJ, ZWNJ> or
<ZWNJ, ZWJ> appears in a position where either ZWNJ or ZWJ would be
expected, but not both?

Specifically, in Malayalam, how are <consonant, ZWJ, ZWNJ, virama,
consonant> and <consonant, ZWNJ, ZWJ, virama, consonant> rendered?
(See table 12-38.)

Date/Time: Thu Feb 25 00:46:04 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect Indic Syllabic Category for Myanmar Sign Asat

U+103A MYANMAR SIGN ASAT currently has Indic_Syllabic_Category=Pure_Killer.
This seems incorrect. As the Unicode Standard, section 16.3, describes, this
character is used as part of the three-character sequence used to encode the
kinzi, a repha-like conjunct form.

It seems Indic_Syllabic_Category=Virama would be more appropriate. The
situation is similar to U+0BCD TAMIL SIGN VIRAMA, which also doesn’t
participate in conjunct formation, except when it does.

Date/Time: Fri Mar 5 20:39:31 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Representing hamza in lam–alef ligature

Chapter 9, section “Arabic”, subsection “Quranic Texts” says that “words
spelled with the medial form of U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE in
modern Arabic orthographies may appear in Quranic texts without the tooth
typical of the letter. There is usually an elongation under the hamza, and
the hamza may carry other diacritical marks, such as a fatha. This
convention can be thought of as a modified version of yeh-hamza, and is
represented with the sequence <U+0640 ARABIC TATWEEL, U+0654 ARABIC HAMZA
ABOVE>.” There is another case of a carrier-less hamza: between a lam and
an alef in a lam–alef ligature. How should such a hamza be encoded?

In https://github.com/googlefonts/noto-fonts/issues/2017, Roozbeh says the
recommended sequence is <lam, tatweel, hamza above, alef>. If this is
Unicode’s recommendation, it should be made explicit in the standard. The
current wording, describing tatweel graphically as like a toothless, dotless
yeh, does not apply to any graphical component of a lam–alef ligature, so
the subsection might be interpreted as saying nothing about hamzas in
lam–alef ligatures.

Feedback routed to Properties & Algorithms ad hoc for evaluation

Date/Time: Mon Feb 1 10:58:28 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Contradictory requirements for U+2044 and default ignorable code points

Chapter 6 says that the fraction slash creates fractions only in the
environment `\p{Nd}+\u2044\p{Nd}+`. However, chapter 5 says that default
ignorable code points should sometimes be ignored for display, with the
example that “U+200B ZERO WIDTH SPACE affects word segmentation, but has no
visible display”, and chapter 23 says that outside of a defined variation
sequence, “use of a variation selector character does not change the visual
appearance of the preceding base character from what it would have had in
the absence of the variation selector.” How should these contradictory
requirements be resolved? For example, should <digit, variation selector,
slash, digit> and <digit, ZWSP, slash, digit> be displayed as
fractions or not?
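
To make the question concrete, here is a minimal Python sketch of a literal reading of the chapter 6 environment, that is, one or more Nd characters, U+2044, then one or more Nd characters. Python's standard re module does not support \p{Nd}, so the sketch tests the general category directly. The names fraction_spans and strip_sample_ignorables are hypothetical, and SAMPLE_IGNORABLES is a small hand-picked subset (ZWSP and VS1..VS16), not the full Default_Ignorable_Code_Point set. The sketch does not decide which reading is correct; it only shows that the two readings give different answers.

```python
import unicodedata

FRACTION_SLASH = "\u2044"

# Assumption: an illustrative subset only, not the real
# Default_Ignorable_Code_Point property.
SAMPLE_IGNORABLES = {"\u200b"} | {chr(c) for c in range(0xFE00, 0xFE10)}


def is_nd(ch):
    """True if ch has General_Category Nd (decimal digit)."""
    return unicodedata.category(ch) == "Nd"


def fraction_spans(text):
    """Return (start, end) spans that literally match Nd+ U+2044 Nd+."""
    spans = []
    i, n = 0, len(text)
    while i < n:
        if not is_nd(text[i]):
            i += 1
            continue
        start = i
        while i < n and is_nd(text[i]):
            i += 1
        if i + 1 < n and text[i] == FRACTION_SLASH and is_nd(text[i + 1]):
            i += 1
            while i < n and is_nd(text[i]):
                i += 1
            spans.append((start, i))
    return spans


def strip_sample_ignorables(text):
    """Drop the sample ignorable characters before matching."""
    return "".join(ch for ch in text if ch not in SAMPLE_IGNORABLES)


plain = "1\u20442"                 # <digit, slash, digit>
with_zwsp = "1\u200b\u20442"       # <digit, ZWSP, slash, digit>
with_vs = "0\ufe00\u20441"         # <digit, VS1, slash, digit>

print(fraction_spans(plain))
print(fraction_spans(with_zwsp))
print(fraction_spans(strip_sample_ignorables(with_vs)))
```

Under the literal reading, neither sequence containing an ignorable character matches; if the ignorables are stripped first, both match. That difference is exactly the ambiguity the report asks to have resolved.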

Date/Time: Sat Apr 24 13:03:04 CDT 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Response to L2/21-069

> David does not include a use case for combinations of fractions with default 
> ignorable code points in his submission.

The use case is the slashed zero in fractions. L2/21-069’s recommendation in F1 
implies that <zero, VS1, fraction slash, one> should be rendered as 
<full-sized slashed zero, slash, full-sized one>, but that <one, fraction 
slash, zero, VS1> may be rendered <numerator one, fraction slash, denominator 
slashed zero>.

Feedback routed to Emoji SC for evaluation

Date/Time: Wed Jan 13 10:32:04 CST 2021
Name: William Overington
Report Type: Other Question, Problem, or Feedback
Opt Subject: Abstract emoji

Could Unicode, Inc. please consider allowing abstract emoji to become 
in scope for being encoded in regular Unicode? Abstract emoji could be 
very helpful for communicating through the language barrier.
 
I have recently published a colour font for sixteen abstract emoji for 
personal pronouns and it would be helpful if abstract emoji were to 
become in scope for The Unicode Standard.
 
http://www.users.globalnet.co.uk/~ngo/mariposa_novel.htm 
 
William Overington
 
Tuesday 12 January 2021

Date/Time: Sun Feb 14 11:23:49 CST 2021
Name: Charlotte Buff
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/20-064: Glyph for U+1F979 FACE HOLDING BACK TEARS

The glyphic appearance of proposed character U+1F979 FACE HOLDING BACK TEARS
seems underspecified. The original proposal (L2/20-064) uses a glyph with a
smiling mouth as its main artwork, but throughout the document a different
glyph that looks much more distraught and emotionally unstable is used to
illustrate usage examples. In particular, the screencaps of cartoon
characters in section C (“Image distinctiveness”) all depict faces that are
distinctly not smiling. The emoji candidates page
(https://www.unicode.org/emoji/future/emoji-candidates.html) uses the
distraught glyph, while the draft code chart for the Supplemental Symbols
and Pictographs block shows the smiling variant.

While both variants can be said to be “holding back tears”, the UTC should
investigate whether such a wide range of possible glyphic interpretations
could lead to communication issues between end users. The keywords
associated with the emoji such as “angry” and “sad” would certainly suggest
that the smiling variant is somewhat inappropriate, while the distraught
variant is (at least in my opinion) not inherently unsuited for representing
emotions such as being proud of another person.

Feedback routed to Editorial Committee for evaluation

Date/Time: Fri Jan 15 06:59:44 CST 2021
Name: Charlotte Buff
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/19-053: Duplicate Character Name (Znamenny)

This report was reviewed already and the name duplication has been fixed. Please see document L2/21-013, Section F2.

While working on a document concerning Znamenny notation, I discovered an unrelated 
flaw in the original proposal (L2/19-053): The proposed characters U+1CF2D and U+1CF40 
were both given the exact same name – ZNAMENNY COMBINING MARK KRYZH.

Date/Time: Sun Feb 7 00:06:15 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: NamesList documentation refers to “LIGHT SCREEN”

This has already been fixed in draft data files for Unicode version 14.0.

The documentation for the Unicode names list file format at
http://ftp.unicode.org/Public/UNIDATA/NamesList.html 
(revision 13.0.0) refers to a “glyph for LIGHT SCREEN”, to be used 
instead of an unavailable variant glyph.

There’s no explanation of what “LIGHT SCREEN” refers to. From the context it 
appears that it might refer to a Unicode character, but Unicode 13 doesn’t 
include such a character.

Date/Time: Tue Feb 23 12:03:14 CST 2021
Name: Jungshik Shin
Report Type: Error Report
Opt Subject: Hangul collation and Hangul tone marks

Note: Changes have been made in the draft text for version 14.0 in response to [the first part of] this report.

Hello, 

I'm writing to give my feedback on TUS 13 section 18.6 Hangul. 

On pages 746-747, I found the following regarding the collation of Hangul
syllables:

"Because the order of the syllables in the Hangul Syllables block reflects
the preferred ordering, sequences of Hangul syllables for modern Korean may
be collated with a simple binary comparison"

Although the above certainly holds for the South Korean collation order
since 1988 [1], it does not hold true for North Korean sorting rules.
Therefore, the locale data for ko-KP needs to be tailored for Hangul
collation. 

In addition, section 18.6 does not mention two Hangul tone marks, U+302E
and U+302F. To faithfully represent the old Korean text, Hangul tone marks
are required and should be mentioned along with Hangul Conjoining Jamos. 

It'd be great if the two points above could be reflected in TUS 14 or later.

Thank you for your consideration, 

Jungshik Shin 


[1] Before 1988, there were a couple of 'competing' collation orders even in
South Korea and different dictionaries used different sorting rules. It was
only in 1988 that the South Korean orthographic standard explicitly
specified how to sort Hangul. 
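
As a small illustration of the quoted "simple binary comparison" claim, the Python sketch below sorts a few precomposed modern Hangul syllable strings by code point, which is how Python compares str values. The word list is an arbitrary example chosen for this sketch. It only illustrates the code point order of the Hangul Syllables block; it does not implement North Korean ordering or any tailored collation.

```python
# Sketch only: code point order of precomposed Hangul syllables.
# Python compares str values code point by code point, so sorted() here
# is the "simple binary comparison" the quoted passage refers to.

words = ["하늘", "가방", "나무", "다리"]

binary_sorted = sorted(words)

for word in binary_sorted:
    first = word[0]
    print(f"{word}  (first syllable U+{ord(first):04X})")
```

A locale such as ko-KP that needs a different order would have to supply its own sort key instead of relying on this default comparison.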

Date/Time: Tue Feb 23 19:56:30 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: U+034F COMBINING GRAPHEME JOINER is not always ignored for display

Section 5.21 says “U+034F COMBINING GRAPHEME JOINER is likewise always
ignored for display.” This is not true: it has no visible glyph of its own,
but it may have a visible effect on other glyphs. For example, see Figure
7-11 and UTR #53. As section 5.21 says earlier on the same page, “In such
cases, even though the format character or variation selector has no visible
glyph of its own, it would be inappropriate to say that it is ignored for
display, because the intent of its use is to change the display in some
visible way.”

Date/Time: Wed Feb 24 23:36:32 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Typo: Meetei Mayak Extensions

This has already been updated in the draft for the next version.

Section 13.7 of the Unicode Standard, and the corresponding entry in the
table of contents, repeatedly refer to a “Meetei Mayak Extensions” block.

The correct name of the block is “Meetei Mayek Extensions”.

Date/Time: Fri Feb 26 03:19:19 CST 2021
Name: huang xin
Report Type: Error Report
Opt Subject: What is the exact definition of assigned character?

The term assigned character seems to have conflicting meanings in the Unicode 
Standard Version 13.0.

Quoted from chapter 2.1:
    "In contrast, a character encoding standard provides a single set of 
     fundamental units of encoding, to which it uniquely assigns numerical 
     code points. These units, called assigned characters, are the smallest 
     interpretable units of stored text."

This suggests that the "units" are called "assigned characters", and "numerical 
code points" are assigned to "assigned characters".

Quoted from chapter 3.5 D49:
    "Private-use code points are considered to be assigned characters"

This suggests that assigned character is a kind of code point.

So the two quotes conflict: if an assigned character is some kind of code point, 
how can a "numerical code point" be assigned to some kind of code point?

Date/Time: Sat Feb 27 21:03:22 CST 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Chapter 17 intro miscounts Indonesian scripts

The introduction to chapter 17 in TUS 13.0 says "Indonesia has many local, 
traditional scripts, most of which are ultimately derived from Brahmi. 
Six of these scripts are documented in this chapter."

The actual number of Indonesian scripts documented in the chapter is seven; 
Makasar is one of them. Maybe get rid of the number, as several more 
scripts are to come?

It’s also not quite clear why Makasar gets its own paragraph; the 
paragraph suggests that it belongs between Rejang and Buginese.

Date/Time: Tue Mar 2 13:41:48 CST 2021
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: Errors in the 13.0.0 Core Specification

Note: These errors have now been fixed in the draft text for version 14.0.

The text of the Unicode Standard contains six minor mistakes: “circumflext” 
(instead of “circumflex”), “fith century” (instead of “fifth century”), 
“Non_Joining_Group” (instead of “No_Joining_Group”), “manuscriptof” (instead 
of “manuscript of”), “Devangari” (instead of “Devanagari”) and “analoguous” 
(instead of “analogous”). Maybe you could correct these in the next version.

Date/Time: Wed Mar 3 13:11:06 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Ligatures in Old Hungarian

Note: Changes have been made in the draft text for version 14.0 in response to this report.

Chapter 13 says that Old Hungarian “often uses a large set of ligatures and
consonant clusters.” Why mention consonant clusters? Ligatures may include
all vowels, all consonants, or some of both.

Is the intent that these ligatures be enabled in plain text by ZWJ?

Is an uppercase ligature meant to be formed from all uppercase letters, or
from one uppercase letter followed by lowercase letters? Or can it be either
depending on the context? What, if anything, should <lowercase, ZWJ,
uppercase> ligate to?

Date/Time: Fri Mar 12 19:45:54 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Bidi format characters do affect characters’ glyphs

Chapter 5 says “Bidirectional format characters do not affect the glyph
forms of displayed characters”, but that is not true. The main point of that
sentence (that bidi format characters have no glyphs) is still true, but it
needs a better explanation. For example, U+0028 LEFT PARENTHESIS has
different glyphs depending on the bidi level. In general, overriding a
character’s directionality may have an arbitrary effect on its glyph form.

Date/Time: Fri Mar 12 19:56:45 CST 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Unexpected variation sequences do affect display

Chapter 5 says “In other contexts, a format character may have no visible
effect on display at all. [...] Another example is a variation selector
following a base character for which no standardized or registered variation
sequence exists. In that case, the variation selector has no effect on the
display of the text.” However, that is an oversimplification. The presence
of an unexpected variation selector may block another variation sequence,
may block canonical reordering, and may block AMTRA reordering, all of which
have effects on the display of the text.
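
The point about canonical reordering can be shown with Python's standard unicodedata module. This is a minimal sketch with an arbitrarily chosen example: a base letter followed by a combining acute (ccc=230) and a combining dot below (ccc=220), with and without U+FE00 VARIATION SELECTOR-1 in between. Because the variation selector has canonical combining class 0, it acts as a barrier and the two marks are no longer reordered.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, ccc=230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, ccc=220
VS1 = "\ufe00"        # VARIATION SELECTOR-1, ccc=0

plain = "a" + ACUTE + DOT_BELOW
with_vs = "a" + ACUTE + VS1 + DOT_BELOW

nfd_plain = unicodedata.normalize("NFD", plain)
nfd_with_vs = unicodedata.normalize("NFD", with_vs)


def show(label, text):
    """Print the code points of text for comparison."""
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in text))


show("plain, NFD:  ", nfd_plain)
show("with VS, NFD:", nfd_with_vs)
print("ccc of VS1:", unicodedata.combining(VS1))
```

Without the variation selector, normalization moves the dot below ahead of the acute; with it, the original order is preserved, so any rendering or matching process that depends on normalized mark order sees different input.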

Date/Time: Fri Mar 12 20:06:54 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: Does <ZWJ, ZWJ> equal ZWJ?

UTS #51 defines various sequences with ZWJ, such as <1F415, 200D,
1F9BA>. How should they be rendered when there are multiple ZWJs, as in
<1F415, 200D, 200D, 1F9BA>? According to chapter 5 of the core
specification, “a sequence of two adjacent joiners, <..., ZWJ, ZWJ,
...>, is a case where the extra ZWJ should have no effect.” On the other
hand, I get the impression that extraneous ZWJs go against the spirit of UTS
#51. Is that sentence in the core specification meant to be taken literally?
What effects should other default ignorable code points have within emoji?

Date/Time: Fri Mar 12 20:37:10 CST 2021
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: When does ZWJ act like <ZWJ, ZWNJ, ZWJ>?

Chapter 23 says that “between Arabic characters a ZWJ acts just like the
sequence <ZWJ, ZWNJ, ZWJ>, preventing a ligature from forming instead
of requesting the use of a ligature that would not normally be used.” What
is an Arabic character, and which characters are relevant for the purpose of
“between”? Consider the sequence <meem, ZWJ, U+17B4 KHMER VOWEL INHERENT
AQ, jeem>. The ZWJ is between an Arabic character and a Khmer character.
Is it right to conclude that the ZWJ therefore does not act just like
<ZWJ, ZWNJ, ZWJ>, leaving it free to ligate the meem and jeem?

Date/Time: Mon Mar 29 23:44:43 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Confusion between nonspacing marks and nonspacing marks

The Unicode Standard has a general category Mn “nonspacing mark”. The
Unicode Standard also has a definition D53: “Nonspacing mark: A combining
character with the General Category of Nonspacing Mark (Mn) or Enclosing
Mark (Me).”

This definition seems misguided for two reasons:

① Enclosing marks are almost always spacing, contradicting the statement
that supports D53: “It generally does not consume space along the visual
baseline in and of itself.” Adding an enclosure to a glyph requires space –
otherwise it results in a smudge. Of the 25 font families I found on my Mac
that contain U+20DD combining enclosing circle, only one monospaced font
uses an enclosing circle glyph with the same width as any other glyph,
predictably resulting in smudges. All 24 others use a glyph that’s large
enough to accommodate the glyphs of most base characters with some padding,
which means it’s substantially wider than most base glyphs. This is very
different from the exceptional and context-dependent widening described for
the real nonspacing mark U+0302 combining circumflex accent in “î”.

② Using the same term for two related but different concepts results in
confusion. This is most obvious in an example for a regular expression
character class in TUS appendix A Notational Conventions, page 941, which
describes [\p{gc=Nonspacing_Mark}] as “nonspacing marks” – clearly correct
based on the general category and clearly wrong based on definition D53. TUS
section 5.12 Strategies for Handling Nonspacing Marks, page 217, claims
“Properly speaking, a nonspacing mark is any combining character that does
not add space along the writing direction.” and again “Composite character
sequences can be rendered effectively by means of a fairly simple mechanism.
In simple character rendering, a nonspacing combining mark has a zero
advance width, and a composite character sequence will have the same width
as the base character.” Both statements are incorrect for enclosing marks in
most fonts. This leads to an inappropriate truncation strategy on page 219:
“In simple systems, it is easiest to truncate by width, starting from the
end and working backward by subtracting character widths as one goes.
Because a trailing nonspacing mark does not contribute to the measurement of
the string, the result will not separate nonspacing marks from their base
characters.” Page 222 discusses letterspacing: “This process needs to be
modified if zero-width nonspacing marks are present in the text. Otherwise,
if extra justifying space is added after the base character, it can have the
effect of visually separating the nonspacing mark from its base.” This issue
would affect non-zero-width nonspacing marks as well, which D53 creates. And
so on...

I suggest changing D53 to define “nonspacing mark” based only on general
category Mn, and discussing enclosing marks either together with nonspacing
marks or separately, as appropriate in each context.
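
The gap between the two uses of the term can be seen with a short Python sketch based on the standard unicodedata module. The function names is_gc_nonspacing_mark and is_d53_nonspacing_mark are hypothetical labels for the two readings discussed above (general category Mn only, versus D53's Mn or Me); the sample characters are the ones mentioned in this report plus one spacing mark chosen for contrast.

```python
import unicodedata


def is_gc_nonspacing_mark(ch):
    """Reading 1: General_Category is Mn (Nonspacing_Mark)."""
    return unicodedata.category(ch) == "Mn"


def is_d53_nonspacing_mark(ch):
    """Reading 2: definition D53, General_Category Mn or Me."""
    return unicodedata.category(ch) in ("Mn", "Me")


samples = {
    "U+0302 COMBINING CIRCUMFLEX ACCENT": "\u0302",
    "U+20DD COMBINING ENCLOSING CIRCLE": "\u20dd",
    "U+0903 DEVANAGARI SIGN VISARGA": "\u0903",
}

for name, ch in samples.items():
    print(
        f"{name}: gc={unicodedata.category(ch)}, "
        f"gc=Mn? {is_gc_nonspacing_mark(ch)}, "
        f"D53? {is_d53_nonspacing_mark(ch)}"
    )
```

U+20DD is the case where the two readings disagree: it is a nonspacing mark under D53 but is not matched by [\p{gc=Nonspacing_Mark}].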

Date/Time: Tue Mar 30 00:11:37 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incomplete discussion of combining marks

The Unicode Standard has two sections with guidelines on nonspacing marks:
5.12 Strategies for Handling Nonspacing Marks and 5.13 Rendering Nonspacing
Marks.

The second paragraph of the first of these sections says: “In this section
and the following section, the terms nonspacing mark and combining character
are used interchangeably.”

This sentence is confusing because the terms are not interchangeable at all:
Combining characters, according to definition D52, include nonspacing
(general category Mn), spacing (Mc), and enclosing (Me) marks. Even when
applying the dubious definition D53, nonspacing marks do not include spacing
marks.

Most of the issues described in the two sections affect spacing and
enclosing marks as well, so the sections are incomplete if they don’t cover
them. The solutions, however, often need to be modified for them.

Date/Time: Tue Mar 30 00:15:01 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect statement about grapheme clusters

The last paragraph of TUS section 2.11 Combining Characters contains this
statement: “This core concept is known as a *grapheme cluster*, and it
consists of any combining character sequence that contains only *nonspacing*
combining marks or any sequence of characters that constitutes a Hangul
syllable (possibly followed by one or more nonspacing marks).”

This statement is incorrect. Both kinds of grapheme clusters defined in UAX
29, legacy grapheme clusters and extended grapheme clusters, can contain
*spacing* combining marks.

Date/Time: Tue Mar 30 00:19:43 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Incorrect statements about combining characters

The first paragraph of TUS section 2.11 Combining Characters has two
incorrect statements:

① “Characters intended to be positioned relative to an associated base
character are depicted in the character code charts above, below, or through
a dotted circle.”: In reality, combining characters can be depicted on any
side of a dotted circle, on multiple sides, crossing it, or enclosing it.

② “The Unicode Standard distinguishes two types of combining characters:
spacing and nonspacing.” The standard, at least in its definition of general
categories, distinguishes three types of combining characters: spacing,
nonspacing, and enclosing, although definition D53 then adds ambiguity.

Date/Time: Fri Apr 2 19:05:22 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Unclear reference to “dashes” in TUS section 12.9 Malayalam

TUS section 12.9 Malayalam, page 512 says “... rendering engines should be
prepared to handle Malayalam letters (including vowel letters), digits (both
European and Malayalam), dashes, U+00A0 NO-BREAK SPACE and U+25CC DOTTED
CIRCLE as base characters for the Malayalam vowel signs, U+0D4D MALAYALAM
SIGN VIRAMA, U+0D02 MALAYALAM SIGN ANUSVARA, and U+0D03 MALAYALAM SIGN
VISARGA. They should also be prepared to handle multiple combining marks on
those bases.”

It’s not clear which “dashes” this refers to. The Unicode Standard, in table
6-3 and in PropList.txt, defines two overlapping sets of dashes that
together contain 30 dash characters. It is very unlikely that all of them
are relevant to Malayalam, and OpenType in particular is not good at
handling mixed-script clusters, such as a combination of U+1806 MONGOLIAN
TODO SOFT HYPHEN with U+0D02 MALAYALAM SIGN ANUSVARA.

Date/Time: Fri Apr 2 18:21:40 CDT 2021
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Dash definitions out of sync

The lists of dash characters in TUS table 6-3 and in PropList.txt are out of sync. 
Table 6-3 includes 007E TILDE, which is not listed as a Dash in PropList.txt. 
In turn, PropList.txt lists 2E1A HYPHEN WITH DIAERESIS, 2E3A..2E3B 
TWO-EM DASH..THREE-EM DASH, 2E40 DOUBLE HYPHEN, 10EAD YEZIDI HYPHENATION MARK, 
which are absent from TUS table 6-3.

It’s not clear to me what qualifies 10EAD YEZIDI HYPHENATION MARK as a dash.

Other Reports