Comments on Public Review Issues

L2/20-239

(July 20 - September 23, 2020)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of September 23, 2020, since the previous cumulative document was issued prior to UTC #164 (July 2020).

422 Proposed Update UAX #9, Unicode Bidirectional Algorithm (feedback) No feedback at this time

421 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback)

420 Proposed Update UAX #45, U-source Ideographs (feedback) No feedback at this time

419 Proposed Update UAX #44, Unicode Character Database (feedback) No feedback at this time

417 Proposed Update UAX #29, Unicode Text Segmentation (feedback) No feedback at this time

416 Proposed Update UAX #14, Unicode Line Breaking Algorithm (feedback) No feedback at this time

415 Proposed Update UTR #23, The Unicode Character Property Model (feedback) No feedback at this time

408 QID Emoji (feedback) Last feedback June 4, 2020

The links below go to locations in this document for feedback.

Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to ucd-dev ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports

Feedback routed to Unihan ad hoc for evaluation

Date/Time: Fri Jul 17 20:18:55 CDT 2020 (updated 2020-09-01)
Name: Jim Breen
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Additional Unihan information for U+FA11 U+37E2 U+2550E

I would like to propose some additional Unihan information for several
related characters. I am basing this submission on the entries in the
2002 edition of Shibano's JIS 漢字字典. That dictionary covers the kanji
in JIS X 0208 and JIS X 0213. As you are probably aware, Shibano
chaired the JSC committee that revised JIS X 0208 and developed JIS X
0213. The characters are:

 U+FA11 﨑 (p. 174)
 U+37E2 㟢 (p. 173)
 U+2550E 𥔎 (p. 457)

The kIRG_JSource for all three is JIS X 0213.

These characters are stated in Shibano to be variants of 崎 (U+5D0E), 碕
(U+7895), 嵜 (U+5D5C) and 埼 (U+57FC). U+5D0E is a Jinmeiyo Kanji
(2010).

The additions I propose are:

U+FA11 﨑
kJapaneseKun  SAKI
kJapaneseOn   KI
kDefinition   cape; spit; promontory

U+37E2 㟢
kJapaneseKun  SAKI
kDefinition   cape; spit; promontory

U+2550E 𥔎
kJapaneseKun  SAKI
kDefinition   cape; spit; promontory

The readings are all drawn from Shibano. The kDefinition values are
from those associated with the related characters in Japanese sources.
------
This is in addition to the following changes proposed by Ken and others:

U+37E2    kSemanticVariant    U+57FC U+5D0E<kMorohashi:TZ U+5D5C U+7895 U+966D U+FA11 U+2550E
U+57FC    kJapaneseKun    SAI SAKI
U+57FC    kSemanticVariant    U+37E2 U+5D0E<kMorohashi U+5D5C U+7895<kMorohashi:T U+966D U+FA11 U+2550E
U+5D0E    kSemanticVariant    U+37E2<kMorohashi:T U+57FC<kMorohashi U+5D5C<kMorohashi U+7895<kMorohashi U+966D<kMorohashi U+FA11<kMorohashi U+2550E
U+5D5C    kSemanticVariant    U+37E2 U+57FC U+5D0E<kMorohashi:Z U+7895 U+966D U+FA11 U+2550E
U+7895    kSemanticVariant    U+37E2 U+57FC<kMorohashi:TZ U+5D0E<kMorohashi U+5D5C U+966D U+FA11 U+2550E
U+966D    kSemanticVariant    U+37E2 U+57FC U+5D0E<kMorohashi U+5D5C U+7895 U+FA11 U+2550E
U+FA11    kSemanticVariant    U+37E2 U+57FC U+5D0E<kMorohashi:Z U+5D5C U+7895 U+966D U+2550E
U+2550E    kSemanticVariant    U+37E2 U+57FC U+5D0E U+5D5C U+7895 U+966D U+FA11

Date/Time: Wed Aug 5 11:37:47 CDT 2020
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: Errors in the Unihan Database

(1)
U+4CA4 kTotalStrokes 21
↓
U+4CA4 kTotalStrokes 18

(2)
U+9FD2 kSimplifiedVariant U+9FD3
U+9FD3 kTraditionalVariant U+9FD2
↓
U+9FD2 kTraditionalVariant U+9FD3
U+9FD3 kSimplifiedVariant U+9FD2

Date/Time: Mon Aug 31 08:29:11 CDT 2020
Name: Ken Lunde
Report Type: Error Report
Opt Subject: Unihan-related feedback

Please consider the following three pieces of Unihan-related feedback:

1) Change 釒 (U+91D2) to 金 (U+91D1) in the IDSes for the following eight U-Source ideographs:

UTC-00102;C;U+2B4B6;167.9;1316.111;⿰釒凾;kMatthews 2051;
UTC-00207;X;;167.10;1318.281;⿰釒冤;kSBGY 115.19;
UTC-00432;X;;167.11;1321.071;⿰釒患;kMeyerWempe 3708b;
UTC-00872;D;U+2B7F0;167.6;1305.211;⿰釒当;Adobe-Japan1 20240;
UTC-00889;N;;167.10;1318.281;⿰釒袓;Adobe-CNS1 C+16257;
UK-02711;G;U+30F25;167.5;1303.101;⿰釒卢;UTCDoc L2/15-260 1399;
UK-02829;UK-2015;UTC-02828;167.7;1308.261;⿰釒囱;UTCDoc L2/15-260 1517;
UK-02895;G;U+30F23;167.4;1299.191;⿰釒㝉;UTCDoc L2/15-260 1583;

Rationale: 釒 (U+91D2) appears only once in the IDS database, as itself. 金 (U+91D1) is 
used as a component in over 2,000 ideographs. Also, the IDS database already includes 
these adjustments for those that are encoded.
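
The substitution requested in item 1 could be sketched as follows. This is only an illustration: it assumes each U-Source record is a semicolon-delimited line whose sixth field holds the IDS, as in the records quoted above, and the function name is hypothetical rather than part of any Unicode tooling.

```python
# Hypothetical helper: replace U+91D2 with U+91D1 in the IDS field only.
OLD_COMPONENT = "\u91D2"  # 釒, used once in the IDS database (as itself)
NEW_COMPONENT = "\u91D1"  # 金, the usual component form
IDS_FIELD = 5             # assumed zero-based position of the IDS field


def normalize_gold_component(record: str) -> str:
    """Return the record with U+91D2 replaced by U+91D1 in its IDS field."""
    fields = record.split(";")
    if len(fields) > IDS_FIELD:
        fields[IDS_FIELD] = fields[IDS_FIELD].replace(OLD_COMPONENT, NEW_COMPONENT)
    return ";".join(fields)


record = "UTC-00102;C;U+2B4B6;167.9;1316.111;⿰釒凾;kMatthews 2051;"
print(normalize_gold_component(record))
```

Restricting the replacement to the IDS field leaves the source reference and other fields untouched even if they happened to contain the same character.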

2) Simplify the IDS for UTC-00892 (U+2DF3C 𭼼) as follows:

Current:
UTC-00892;F;U+2DF3C;104.23;0783.271;⿸疒⿲彳⿳山一黑攵;Adobe-CNS1 C+16303;

Proposed:
UTC-00892;F;U+2DF3C;104.23;0783.271;⿸疒黴;Adobe-CNS1 C+16303;

Rationale: The IDS database already specifies ⿸疒黴 as the IDS for U+2DF3C 𭼼 (UTC-00892).

3) Horizontally-extend U+289B1 𨦱 (Extension B) to add UK-02829 ⿰金囱 as a source reference. 
Its simplified form, U+30F8A 𰾊 (UK-2828), is in Extension G, which further means that the 
kSimplifiedVariant and kTraditionalVariant properties can be added to these ideographs as follows:

U+289B1 kSimplifiedVariant U+30F8A
U+30F8A kTraditionalVariant U+289B1

That is all.

Date/Time: Thu Sep 3 12:30:38 CDT 2020
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: U+28E0F kTotalStrokes error

The kTotalStrokes value for U+28E0F 𨸏 should be 8, not 2.

Date/Time: Thu Sep 3 12:34:42 CDT 2020
Name: Jaemin Chung
Report Type: Other Question, Problem, or Feedback
Opt Subject: Request for addition of one cross-reference under U+2EA7

Under U+2EA7 ⺧, a cross-reference to U+20092 𠂒 needs to be added. 
When I was writing L2/19-214R, I was not aware of U+20092.

Date/Time: Thu Sep 17 18:20:16 CDT 2020
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: kTotalStrokes value for U+2B413

The kTotalStrokes value for U+2B413 𫐓 should be 13, not 10.

Date/Time: Thu Sep 17 18:44:18 CDT 2020
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: kCantonese value for U+2B413

http://unicode.org/L2/L2020/20231-2B413-2B5E6-change.pdf 
In addition to what I wrote in L2/20-231, the kCantonese value for 
U+2B413 𫐓 should be changed to jau4 (which is the kCantonese value 
for U+8F2E 輮). This has to be changed anyway, and I think copying 
the value from the traditional counterpart is fine in this case.

Feedback routed to Script ad hoc for evaluation

Date/Time: Sun Aug 16 06:02:29 CDT 2020
Name: Moemen Metwally
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Eastern Arabic Fractions

Good Morning, 

I've looked thoroughly at the Arabic unicode ranges, and despite what seems
to be an obsession with Islamic religious rhetoric & a very specific
Qur'anic orthography, there are some serious basic oversights. I hope I'm
mistaken and you can lead me to the range where I find them!

The Eastern half of the Arab world (the Mashreq) uses the following numerals
۱۲۳٤٥٦٧۸۹۰ - and although unicode includes the symbols for cube-root and
fourth-root, as well as certain mathematical symbols like the one for
diameter, I cannot find:

- Symbols for half, a third, a quarter, three-quarters... the vulgar
fractions we commonly see in handwriting, print, manuscripts, etc. They
are widely used.

- A unicode symbol for the 'egyptian' two. KFGQPC Uthman Taha is the only
font which finds a workaround for this, compare it to any other arab font
and you'll see how the two has no 'tooth', as it is written in Egypt,
whereas the current unicode standard uses the tooth.

There's a bit more to say but let's start with those please!

Date/Time: Thu Sep 3 16:18:05 CDT 2020
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Suggestion to supplement L2/20-209 with named character sequences

The proposal to add the characters needed for the kana Hokkien/Minnan
orthography is in my opinion well formed and should not be considered
"preliminary".

That being said, I would suggest also proposing a set of named character
sequences for the letters with combining marks, if and only if they have
formal names. This is already the case for other kana letters with combining
marks, and so it would make it all consistent.

Date/Time: Sat Sep 5 10:34:31 CDT 2020
Name: Ken Lunde
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on feedback on L2/20-209


With regard to Eduardo Marín Silva's 2020-09-03 feedback on L2/20-209, I
disagree with the proposal to add named character sequences. They are not
necessary. The existing named character sequences for kana exist only
because those combining sequences correspond to atomic characters in the JIS
X 0213 standard, which is explicitly mentioned in NamedSequences.txt. Named
character sequences for additional kana were once proposed in L2/16-133, but
the UTC rejected them during UTC #147 for this reason:

https://www.unicode.org/L2/L2016/16133-japanese-voiced-vowels.pdf 

The first paragraph of Section 1.1 of UAX #34, Unicode Named Character
Sequences, captures this nicely:

In some limited circumstances it is necessary to also provide a name for
such sequences. The primary example is the need to have an identifier for a
sequence to correlate with an identifier in another standard, for which a
cross-mapping to Unicode is desired. To address this need, the Unicode
Standard defines a mechanism for naming sequences and provides a short list
of sequences that have been formally named. This list is deliberately
selective: it is neither possible nor desirable to attempt to provide names
for all possible sequences of Unicode characters that could be of interest.

Regards...

-- Ken

Feedback routed to ucd-dev ad hoc for evaluation

Date/Time: Thu Jul 30 16:55:11 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31

This feedback pertains to revision 33 of UAX#31:
http://www.unicode.org/reports/tr31/tr31-33.html 

In section 1, the paragraph after Figure 1 says, 

"The set consisting of the union of ID_Start and ID_Nonstart 
characters is known as Identifier Characters ..."

Then in section 1.1, the second bulleted item in the list of stability guarantees says,

"The Identifier characters are always a superset of the ID_Start characters."

Given the definition of "Identifier Characters" given in 
section 1, this statement is tautological—necessarily true, 
by definition—so not useful to state as a stability guarantee. 
Was "proper superset" meant?
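
The point can be illustrated with a small sketch using toy sets (the sample characters are placeholders, not the real property contents): a union is a superset of each operand by construction, whereas "proper superset" is a claim that can actually fail.

```python
def identifier_characters(id_start: set, id_nonstart: set) -> set:
    """Identifier Characters as defined in section 1: the union of the two sets."""
    return id_start | id_nonstart


# Toy stand-ins for the real ID_Start / ID_Nonstart sets.
id_start = {"a", "b", "_"}
id_nonstart = {"0", "1"}

ident = identifier_characters(id_start, id_nonstart)
print(ident >= id_start)  # True for any inputs: superset holds by definition
print(ident > id_start)   # True here: proper superset, since ID_Nonstart is non-empty
print(identifier_characters(id_start, set()) > id_start)  # False: no longer proper
```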

Date/Time: Thu Jul 30 17:23:29 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31

This feedback pertains to revision 33 of UAX#31

In section 2, in the 4th paragraph, the last sentence says,

"The second column provides a general description of the coverage for the
associated class, the derivational relationship between the ID properties
and the XID properties, and an associated set notation for the class."

The concepts "ID property" and "XID property" are in this way introduced. If
there were mention of only "ID property", that would be fine: in the
context, it would be sufficiently clear that there will be character
properties pertaining to IDs that are used for Default Identifier Syntax.
However, with a second concept thrown in, "XID property", this becomes
confusing. (Huh? What's an "XID property" and what does it have to do with
identifier syntax?) It would help to introduce the pair of terms with some
explanation of what "XID" is all about.

Date/Time: Thu Jul 30 18:17:04 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31, 2.3.1 Limitations

This feedback pertains to revision 33 of UAX#31:
http://www.unicode.org/reports/tr31/tr31-33.html 

Section 2.3.1 discusses potential tightening of restrictions in regard to
A1, A2 or B (use of ZWNJ or ZWJ within IDs in certain contexts). The last
paragraph says the following:

"Comparison. Typically the identifiers with and without these 
characters should compare as equivalent, to prevent security issues."

Examples given in the preceding descriptions of A1, A2 and B included cases
in which strings with or without the joiner were both linguistically valid;
e.g., Farsi words for "names" and "a letter". But a constraint on comparison
is, in effect, preventing a distinction from being made: strings with and
without the joiner are to be treated as the same ID.

That seems to amount to saying that the joiners should only be kept when
displaying IDs as typed by a user. In that case, it seems like this
paragraph in 2.3.1 should suggest that. 

In addition, it seems like it would make sense for 2.3.1 to mention layout
and format control characters, when permitted in IDs, as a potential basis
for distinguishing between display format and comparison format.

Date/Time: Thu Jul 30 18:43:45 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31, set notation

This feedback pertains to revision 33 of UAX#31:

In section 2, Table 2 includes descriptions of property values in terms of
"set notation". This is introduced in the immediately-preceding paragraph:

"The second column provides ... an associated set notation for the class."

An example, the notation used for describing ID_Start:

"[\p{L}\p{Nl}\p{Other_ID_Start}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]"

No explanation is provided for this notation. It might make sense to someone
already familiar with Unicode and the notation from other contexts. For
someone coming from, say, a mathematics background but without Unicode
experience, this does not look like any familiar set notation. (Math convention is
to use brace brackets to denote a set; that's also used in, e.g., Python.)
There are many classes of readers that would get to this point in the doc
and wonder where the notation is explained.
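
For readers in that position, one way to read the quoted expression is as (L ∪ Nl ∪ Other_ID_Start) minus Pattern_Syntax minus Pattern_White_Space. The sketch below spells that reading out in Python. It rests on assumptions: the Other_ID_Start and Pattern_White_Space lists are transcribed from the Unicode 13.0 PropList.txt as best recalled, the Pattern_Syntax set is only an ASCII sample, and the function name is illustrative.

```python
import unicodedata

# Assumed contents (Unicode 13.0); verify against PropList.txt before relying on them.
OTHER_ID_START = {0x1885, 0x1886, 0x2118, 0x212E, 0x309B, 0x309C}
PATTERN_WHITE_SPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85,
                       0x200E, 0x200F, 0x2028, 0x2029}
# ASCII-only sample of Pattern_Syntax; the real property is much larger.
PATTERN_SYNTAX_SAMPLE = {ord(c) for c in "!\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~"}


def in_id_start(cp: int) -> bool:
    """Approximate [\\p{L}\\p{Nl}\\p{Other_ID_Start}-\\p{Pattern_Syntax}-\\p{Pattern_White_Space}]."""
    category = unicodedata.category(chr(cp))
    included = category.startswith("L") or category == "Nl" or cp in OTHER_ID_START
    excluded = cp in PATTERN_SYNTAX_SAMPLE or cp in PATTERN_WHITE_SPACE
    return included and not excluded


for ch in ["a", "1", "$", " ", "\u2118"]:
    print(f"U+{ord(ch):04X}", in_id_start(ord(ch)))
```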

The doc continues with other use of the notation, without explanation. E.g., section 2.3, under A.1:

"This corresponds to the following regular expression (in Perl-style syntax): /$LJ $T* ZWNJ $T* $RJ/
where:

$T = \p{Joining_Type=Transparent}
$RJ = [\p{Joining_Type=Dual_Joining}\p{Joining_Type=Right_Joining}]
$LJ = [\p{Joining_Type=Dual_Joining}\p{Joining_Type=Left_Joining}]"

The first hint, if the reader recognizes it as such, is a mention in section
2.4, after Table 3b, of "UnicodeSet syntax".

"In UnicodeSet syntax, the characters in these tables are:

Table 3: [\$_]
Table 3a: ['\-.\:·֊״་‌‐’‧゠・]
Table 3b: [\u200D ׳]"

This appears to be the same notation, but referred to in a different way:
"UnicodeSet syntax" (versus "set notation" earlier—same notation? Or
different?).

This appears to be using the "UnicodeSet notation" specified in section 5.3.3 of UTS#35:

http://unicode.org/reports/tr35/#Unicode_Sets 

If that is what is intended, then:

- UAX #31 should give an introduction to the notation and reference to the 
	specification for it at or before the first usage of the notation.
- UAX #31 should use consistent terminology for how it refers to the notation; 
	if an informal expression is preferred, then that should be introduced 
	when the notation is first introduced. 

(E.g., "At several points in this document, character classes will be described using 
UnicodeSet notation (hereafter, "set notation"). This notation is defined in [UnicodeSets].")

Date/Time: Tue May 12 20:46:39 CDT 2020
Name: Manish Goregaokar
Report Type: Error Report
Opt Subject: IdentifierType of Ainu Katakana characters

In IdentifierType.txt:

31F0..31FF    ; Technical                      # 3.2   [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO

These are from the Katakana Phonetic Extensions block, which exists 
for writing the Ainu language. Ainu is apparently written using both 
the Latin and Katakana scripts, the latter with these extensions.

According to UTS 39 Table 1[1], "Technical" is "Specialized usage: technical, 
liturgical, etc.", which doesn't seem to fit with code points that are 
actively used in a primary script for a language.

Should we be changing this to Recommended?

 [1]: https://www.unicode.org/reports/tr39/#Identifier_Status_and_Type

Date/Time: Tue Aug 4 17:50:07 CDT 2020
Name: Manish Goregaokar
Report Type: Error Report
Opt Subject: IdentifierType of Balinese musical symbols

In IdentifierType.txt:

1B6B..1B73    ; Limited_Use                    # 5.0    [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG

These should probably be "Limited_Use Technical", not just Limited_Use.

Date/Time: Fri Aug 14 16:04:06 CDT 2020
Name: Markus W Scherer
Report Type: Error Report
Opt Subject: UTS #46 should validate ACE label edge cases

The IDNA2008 ToUnicode operation validates ACE labels ("xn--" plus Punycode)
by decoding them, then re-encoding via ToASCII, and verifying that the
round-trip output is the same as the input (case-insensitive).

The UTS #46 ToUnicode operation and its Processing step use a cheaper
Convert/Validate step that is intended to be equivalent.

However, it misses two edge cases which pass the Convert/Validate step but
which IDNA2008 catches with its round-trip verification:

1. "xn--" decodes to an empty string
2. "xn--ASCII-" decodes to just "ASCII"

I propose that we modify
https://www.unicode.org/reports/tr46/#ProcessingStepPunycode (section 4
Processing > step 4 "Convert/Validate" > If the label starts with
“xn--”) so that it catches these cases.

Note that it is possible to check for these cases before/without
Punycode-decoding the label, except that, for equivalent error handling,
"xn---" should be skipped, letting Punycode decode fail instead. (In
IDNA2008 ToUnicode, a Punycode decode error preempts the round-trip
verification, and a quirk in the decoding procedure lets the "last
delimiter" slip into the main decoding loop if that delimiter immediately
follows the ACE prefix. The loop fails because the hyphen is not a valid
Punycode digit.)
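
A minimal sketch of such a pre-decoding check, following the description above. The function name and return values are hypothetical and this is not proposed specification text. It relies on the Punycode rule that everything before the last delimiter is copied as basic code points, so a payload ending in a hyphen decodes to ASCII only.

```python
ACE_PREFIX = "xn--"


def ace_edge_case(label: str):
    """Classify the two edge cases without Punycode-decoding the label.

    Returns "empty" or "ascii-only" for labels that IDNA2008 round-trip
    verification would reject, and None for everything else.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return None
    payload = label[len(ACE_PREFIX):]
    if payload == "":
        return "empty"        # case 1: "xn--" decodes to an empty string
    if payload == "-":
        return None           # "xn---": skipped so that Punycode decoding fails instead
    if payload.endswith("-"):
        return "ascii-only"   # case 2: "xn--ASCII-" decodes to just "ASCII"
    return None


for label in ["xn--", "xn--ASCII-", "xn---", "xn--bcher-kva", "example"]:
    print(label, ace_edge_case(label))
```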

Date/Time: Sun Aug 16 18:43:48 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Missing Indic shaping properties for Common script Vedic characters

The Vedic signs 1CE9..1CEC and 1CEE..1CF1 are missing Indic syllabic category 
definitions in the Unicode 13.0 data. At least some of these characters are 
attested in L2/07-343, figures 8H–8J, as carrying marks, so the default category 
Other is incorrect for them. For others, the default category Other might be 
correct, but if that’s the case, I think it would be preferable to explicitly 
provide the value.

Feedback routed to Emoji SC for evaluation

Date/Time: Tue Sep 22 17:26:17 CDT 2020
Contact: wjgo_10009@btinternet.com
Name: William Overington
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/20-213 Hand with palm facing up and Hand with palm facing down for Unicode 14.0

Note: This feedback has been forwarded to ESC for response; no further action is required from the UTC.

L2/20-213 Hand with palm facing up and Hand with palm facing down for Unicode 14.0
 
I refer to the following document.
 
https://www.unicode.org/L2/L2020/20213-palms-up-down-emoji.pdf 
 
Hand with palm facing up and Hand with palm facing down for Unicode 14.0
 
The meaning of the proposed emoji 'Hand with palm facing down' is fine.
 
My own experience is that this is quite formal.
 
For example, as in the following.
 
“Good evening ma’am, may I have the pleasure of this dance?” and offers his
right hand, palm down, as if in a formal ballroom setting.
 
The meaning of the proposed emoji 'Hand with palm facing up' as "drop, go
away, drop it, put down" does not correspond with my own personal
experience, though the concept mentioned later in the document of "Palm up
can indicate a lack of knowledge" does in the sense of "Who knows!", though
I do not understand quite what "Palm up can indicate a lack of knowledge
cross-linguistically" means. But my lack of experience of the meanings
stated in the document is no reason whatsoever not to encode the proposed
meanings for that hand gesture.
 
However, I am thinking that the proposed 'Hand with palm facing up' could be
renamed as 'Hand with palm facing up with fingers upward' and a third emoji
'Hand with palm facing up with fingers downward' added.
 
For me, 'Hand with palm facing up with fingers downward' is a common
gesture, such as inviting a visitor to home or office (in antepandemicum
times and hopefully in the future) to sit down and make himself or herself
comfortable, or to indicate "after you" when two lanes of road traffic are
merging at road works, or "please proceed" when letting a car from a side
road into queued road traffic. The custom being that the other driver raises
his or her hand in acknowledgement and thanks.
 
For another example, going into a restaurant (in antepandemicum times and
hopefully in the future) early evening when all of the trade at that time of
day appears to be take-aways, and asking if one can have a sit down meal at
present (as one is going direct from work to an evening institute meeting)
and the manager indicating 'yes certainly' in speech and by a palm up
fingers downward gesture towards the empty seated area of the restaurant.
 
So could there be three new emoji for these hand gestures rather than just
the two in the proposal please?

William Overington
 
Tuesday 22 September 2020

Feedback routed to Editorial Committee for evaluation

Date/Time: Tue Jul 21 13:06:27 CDT 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: U+2E4E PUNCTUS ELEVATUS MARK

The note for U+2E4E PUNCTUS ELEVATUS MARK says “indicates a major medial pause 
where the sense is complete but the meaning is not”. How is that possible? 
Aren’t sense and meaning the same thing?

Date/Time: Wed Jul 29 22:46:20 CDT 2020
Name: Ajith
Report Type: Error Report
Opt Subject: U+0BA9 wrongly mentioned as malayalam letter nnna

Madam / Sir,

The Unicode® Standard Version 13.0 – Core Specification, Chapter 12, Page
509 says "The letter nnna is parallel to U+0BA9 malayalam letter nnna."
(under the section Historic and Scholarly Characters of Malayalam). 

U+0BA9 is a Tamil character. I hope this will be corrected.

Thanks,
ajith

Date/Time: Wed Jul 29 23:33:52 CDT 2020
Name: Ajith
Report Type: Error Report
Opt Subject: possible errors in Table 12-37. Candrakkala Examples

Madam / Sir,

Table 12-37. Candrakkala Examples given under Rendering Malayalam >
Candrakkala, of The Unicode® Standard Version 13.0 – Core Specification,
section 12.9 shows three examples, of which two are unheard of and against
common usage.

എ്ന്നാ on which day? 0D0E, 0D4D, 0D28, 0D4D, 0D28, 0D3E
ഐശീല്ം than ice 0D10, 0D36, 0D40, 0D32, 0D4D, 0D02

The chandrakala doesn't follow a vowel except after vowel sign u to show
samvrothokaram. The എ്ന്നാ is wrong on this ground. The word used to say 'on
which day?' is എന്നാ (without the chandrakala).

The word ഐശീല്ം as well as the meaning ascribed to it makes no sense.

Hope these examples will be removed and commonly used words substituted with
the correct spelling. At the very least, if these examples are retained, a
reference to their authenticity should be provided.

Thanks,
ajith

Date/Time: Thu Jul 30 01:19:40 CDT 2020
Name: Ajith
Report Type: Error Report
Opt Subject: wrong Unicode characters in Table 12-41. Malayalam /ṉṟa/ and /ṉṯa/

Madam / Sir,

Table 12-41. Malayalam /ṉṟa/ and /ṉṯa/ given under Rendering Malayalam >
Special Cases Involving rra, of The Unicode® Standard Version 13.0 – Core
Specification, section 12.9 shows three examples, all of which are shown
with the wrong Unicode characters:

ആൻേറാ 0D06 0D7B 0D47 0D31 0D3E a proper name
ആൻ്റോ 0D06 0D7B 0D4D 0D31 0D47 0D3E
എൻറോൾ 0D0E 0D7B 0D31 0D47 0D3E 0D7E

The error in all three is that the vowel symbols േ 0D47 and ാ 0D3E are
used instead of ോ 0D4B. 

Thanks,
ajith

Date/Time: Thu Jul 30 15:56:14 CDT 2020
Name: Peter Constable
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback on UTS #39

This feedback pertains to Revision 22 of UTS39: http://www.unicode.org/reports/tr39/tr39-22.html

Section 3 begins discussing identifiers without really introducing the
connection to the topic of UTS39. The opening sentence currently is:

"Identifiers are special-purpose strings used for identification—strings
that are deliberately limited to particular repertoires for that purpose.
Exclusion of characters from identifiers does not affect the general use
of those characters, such as within documents. ..."

This doesn't introduce the connection between security and identifiers.
Also, it seems to assume that limitation of character repertoires is a
defining characteristic of identifiers, which is not the case. (Just as a
hash can be used as a resource ID without a restriction on the bytes in the
hash, so also an application _could_ use character sequences without
repertoire restriction as IDs.)

Suggested revision:

"Identifiers ("IDs") are strings used in particular application contexts
to refer to entities of certain significance in the given application. In a
given application, an identifier will map to at most one specific entity.
Many applications have security requirements related to identifiers. One
common example is user IDs, used to restrict access to certain data or
resources as appropriate. Another common example is URLs referring to pages
or other resources on the Internet: when a user wishes to access a
resource, it is important that the user can be certain what resource they
are interacting with—for instance, that they are interacting with a
particular financial service and not some other entity that is spoofing the
intended service for malicious purposes. The latter is an example of a
general security concern for identifiers: potential ambiguity of strings.
While a machine has no difficulty distinguishing between any two different
character sequences, it could be very difficult or impossible for humans to
recognize and distinguish identifiers if an application permitted any
Unicode characters to be in identifiers. Mitigation of this issue is the
focus of this specification.

"Restriction of the character repertoire that can be used in identifiers
is an important security technique. Most applications will deliberately
limit characters that can be used in identifiers for that purpose. (Note
that exclusion of characters from identifiers does not affect the general
use of those characters for other purposes, such as within documents.) ..."

Date/Time: Thu Jul 30 16:27:43 CDT 2020
Name: Peter Constable
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback on UTS #39

This feedback pertains to revision 22 of UTS #39:
http://www.unicode.org/reports/tr39/tr39-22.html

Some wording in section 3.1 is unclear and could be improved.

1)
"The principle has been to be more conservative initially,..."

(The principle for what?) Suggested revision:

"A principle used in determining which characters to be Restricted
has been to be more conservative initially,..."

2)
"There may be multiple reasons for restricting a character. For clarity,
Identifier_Type values of Not_Character, Deprecated, Default_Ignorable, and
Not_NFKC cause values below them in the Restricted rows to be suppressed:
For example..."

First, something significant that isn't mentioned is that Identifier_Type is
a multi-valued property. (Btw, I don't see multi-valued properties discussed
in UTR23.) That is, the Identifier_Type property value is a non-empty subset
of values from the set {Not_Character, Deprecated, ...}. (The other way to
formalize would be to say that Identifier_Type is a collection of binary
properties, Not_Character, Deprecated, etc.) This should be called out.

Secondly, the wording in the second sentence is a bit unclear as to what it's referring to.

Suggested revision:

"There may be multiple reasons for restricting a character. For this reason,
the Identifier_Type property allows multiple values that correspond with
Restricted. For example, some characters have Identifier_Type values of
Limited_Use and Technical. In the case of characters that have
Identifier_Type values of Not_Character, Deprecated, Default_Ignorable, or
Not_NFKC, other Identifier_Type property values listed below that value in
Table 1 are not also assigned as additional property values. For
example..."

3)
"Restricted characters should be treated with caution in registration..."

What is "registration" referring to?

Perhaps:
"Restricted characters should be treated with caution when considering
possible use in identifiers..."
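
The multi-valued reading described in item 2 can be modeled as a set of type values per character, with the status derived from that set. The sketch below is illustrative only: the function name is hypothetical, and the split between Allowed and Restricted types follows Table 1 of UTS #39 as understood here.

```python
# Types that correspond to Identifier_Status=Allowed; every other type is Restricted.
ALLOWED_TYPES = frozenset({"Recommended", "Inclusion"})


def identifier_status(types) -> str:
    """Derive the status from a non-empty set of Identifier_Type values."""
    types = frozenset(types)
    if not types:
        raise ValueError("Identifier_Type must be a non-empty set of values")
    return "Allowed" if types <= ALLOWED_TYPES else "Restricted"


print(identifier_status({"Recommended"}))               # Allowed
print(identifier_status({"Limited_Use", "Technical"}))  # Restricted
print(identifier_status({"Not_NFKC"}))                  # Restricted
```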

Date/Time: Thu Jul 30 16:47:52 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31

This feedback pertains to Revision 33 of UAX#31:
http://www.unicode.org/reports/tr31/tr31-33.html 

In section 1, the fourth paragraph begins, 
"This annex also provides guidelines for the user of normalization 
and case insensitivity with identifiers..."

This wording with "user" is strange. (Who is a _user_ of 
normalization with identifiers?) Suggested revision: change "user" to "use".

Date/Time: Thu Jul 30 17:11:37 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31

This feedback pertains to revision 33 of UAX#31:
http://www.unicode.org/reports/tr31/tr31-33.html 

There appears to be a wording issue in the first paragraph of section 1.3:

"For example, an implementation might display format what the user 
has entered, but use a normalized format for comparison."

The problem wording is "might display format": it appears like a verb phrase
("display-format" being the main verb), but that doesn't really make sense.
I gather what was meant is:

"For example, an implementation might display what the user has 
entered, but use a normalized format for comparison."

Suggested change: delete the first instance of "format".

Date/Time: Mon Aug 24 02:26:55 CDT 2020
Name: David E Starner
Report Type: Error Report
Opt Subject: Table 22-4. Compatibility Digits is incomplete

Table 22-4. Compatibility Digits on page 831 of the Standard version 13.0.0 
is incomplete. It's missing 1FBF0-1FBF9, Segmented Digits, that have font 
decomposition to the normal ASCII digits.

Date/Time: Mon Aug 24 20:23:52 CDT 2020
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Latest UAX #14 not cleaned up properly

In the latest version of UAX #14 at https://unicode.org/reports/tr14/, under 
Regional Indicator, the text says "beginnning" with a yellow-background, 
overstruck "n", which means it wasn't cleaned up properly.

Date/Time: Sat Aug 29 16:59:13 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: No real definition of Devanagari cluster pattern

Section 12.1 Devanagari of The Unicode Standard contains various bits of
information about the structure of Devanagari clusters, but doesn’t provide
a complete pattern for them. There should be one or several regular
expressions covering all possible cluster patterns. See
https://lindenbergsoftware.com/en/notes/issues-in-devanagari-cluster-validation/ 
for a detailed discussion of issues and recommendations.

Other Reports

(None at this time.)
