The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of September 23, 2020, since the previous cumulative document was issued prior to UTC #164 (July 2020).
The links below go directly to open PRIs and to feedback documents for them, as of September 23, 2020.
422 Proposed Update UAX #9, Unicode Bidirectional Algorithm (feedback) No feedback at this time
421 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback)
420 Proposed Update UAX #45, U-source Ideographs (feedback) No feedback at this time
419 Proposed Update UAX #44, Unicode Character Database (feedback) No feedback at this time
417 Proposed Update UAX #29, Unicode Text Segmentation (feedback) No feedback at this time
416 Proposed Update UAX #14, Unicode Line Breaking Algorithm (feedback) No feedback at this time
415 Proposed Update UTR #23, The Unicode Character Property Model (feedback) No feedback at this time
408 QID Emoji (feedback) Last feedback June 4, 2020
The links below go to locations in this document for feedback.
Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to ucd-dev ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports
Date/Time: Fri Jul 17 20:18:55 CDT 2020 (updated 2020-09-01)
Name: Jim Breen
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Additional Unihan information for U+FA11 U+37E2 U+2550E
I would like to propose some additional Unihan information for several related characters. I am basing this submission on the entries in the 2002 edition of Shibano's JIS 漢字字典. That dictionary covers the kanji in JIS X 0208 and JIS X 0213. As you are probably aware, Shibano chaired the JSC committee that revised JIS X 0208 and developed JIS X 0213.
The characters are:
U+FA11 﨑 (p. 174)
U+37E2 㟢 (p. 173)
U+2550E 𥔎 (p. 457)
The kIRG_JSource for all three is JIS X 0213. These characters are stated in Shibano to be variants of 崎 (U+5D0E), 碕 (U+7895), 嵜 (U+5D5C) and 埼 (U+57FC). U+5D0E is a Jinmeiyo Kanji (2010).
The additions I propose are:
U+FA11 﨑
kJapaneseKun SAKI
kJapaneseOn KI
kDefinition cape; spit; promontory
U+37E2 㟢
kJapaneseKun SAKI
kDefinition cape; spit; promontory
U+2550E 𥔎
kJapaneseKun SAKI
kDefinition cape; spit; promontory
The readings are all drawn from Shibano. The kDefinition values are from those associated with the related characters in Japanese sources.
------
This is in addition to the following changes proposed by Ken and others:
U+37E2 kSemanticVariant U+57FC U+5D0E<kMorohashi:TZ U+5D5C U+7895 U+966D U+FA11 U+2550E
U+57FC kJapaneseKun SAI SAKI
U+57FC kSemanticVariant U+37E2 U+5D0E<kMorohashi U+5D5C U+7895<kMorohashi:T U+966D U+FA11 U+2550E
U+5D0E kSemanticVariant U+37E2<kMorohashi:T U+57FC<kMorohashi U+5D5C<kMorohashi U+7895<kMorohashi U+966D<kMorohashi U+FA11<kMorohashi U+2550E
U+5D5C kSemanticVariant U+37E2 U+57FC U+5D0E<kMorohashi:Z U+7895 U+966D U+FA11 U+2550E
U+7895 kSemanticVariant U+37E2 U+57FC<kMorohashi:TZ U+5D0E<kMorohashi U+5D5C U+966D U+FA11 U+2550E
U+966D kSemanticVariant U+37E2 U+57FC U+5D0E<kMorohashi U+5D5C U+7895 U+FA11 U+2550E
U+FA11 kSemanticVariant U+37E2 U+57FC U+5D0E<kMorohashi:Z U+5D5C U+7895 U+966D U+2550E
U+2550E kSemanticVariant U+37E2 U+57FC U+5D0E U+5D5C U+7895 U+966D U+FA11
Date/Time: Wed Aug 5 11:37:47 CDT 2020
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: Errors in the Unihan Database
(1)
U+4CA4 kTotalStrokes 21
↓
U+4CA4 kTotalStrokes 18
(2)
U+9FD2 kSimplifiedVariant U+9FD3
U+9FD3 kTraditionalVariant U+9FD2
↓
U+9FD2 kTraditionalVariant U+9FD3
U+9FD3 kSimplifiedVariant U+9FD2
Date/Time: Mon Aug 31 08:29:11 CDT 2020
Name: Ken Lunde
Report Type: Error Report
Opt Subject: Unihan-related feedback
Please consider the following three pieces of Unihan-related feedback:
1) Change 釒 (U+91D2) to 金 (U+91D1) in the IDSes for the following eight U-Source ideographs:
UTC-00102;C;U+2B4B6;167.9;1316.111;⿰釒凾;kMatthews 2051;
UTC-00207;X;;167.10;1318.281;⿰釒冤;kSBGY 115.19;
UTC-00432;X;;167.11;1321.071;⿰釒患;kMeyerWempe 3708b;
UTC-00872;D;U+2B7F0;167.6;1305.211;⿰釒当;Adobe-Japan1 20240;
UTC-00889;N;;167.10;1318.281;⿰釒袓;Adobe-CNS1 C+16257;
UK-02711;G;U+30F25;167.5;1303.101;⿰釒卢;UTCDoc L2/15-260 1399;
UK-02829;UK-2015;UTC-02828;167.7;1308.261;⿰釒囱;UTCDoc L2/15-260 1517;
UK-02895;G;U+30F23;167.4;1299.191;⿰釒㝉;UTCDoc L2/15-260 1583;
Rationale: 釒 (U+91D2) appears only once in the IDS database, as itself. 金 (U+91D1) is used as a component in over 2,000 ideographs. Also, the IDS database already includes these adjustments for those that are encoded.
2) Simplify the IDS for UTC-00892 (U+2DF3C 𭼼) as follows:
Current: UTC-00892;F;U+2DF3C;104.23;0783.271;⿸疒⿲彳⿳山一黑攵;Adobe-CNS1 C+16303;
Proposed: UTC-00892;F;U+2DF3C;104.23;0783.271;⿸疒黴;Adobe-CNS1 C+16303;
Rationale: The IDS database already specifies ⿸疒黴 as the IDS for U+2DF3C 𭼼 (UTC-00892).
3) Horizontally-extend U+289B1 𨦱 (Extension B) to add UK-02829 ⿰金囱 as a source reference. Its simplified form, U+30F8A 𰾊 (UK-2828), is in Extension G, which further means that the kSimplifiedVariant and kTraditionalVariant properties can be added to these ideographs as follows:
U+289B1 kSimplifiedVariant U+30F8A
U+30F8A kTraditionalVariant U+289B1
That is all.
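For illustration, the component substitution requested in item 1 can be sketched as a field-level replacement over the semicolon-delimited U-Source records. This is a minimal sketch, not the tooling actually used to maintain the data: the function name fix_ids is hypothetical, and the assumption that the IDS is always the sixth field is inferred only from the records quoted above.

```python
# Hypothetical helper: swap U+91D2 for U+91D1 inside the IDS field only.
OLD = "\u91D2"  # 釒
NEW = "\u91D1"  # 金

# Assumption: the IDS is the sixth semicolon-delimited field (index 5),
# as in the records quoted in the feedback.
IDS_FIELD = 5


def fix_ids(record: str) -> str:
    """Return the record with OLD replaced by NEW in its IDS field."""
    fields = record.split(";")
    fields[IDS_FIELD] = fields[IDS_FIELD].replace(OLD, NEW)
    return ";".join(fields)


# Two of the eight records from the feedback, with the component written
# via OLD so that the code point is unambiguous.
records = [
    f"UTC-00102;C;U+2B4B6;167.9;1316.111;⿰{OLD}凾;kMatthews 2051;",
    f"UTC-00207;X;;167.10;1318.281;⿰{OLD}冤;kSBGY 115.19;",
]

for record in records:
    print(fix_ids(record))
```

Restricting the replacement to one field avoids touching any other field that might legitimately contain the character.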
Date/Time: Thu Sep 3 12:30:38 CDT 2020
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: U+28E0F kTotalStrokes error
The kTotalStrokes value for U+28E0F 𨸏 should be 8, not 2.
Date/Time: Thu Sep 3 12:34:42 CDT 2020
Name: Jaemin Chung
Report Type: Other Question, Problem, or Feedback
Opt Subject: Request for addition of one cross-reference under U+2EA7
Under U+2EA7 ⺧, a cross-reference to U+20092 𠂒 needs to be added. When I was writing L2/19-214R, I was not aware of U+20092.
Date/Time: Thu Sep 17 18:20:16 CDT 2020
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: kTotalStrokes value for U+2B413
The kTotalStrokes value for U+2B413 𫐓 should be 13, not 10.
Date/Time: Thu Sep 17 18:44:18 CDT 2020
Name: Jaemin Chung
Report Type: Error Report
Opt Subject: kCantonese value for U+2B413
http://unicode.org/L2/L2020/20231-2B413-2B5E6-change.pdf In addition to what I wrote in L2/20-231, the kCantonese value for U+2B413 𫐓 should be changed to jau4 (which is the kCantonese value for U+8F2E 輮). This has to be changed anyway, and I think copying the value from the traditional counterpart is fine in this case.
Date/Time: Sun Aug 16 06:02:29 CDT 2020
Name: Moemen Metwally
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Eastern Arabic Fractions
Good Morning, I've looked thoroughly at the Arabic unicode ranges, and despite what seems to be an obsession with Islamic religious rhetoric & a very specific Qur'anic orthography, there are some serious basic oversights. I hope I'm mistaken and you can lead me to the range where I find them! The Eastern half of the Arab world (the Mashreq) uses the following numerals ۱۲۳٤٥٦٧۸۹۰ - and although unicode includes the symbols for cube-root and fourth-root, as well as certain mathematical symbols like the one for diameter, I cannot find: - Symbols for half, a third, a quarter, three-quarters... the vulgar fractions we commonly see in handwriting, print, manuscripts, etc. They are widely used. - A unicode symbol for the 'egyptian' two. KFGQPC Uthman Taha is the only font which finds a workaround for this, compare it to any other arab font and you'll see how the two has no 'tooth', as it is written in Egypt, whereas the current unicode standard uses the tooth. There's a bit more to say but let's start with those please!
Date/Time: Thu Sep 3 16:18:05 CDT 2020
Name: Eduardo Marín Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Suggestion to supplement L2/20-209 with named character sequences
The proposal to add the characters needed for the kana Hokkien/Minnan orthography is in my opinion well formed and should not be considered "preliminary". That being said, I would suggest also proposing a set of named character sequences for the letters with combining marks, if and only if they have formal names. This is already the case for other kana letters with combining marks, and so it would make it all consistent.
Date/Time: Sat Sep 5 10:34:31 CDT 2020
Name: Ken Lunde
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on feedback on L2/20-209
With regard to Eduardo Marín Silva's 2020-09-03 feedback on L2/20-209, I disagree with the proposal to add named character sequences. They are not necessary. The existing named character sequences for kana exist only because those combining sequences correspond to atomic characters in the JIS X 0213 standard, which is explicitly mentioned in NamedSequences.txt. Named character sequences for additional kana were once proposed in L2/16-133, but the UTC rejected them during UTC #147 for this reason:
https://www.unicode.org/L2/L2016/16133-japanese-voiced-vowels.pdf
The first paragraph of Section 1.1 of UAX #34, Unicode Named Character Sequences, captures this nicely:
In some limited circumstances it is necessary to also provide a name for such sequences. The primary example is the need to have an identifier for a sequence to correlate with an identifier in another standard, for which a cross-mapping to Unicode is desired. To address this need, the Unicode Standard defines a mechanism for naming sequences and provides a short list of sequences that have been formally named. This list is deliberately selective: it is neither possible nor desirable to attempt to provide names for all possible sequences of Unicode characters that could be of interest.
Regards...
-- Ken
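For illustration, an existing kana named sequence can be looked up by name with Python's standard unicodedata module, which accepts named sequences in lookup(). This is a minimal sketch; KATAKANA LETTER AINU P is used as the example on the assumption that it is one of the JIS X 0213-derived entries in NamedSequences.txt.

```python
import unicodedata

# Look up a named character sequence. The result is a multi-character
# string, not a single code point.
sequence = unicodedata.lookup("KATAKANA LETTER AINU P")

# Show the code points that make up the sequence.
code_points = [f"U+{ord(ch):04X}" for ch in sequence]
print(code_points)
```

The name identifies a base kana followed by a combining mark, which is the kind of cross-mapping to another standard that UAX #34 describes.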
Date/Time: Thu Jul 30 16:55:11 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31
This feedback pertains to revision 33 of UAX#31: http://www.unicode.org/reports/tr31/tr31-33.html In section 1, the paragraph after Figure 1 says, "The set consisting of the union of ID_Start and ID_Nonstart characters is known as Identifier Characters ..." Then in section 1.1, the second bulleted item in the list of stability guarantees says, "The Identifier characters are always a superset of the ID_Start characters." Given the definition of "Identifier Characters" given in section 1, this statement is tautological—necessarily true, by definition—so not useful to state as a stability guarantee. Was "proper superset" meant?
Date/Time: Thu Jul 30 17:23:29 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31
This feedback pertains to revision 33 of UAX#31 In section 2, in the 4th paragraph, the last sentence says, "The second column provides a general description of the coverage for the associated class, the derivational relationship between the ID properties and the XID properties, and an associated set notation for the class." The concepts "ID property" and "XID property" are in this way introduced. If there were mention of only "ID property", that would be fine: in the context, it would be sufficiently clear that there will be character properties pertaining to IDs that are used for Default Identifier Syntax. However, with a second concept thrown in, "XID property", this becomes confusing. (Huh? What's an "XID property" and what does it have to do with identifier syntax?) It would help to introduce the pair of terms with some explanation of what "XID" is all about.
Date/Time: Thu Jul 30 18:17:04 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31, 2.3.1 Limitations
This feedback pertains to revision 33 of UAX#31: http://www.unicode.org/reports/tr31/tr31-33.html Section 2.3.1 discusses potential tightening of restrictions in regard to A1, A2 or B (use of ZWNJ or ZWJ within IDs in certain contexts). The last paragraph says the following: "Comparison. Typically the identifiers with and without these characters should compare as equivalent, to prevent security issues." Examples given in the preceding descriptions of A1, A2 and B included cases in which strings with or without the joiner were both linguistically valid; e.g., Farsi words for "names" and "a letter". But a constraint on comparison is, in effect, preventing a distinction from being made: strings with and without the joiner are to be treated as the same ID. That seems to amount to saying that the joiners should only be kept when displaying IDs as typed by a user. In that case, it seems like this paragraph in 2.3.1 should suggest that. In addition, it seems like it would make sense for 2.3.1 to mention layout and format control characters, when permitted in IDs, as a potential basis for distinguishing between display format and comparison format.
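For illustration, the distinction between a display format and a comparison format can be sketched as follows. This is a minimal sketch under the assumption that comparison simply ignores ZWNJ and ZWJ; the names comparison_key and register are hypothetical and are not taken from UAX #31.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def comparison_key(identifier: str) -> str:
    """Comparison format: the identifier with joiner controls removed."""
    return identifier.replace(ZWNJ, "").replace(ZWJ, "")


def register(table: dict, identifier: str) -> bool:
    """Store the identifier as typed (display format), keyed by its
    comparison format. Return False if an equivalent one already exists."""
    key = comparison_key(identifier)
    if key in table:
        return False
    table[key] = identifier
    return True


# Illustrative Arabic-script strings that differ only by a ZWNJ.
without_joiner = "\u0646\u0627\u0645\u0647\u0627\u06CC"
with_joiner = "\u0646\u0627\u0645\u0647\u200C\u0627\u06CC"

table = {}
print(register(table, with_joiner))     # first registration succeeds
print(register(table, without_joiner))  # treated as the same identifier
```

Under this scheme the joiner survives only in the stored display form, which matches the reading of the comparison paragraph given in the feedback.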
Date/Time: Thu Jul 30 18:43:45 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31, set notation
This feedback pertains to revision 33 of UAX#31:
In section 2, Table 2 includes descriptions of property values in terms of "set notation". This is introduced in the immediately-preceding paragraph: "The second column provides ... an associated set notation for the class."
An example, the notation used for describing ID_Start:
"[\p{L}\p{Nl}\p{Other_ID_Start}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]"
No explanation is provided for this notation. It might make sense to someone already familiar with Unicode and the notation from other contexts. For someone coming from, say, a mathematics background but without Unicode experience, this does not look like any familiar set notation. (Math convention is to use brace brackets to denote a set; that's also used in, e.g., Python.) There are many classes of readers that would get to this point in the doc and wonder where the notation is explained.
The doc continues with other use of the notation, without explanation. E.g., section 2.3, under A.1:
"This corresponds to the following regular expression (in Perl-style syntax): /$LJ $T* ZWNJ $T* $RJ/ where:
$T = \p{Joining_Type=Transparent}
$RJ = [\p{Joining_Type=Dual_Joining}\p{Joining_Type=Right_Joining}]
$LJ = [\p{Joining_Type=Dual_Joining}\p{Joining_Type=Left_Joining}]"
The first hint, if the reader recognizes it as such, is a mention in section 2.4, after Table 3b, of "UnicodeSet syntax".
"In UnicodeSet syntax, the characters in these tables are:
Table 3: [\$_]
Table 3a: ['\-.\:·֊״་‐’‧゠・]
Table 3b: [\u200D ׳]"
This appears to be the same notation, but referred to in a different way: "UnicodeSet syntax" (versus "set notation" earlier: same notation, or different?).
This appears to be using the "UnicodeSet notation" specified in section 5.3.3 of UTS#35 http://unicode.org/reports/tr35/#Unicode_Sets
If that is what is intended, then:
- UAX #31 should give an introduction to the notation and reference to the specification for it at or before the first usage of the notation.
- UAX #31 should use consistent terminology for how it refers to the notation; if an informal expression is preferred, then that should be introduced when the notation is first introduced. (E.g., "At several points in this document, character classes will be described using UnicodeSet notation (hereafter, "set notation"). This notation is defined in [UnicodeSets].")
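For illustration, the ID_Start expression quoted above can be read as ordinary set operations: a union of the listed properties, followed by two set differences. The sketch below spells that reading out in Python. It is an approximation under stated assumptions: the Other_ID_Start list reflects Unicode 13.0 as far as I know, the Pattern_Syntax set is limited to its ASCII members, and the function name in_id_start_sketch is hypothetical.

```python
import unicodedata

# \p{L} and \p{Nl}: the letter categories plus letter numbers.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Assumption: Other_ID_Start as of Unicode 13.0.
OTHER_ID_START = set("\u1885\u1886\u2118\u212E\u309B\u309C")

# Pattern_White_Space is a small set of 11 code points.
PATTERN_WHITE_SPACE = set("\t\n\x0b\x0c\r \x85\u200e\u200f\u2028\u2029")

# Assumption: only the ASCII part of Pattern_Syntax, for brevity.
PATTERN_SYNTAX_ASCII = {
    chr(cp)
    for block in (range(0x21, 0x30), range(0x3A, 0x41),
                  range(0x5B, 0x5F), [0x60], range(0x7B, 0x7F))
    for cp in block
}


def in_id_start_sketch(ch: str) -> bool:
    """[\\p{L}\\p{Nl}\\p{Other_ID_Start} - Pattern_Syntax - Pattern_White_Space]"""
    included = (unicodedata.category(ch) in LETTER_CATEGORIES
                or ch in OTHER_ID_START)
    excluded = ch in PATTERN_SYNTAX_ASCII or ch in PATTERN_WHITE_SPACE
    return included and not excluded


for sample in ["a", "_", "1", " ", "\u2118"]:
    print(repr(sample), in_id_start_sketch(sample))
```

The adjacent \p{...} items form a union, and each "-" removes a set from everything to its left, which is the part a reader used to brace-based set notation would not guess.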
Date/Time: Tue May 12 20:46:39 CDT 2020
Name: Manish Goregaokar
Report Type: Error Report
Opt Subject: IdentifierType of Ainu Katakana characters
In IdentifierStatus.txt: 31F0..31FF ; Technical # 3.2 [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO These are from the Katakana Phonetic Extensions block; which exists for writing the Ainu language. Ainu is apparently both written using the Latin and Katakana scripts, using these extensions. According to UTS 39 Table 1[1], "Technical" is "Specialized usage: technical, liturgical, etc.", which doesn't seem to fit with code points that are actively used in a primary script for a language. Should we be changing this to Recommended? [1]: https://www.unicode.org/reports/tr39/#Identifier_Status_and_Type
Date/Time: Tue Aug 4 17:50:07 CDT 2020
Name: Manish Goregaokar
Report Type: Error Report
Opt Subject: IdentifierType of Balinese musical symbols
In IdentifierType.txt: 1B6B..1B73 ; Limited_Use # 5.0 [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG These should probably be "Limited_Use Technical", not just Limited_Use
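For illustration, data lines of the kind quoted in these two reports can be parsed into a code point range and a set of values, which makes a multi-valued entry such as "Limited_Use Technical" explicit. This is a minimal sketch: the function name parse_line is hypothetical, and the line layout (range, semicolon, space-separated values, "#" comment) is inferred from the lines quoted above.

```python
def parse_line(line: str):
    """Parse 'XXXX..YYYY ; Value1 Value2 # comment' into (range, values)."""
    # Drop the comment first: it may itself contain '..' between names.
    data, _, _comment = line.partition("#")
    range_part, _, value_part = data.partition(";")
    first, _, last = range_part.strip().partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    # Values are space-separated; a frozenset models a multi-valued property.
    return range(start, end + 1), frozenset(value_part.split())


katakana = ("31F0..31FF ; Technical # 3.2 [16] KATAKANA LETTER SMALL KU.."
            "KATAKANA LETTER SMALL RO")
# The second line uses the value suggested in the report, not the current data.
balinese = ("1B6B..1B73 ; Limited_Use Technical # 5.0 [9] BALINESE MUSICAL "
            "SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG")

for line in (katakana, balinese):
    code_points, values = parse_line(line)
    print(len(code_points), sorted(values))
```

The count in square brackets in each comment gives a quick consistency check against the length of the parsed range.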
Date/Time: Fri Aug 14 16:04:06 CDT 2020
Name: Markus W Scherer
Report Type: Error Report
Opt Subject: UTS #46 should validate ACE label edge cases
The IDNA2008 ToUnicode operation validates ACE labels ("xn--" plus Punycode) by decoding them, then re-encoding via ToASCII, and verifying that the round-trip output is the same as the input (case-insensitive). The UTS #46 ToUnicode operation and its Processing step use a cheaper Convert/Validate step which is intended to be equivalent. However, it misses two edge cases which pass the Convert/Validate step but which IDNA2008 catches with its round-trip verification:
1. "xn--" decodes to an empty string
2. "xn--ASCII-" decodes to just "ASCII"
I propose that we modify https://www.unicode.org/reports/tr46/#ProcessingStepPunycode (section 4 Processing > step 4 "Convert/Validate" > If the label starts with “xn--”) so that it catches these cases.
Note that it is possible to check for these cases before/without Punycode-decoding the label, except that, for equivalent error handling, "xn---" should be skipped, letting Punycode decode fail instead. (In IDNA2008 ToUnicode, a Punycode decode error preempts the round-trip verification, and a quirk in the decoding procedure lets the "last delimiter" slip into the main decoding loop if that delimiter immediately follows the ACE prefix. The loop fails because the hyphen is not a valid Punycode digit.)
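For illustration, the pre-decoding check described in this report might be sketched as below. This is one possible reading of the proposal, not text from UTS #46: the function name ace_edge_case and its return labels are hypothetical.

```python
from typing import Optional

ACE_PREFIX = "xn--"


def ace_edge_case(label: str) -> Optional[str]:
    """Flag the two edge cases without Punycode-decoding the label.

    Returns "empty" or "ascii-only" for the two cases, else None.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return None
    rest = label[len(ACE_PREFIX):]
    if rest == "":
        # Case 1: "xn--" decodes to an empty string.
        return "empty"
    if rest == "-":
        # "xn---" is skipped so that Punycode decoding fails instead.
        return None
    if rest.endswith("-"):
        # Case 2: nothing follows the last delimiter, so the label
        # decodes to its ASCII part only (for example "xn--ASCII-").
        return "ascii-only"
    return None


for label in ["xn--", "xn--ASCII-", "xn---", "xn--bcher-kva", "example"]:
    print(label, ace_edge_case(label))
```

A label flagged here would be treated as an error in the Convert/Validate step; everything else proceeds to normal Punycode decoding.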
Date/Time: Sun Aug 16 18:43:48 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Missing Indic shaping properties for Common script Vedic characters
The Vedic signs 1CE9..1CEC and 1CEE..1CF1 are missing Indic syllabic category definitions in the Unicode 13.0 data. At least some of these characters are attested in L2/07-343, figures 8H–8J, as carrying marks, so the default category Other is incorrect for them. For others, the default category Other might be correct, but if that’s the case, I think it would be preferable to explicitly provide the value.
Date/Time: Tue Sep 22 17:26:17 CDT 2020
Contact: wjgo_10009@btinternet.com
Name: William Overington
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/20-213 Hand with palm facing up and Hand with palm facing down for Unicode 14.0
Note: This feedback has been forwarded to ESC for response; no further action is required from the UTC.
L2/20-213 Hand with palm facing up and Hand with palm facing down for Unicode 14.0 I refer to the following document. https://www.unicode.org/L2/L2020/20213-palms-up-down-emoji.pdf Hand with palm facing up and Hand with palm facing down for Unicode 14.0 The meaning of the proposed emoji 'Hand with palm facing down' is fine. My own experience is that this is quite formal. For example, as in the following. “Good evening ma’am, may I have the pleasure of this dance?” and offers his right hand, palm down, as if in a formal ballroom setting. The meaning of the proposed emoji 'Hand with palm facing up' as "drop, go away, drop it, put down" does not correspond with my own personal experience, though the concept mentioned later in the document of "Palm up can indicate a lack of knowledge" does in the sense of "Who knows!", though I do not understand quite what "Palm up can indicate a lack of knowledge cross-linguistically" means. But my lack of experience of the meanings stated in the document is no reason whatsoever not to encode the proposed meanings for that hand gesture. However, I am thinking that the proposed 'Hand with palm facing up' could be renamed as 'Hand with palm facing up with fingers upward' and a third emoji 'Hand with palm facing up with fingers downward' added. For me, 'Hand with palm facing up with fingers downward' is a common gesture, such as inviting a visitor to home or office (in antepandemicum times and hopefully in the future) to sit down and make himself or herself comfortable, or to indicate "after you" when two lanes of road traffic are merging at road works, or "please proceed" when letting a car from a side road into queued road traffic. The custom being that the other driver raises his or her hand in acknowledgement and thanks. 
For another example, going into a restaurant (in antepandemicum times and hopefully in the future) early evening when all of the trade at that time of day appears to be take-aways, and asking if one can have a sit down meal at present (as one is going direct from work to an evening institute meeting) and the manager indicating 'yes certainly' in speech and by a palm up fingers downward gesture towards the empty seated area of the restaurant. So could there be three new emoji for these hand gestures rather than just the two in the proposal please? William Overington Tuesday 22 September 2020
Date/Time: Tue Jul 21 13:06:27 CDT 2020
Name: David Corbett
Report Type: Other Question, Problem, or Feedback
Opt Subject: U+2E4E PUNCTUS ELEVATUS MARK
The note for U+2E4E PUNCTUS ELEVATUS MARK says “indicates a major medial pause where the sense is complete but the meaning is not”. How is that possible? Aren’t sense and meaning the same thing?
Date/Time: Wed Jul 29 22:46:20 CDT 2020
Name: Ajith
Report Type: Error Report
Opt Subject: U+0BA9 wrongly mentioned as malayalam letter nnna
Madam / Sir, The Unicode® Standard Version 13.0 – Core Specification, Chapter 12, Page 509 says "The letter nnna is parallel to U+0BA9 malayalam letter nnna." (under the section Historic and Scholarly Characters of Malayalam). U+0BA9 is a Tamil character. I hope it will be corrected. Thanks, ajith
Date/Time: Wed Jul 29 23:33:52 CDT 2020
Name: Ajith
Report Type: Error Report
Opt Subject: possible errors in Table 12-37. Candrakkala Examples
Madam / Sir,
Table 12-37. Candrakkala Examples given under Rendering Malayalam > Candrakkala, of The Unicode® Standard Version 13.0 – Core Specification, section 12.9 shows three examples, of which two are unheard of and against common usage.
എ്ന്നാ on which day? 0D0E, 0D4D, 0D28, 0D4D, 0D28, 0D3E
ഐശീല്ം than ice 0D10, 0D36, 0D40, 0D32, 0D4D, 0D02
The chandrakala doesn't follow a vowel except after vowel sign u to show samvrothokaram. The എ്ന്നാ is wrong on this ground. The word used to say 'on which day?' is എന്നാ (without the chandrakala). The word ഐശീല്ം as well as the meaning ascribed to it makes no sense. Hope these examples will be removed and commonly used words substituted with the correct spelling. At the very least, if these examples are retained, a reference to their authenticity should be provided.
Thanks, ajith
Date/Time: Thu Jul 30 01:19:40 CDT 2020
Name: Ajith
Report Type: Error Report
Opt Subject: wrong unicode characters in Table 12-41. Malayalam /ṉṟa/ and /ṉṯa/
Madam / Sir, Table 12-41. Malayalam /ṉṟa/ and /ṉṯa/ given under Rendering Malayalam > Special Cases Involving rra, of The Unicode® Standard Version 13.0 – Core Specification, section 12.9 shows three examples, all of which are shown with the wrong unicode characters ആൻേറാ 0D06 0D7B 0D47 0D31 0D3E a proper name ആൻ്റോ 0D06 0D7B 0D4D 0D31 0D47 0D3E എൻറോൾ 0D0E 0D7B 0D31 0D47 0D3E 0D7E The error in all three is that the vowel symbols േ 0D47 and ാ 0D3E are used instead of ോ 0D4B. Thanks, ajith
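For illustration, the relationship between the two-part spelling and the single vowel sign can be checked with Unicode normalization. This is a minimal sketch using Python's unicodedata module; it relies on U+0D4B having a canonical decomposition to U+0D47 followed by U+0D3E in the Unicode Character Database, and it only covers the case where the two parts are adjacent and in that order.

```python
import unicodedata

TWO_PART = "\u0D47\u0D3E"  # vowel sign EE followed by vowel sign AA
SINGLE = "\u0D4B"          # vowel sign OO


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Adjacent 0D47 0D3E versus 0D4B.
print(canonically_equivalent(TWO_PART, SINGLE))

# With another letter between the two parts, the sequence does not compose.
print(canonically_equivalent("\u0D47\u0D31\u0D3E", "\u0D31\u0D4B"))
```

A check of this kind distinguishes table entries that differ from the suggested spelling only in representation from entries where the order of the code points differs.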
Date/Time: Thu Jul 30 15:56:14 CDT 2020
Name: Peter Constable
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback on UTS #39
This feedback pertains to Revision 22 of UTS39: http://www.unicode.org/reports/tr39/tr39-22.html Section 3 begins discussing identifiers without really introducing the connection to the topic of UTS39. The opening sentence currently is: "Identifiers are special-purpose strings used for identification—strings that are deliberately limited to particular repertoires for that purpose. Exclusion of characters from identifiers does not affect the general use of those characters, such as within documents. ..." This doesn't introduce the connection between security and identifiers. Also, it seems to assume that limitation of character repertoires is a defining characteristic of identifiers, which is not the case. (Just as a hash can be used as a resource ID without a restriction on the bytes in the hash, so also an application _could_ use character sequences without repertoire restriction as IDs.) Suggested revision: "Identifiers ("IDs") are strings used in particular application contexts to refer to entities of certain significance in the given application. In a given application, an identifier will map to at most one specific entity. Many applications have security requirements related to identifiers. One common example is user IDs, used to restrict access to certain data or resources as appropriate. Another common example is URLs referring to pages or other resources on the Internet: when a user wishes to access a resource, it is important that the user can be certain what resource they are interacting with—for instance, that they are interacting with a particular financial service and not some other entity that is spoofing the intended service for malicious purposes. The latter is an example of a general security concern for identifiers: potential ambiguity of strings. 
While a machine has no difficulty distinguishing between any two different character sequences, it could be very difficult or impossible for humans to recognize and distinguish identifiers if an application permitted any Unicode characters to be in identifiers. Mitigation of this issue is the focus of this specification. "Restriction of the character repertoire that can be used in identifiers is an important security technique. Most applications will deliberately limit characters that can be used in identifiers for that purpose. (Note that exclusion of characters from identifiers does not affect the general use of those characters for other purposes, such as within documents.) ..."
Date/Time: Thu Jul 30 16:27:43 CDT 2020
Name: Peter Constable
Report Type: Other Question, Problem, or Feedback
Opt Subject: Feedback on UTS #39
This feedback pertains to revision 22 of UTS #39: http://www.unicode.org/reports/tr39/tr39-22.html
Some wording in section 3.1 is unclear and could be improved.
1) "The principle has been to be more conservative initially,..." (The principle for what?)
Suggested revision: "A principle used in determining which characters to be Restricted has been to be more conservative initially,..."
2) "There may be multiple reasons for restricting a character. For clarity, Identifier_Type values of Not_Character, Deprecated, Default_Ignorable, and Not_NFKC cause values below them in the Restricted rows to be suppressed: For example..."
First, something significant that isn't mentioned is that Identifier_Type is a multi-valued property. (Btw, I don't see multi-valued properties discussed in UTR23.) That is, the Identifier_Type property value is a non-empty subset of values from the set {Not_Character, Deprecated, ...}. (The other way to formalize would be to say that Identifier_Type is a collection of binary properties, Not_Character, Deprecated, etc.) This should be called out.
Secondly, the wording in the second sentence is a bit unclear as to what it's referring to.
Suggested revision: "There may be multiple reasons for restricting a character. For this reason, the Identifier_Type property allows multiple values that correspond with Restricted. For example, some characters have Identifier_Type values of Limited_Use and Technical. In the case of characters that have Identifier_Type values of Not_Character, Deprecated, Default_Ignorable, or Not_NFKC, other Identifier_Type property values listed below that value in Table 1 are not also assigned as additional property values. For example..."
3) "Restricted characters should be treated with caution in registration..." What is "registration" referring to?
Perhaps: "Restricted characters should be treated with caution when considering possible use in identifiers..."
Date/Time: Thu Jul 30 16:47:52 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31
This feedback pertains to Revision 33 of UAX#31: http://www.unicode.org/reports/tr31/tr31-33.html In section 1, the fourth paragraph begins, "This annex also provides guidelines for the user of normalization and case insensitivity with identifiers..." This wording with "user" is strange. (Who is a _user_ of normalization with identifiers?) Suggested revision: change "user" to "use".
Date/Time: Thu Jul 30 17:11:37 CDT 2020
Name: Peter Constable
Report Type: Error Report
Opt Subject: feedback on UAX#31
This feedback pertains to revision 33 of UAX#31: http://www.unicode.org/reports/tr31/tr31-33.html There appears to be a wording issue in the first paragraph of section 1.3: "For example, an implementation might display format what the user has entered, but use a normalized format for comparison." The problem wording is "might display format": it appears like a verb phrase ("display-format" being the main verb), but that doesn't really make sense. I gather what was meant is: "For example, an implementation might display what the user has entered, but use a normalized format for comparison." Suggested change: delete the first instance of "format".
Date/Time: Mon Aug 24 02:26:55 CDT 2020
Name: David E Starner
Report Type: Error Report
Opt Subject: Table 22-4. Compatibility Digits is incomplete
Table 22-4. Compatibility Digits on page 831 of the Standard version 13.0.0 is incomplete. It's missing 1FBF0-1FBF9, Segmented Digits, that have font decomposition to the normal ASCII digits.
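For illustration, the compatibility mapping of the segmented digits can be inspected with Python's unicodedata module. This is a minimal sketch that assumes a Python build whose Unicode data is version 13.0 or later (the characters are unassigned in earlier versions); the function name segmented_digit_mappings is hypothetical.

```python
import unicodedata


def segmented_digit_mappings() -> dict:
    """Map each code point in 1FBF0..1FBF9 to its NFKC form."""
    return {
        f"U+{cp:04X}": unicodedata.normalize("NFKC", chr(cp))
        for cp in range(0x1FBF0, 0x1FBFA)
    }


# The Unicode version bundled with this Python build.
print(unicodedata.unidata_version)

for code_point, mapped in segmented_digit_mappings().items():
    print(code_point, mapped)
```

With Unicode 13.0 data or later, each of the ten characters is expected to map to the corresponding ASCII digit, which is the property the report says qualifies them for the table.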
Date/Time: Mon Aug 24 20:23:52 CDT 2020
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Latest UAX #14 not cleaned up properly
Latest version of UAX #14 at https://unicode.org/reports/tr14/, under Regional Indicator, it says "beginnning" with a yellow background overstruck "n", which means it wasn't cleaned up properly.
Date/Time: Sat Aug 29 16:59:13 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: No real definition of Devanagari cluster pattern
Section 12.1 Devanagari of The Unicode Standard contains various bits of information about the structure of Devanagari clusters, but doesn’t provide a complete pattern for them. There should be one or several regular expressions covering all possible cluster patterns. See https://lindenbergsoftware.com/en/notes/issues-in-devanagari-cluster-validation/ for a detailed discussion of issues and recommendations.
(None at this time.)