The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of April 11, 2022, since the previous cumulative document was issued prior to UTC #170 (January 2022).
The links below go directly to open PRIs and to feedback documents for them, as of April 11, 2022.
The links below go to locations in this document for feedback.
Feedback routed to CJK & Unihan Group for evaluation [CJK]
Feedback routed to Script ad hoc for evaluation [SAH]
Feedback routed to Properties & Algorithms Group for evaluation [PAG]
Feedback routed to Emoji SC for evaluation [ESC]
Feedback routed to Editorial Committee for evaluation [EDC]
Other Reports
Date/Time: Sun Feb 6 09:46:29 CST 2022
Name: Ken Lunde
Report Type: Error Report
Opt Subject: UAX #45 USourceData.txt errors
Per the USourceData.txt file for Unicode Version 14.0, the following 11 U-Source ideographs have a status value of G, but are not included in Extension G (their code point fields are also blank, which is what flagged them):
UTC-01024;G;;79.7;;⿰圼殳;UTCDoc L2/12-333 56;;11;2
UTC-01161;G;;118.10;;⿳𥫗⿰工口木;UTCDoc L2/12-333 193;;16;1
UTC-01166;G;;152.6;;⿳亠䇂豕;UTCDoc L2/12-333 198;;13;4
UTC-01220;G;;32.11;;⿰土畢;UTCDoc L2/15-177 19;;14;2
UTC-01244;G;;85.9;;⿰氵⿳⺊彐龰;UTCDoc L2/15-177 43;;12;2
UTC-01256;G;;85.16;;⿰氵⿵門⿱土必;UTCDoc L2/15-177 55;;19;2
UTC-01257;G;;85.17;;⿰𤀤殳;UTCDoc L2/15-177 56;;20;3
UTC-01272;G;;86.11;;⿱炏冏;UTCDoc L2/15-177 71;;15;4
UTC-01276;G;;86.13;;⿱𤇾冏;UTCDoc L2/15-177 75;;17;4
UTC-01301;G;;167.9;;⿰金朐;UTCDoc L2/15-177 100;;17;3
UTC-01304;G;;167.11;;⿰金⿱谷心;UTCDoc L2/15-177 103;;19;3
I determined that the following six were unified with existing CJK Unified Ideographs per the IRG, and their status and code point fields should therefore be changed as follows:
UTC-01024;U;U+6BC0;79.7;;⿰圼殳;UTCDoc L2/12-333 56;;11;2
UTC-01161;U;U+7BC9;118.10;;⿳𥫗⿰工口木;UTCDoc L2/12-333 193;;16;1
UTC-01166;B;U+27C4F;152.6;;⿳亠䇂豕;UTCDoc L2/12-333 198;;13;4
UTC-01220;F;U+2D3EC;32.11;;⿰土畢;UTCDoc L2/15-177 19;;14;2
UTC-01244;B;U+23D8F;85.9;;⿰氵⿳⺊彐龰;UTCDoc L2/15-177 43;;12;2
UTC-01304;B;U+28B02;167.11;;⿰金⿱谷心;UTCDoc L2/15-177 103;;19;3
See:
https://hc.jsecs.org/irg/ws2015/app/?find=UTC-01024
https://hc.jsecs.org/irg/ws2015/app/?find=UTC-01161
https://hc.jsecs.org/irg/ws2015/app/?find=UTC-01166
https://hc.jsecs.org/irg/ws2015/app/?find=UTC-01220
https://hc.jsecs.org/irg/ws2015/app/?find=UTC-01244
https://hc.jsecs.org/irg/ws2015/app/?find=UTC-01304
A six-ideograph horizontal extension proposal can therefore be submitted.
The remaining five seem to have been withdrawn from IRG Working Set 2015, so I suggest that their status fields be changed to N so that they can be considered for re-submission in the future:
UTC-01256;N;;85.16;;⿰氵⿵門⿱土必;UTCDoc L2/15-177 55;;19;2
UTC-01257;N;;85.17;;⿰𤀤殳;UTCDoc L2/15-177 56;;20;3
UTC-01272;N;;86.11;;⿱炏冏;UTCDoc L2/15-177 71;;15;4
UTC-01276;N;;86.13;;⿱𤇾冏;UTCDoc L2/15-177 75;;17;4
UTC-01301;N;;167.9;;⿰金朐;UTCDoc L2/15-177 100;;17;3
That is all.
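The check that flagged these records (status G with a blank code point field) can be sketched in Python. This is a minimal sketch, assuming the semicolon-separated, ten-field layout visible in the records quoted above; the class and function names and the field labels in `USourceRecord` are illustrative, not names taken from UAX #45.

```python
from dataclasses import dataclass


@dataclass
class USourceRecord:
    # Illustrative labels for the fields used by this check.
    source_id: str
    status: str
    code_point: str      # blank when no code point has been assigned
    radical_stroke: str
    ids: str
    total_strokes: str


def parse_usource_line(line):
    # Field positions are inferred from the ten-field records quoted in the report.
    fields = line.rstrip("\n").split(";")
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(fields)}: {line!r}")
    return USourceRecord(
        source_id=fields[0],
        status=fields[1],
        code_point=fields[2],
        radical_stroke=fields[3],
        ids=fields[5],
        total_strokes=fields[8],
    )


def status_g_without_code_point(lines):
    """Return U-source IDs whose status is G but whose code point field is blank."""
    flagged = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        record = parse_usource_line(line)
        if record.status == "G" and not record.code_point:
            flagged.append(record.source_id)
    return flagged


sample = [
    "UTC-01024;G;;79.7;;⿰圼殳;UTCDoc L2/12-333 56;;11;2",
    "UTC-01304;B;U+28B02;167.11;;⿰金⿱谷心;UTCDoc L2/15-177 103;;19;3",
]
print(status_g_without_code_point(sample))  # ['UTC-01024']
```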
Date/Time: Sun Feb 13 18:30:55 CST 2022
Name: Paul Masson
Report Type: Error Report
Opt Subject: Unihan
U+4F3C is most commonly pronounced sì, but kMandarin for this character is still given as shì in version 14. Shouldn't this be changed or at least both pronunciations given?
Date/Time: Sun Feb 13 18:38:19 CST 2022
Name: Paul Masson
Report Type: Error Report
Opt Subject: Unihan
U+78D7 formerly had a kPhonetic value of 269, which was changed in version 14 to 1157*. The character clearly does not belong to this group. I would suggest it be given a kPhonetic of 269* since I cannot locate it in Casey. In fact the entire phonetic group 1157 is far too large compared to Casey. There are 263 characters alone with kPhonetic 1157*. This appears to be a major error for characters not in Casey that were assigned to the same group regardless of phonetics. Someone really needs to figure out when this batch was added and why. Please feel free to follow up with me on phonetic group 1157. Thank you.
Date/Time: Tue Feb 15 23:37:26 CST 2022
Name: Eiso Chan
Report Type: Error Report
Opt Subject: Radical Errors
Please update the kRSUnicode for U+3B3A as below. I have mentioned this issue in IRGN2239.
U+3B3A 74.9
U+2D15F is a variant of U+8352, as the Moji Joho project shows, and it's similar to U+2E3BB, so the best radical should be #140. It's better to change the RS information or add a secondary RS for it.
U+2D15F 140.6
or
U+2D15F 23.7 140.6
Date/Time: Sat Mar 12 04:49:28 CST 2022
Name: Edward
Report Type: Error Report
Opt Subject:
I found an issue in the Unihan database. Some kTotalStrokes values of characters with the radical 邑 or 阜 may be wrong. For example, the kTotalStrokes value of U+2B545 𫕅 is 10, while that of U+2CBC0 𬯀 is 9. The radical 阝 is counted as 2 strokes in the blocks from CJKUI to CJK-ExtD, but as 3 strokes in the blocks from CJK-ExtE to CJK-ExtG. I wonder whether this is wrong. In other words, 阝 has been counted as 3 strokes since Unicode® 8.0.0 was published. That's all.
Date/Time: Tue Mar 15 05:51:07 CDT 2022
Name: Andrew West
Report Type: Error Report
Opt Subject: CJK Ext B code chart
There are two Vietnam ideographs with identical shape but different source references for two different CJK unified ideographs:
VN-058B6 at U+58B6 is ⿰土達; G and H glyphs are also ⿰土達
VN-2143F at U+2143F is also ⿰土達; but G and H glyphs are ⿰土逹 (one stroke less)
I think the V source glyph for U+2143F should be modified to match the G and H glyphs for U+2143F (and to distinguish it from the V glyph for U+58B6).
Date/Time: Mon Apr 11 09:24:29 CDT 2022
Name: Jaycee Carter
Report Type: Error Report
Opt Subject: Unihan_IRGSources.txt and CJK Unified Ideographs code chart
This is to report an error relating to CJK character stroke counts: U+5954: kRSUnicode is currently 37.6. This should be 37.5. U+595F: kRSUnicode is currently 37.9. This should be 37.8. kTotalStrokes is correct for both characters.
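The proposed corrections follow from the usual relation between the fields: the residual stroke count in a kRSUnicode entry is the total stroke count minus the strokes of the radical. A minimal Python sketch, assuming that radical 37 (大) counts as 3 strokes and that kTotalStrokes for U+5954 (奔) is 8; `parse_krsunicode` and `residual_from_total` are hypothetical helpers, and the subtraction is only a heuristic that breaks down when a radical appears in a reduced form (compare the 阝 question earlier in this section).

```python
import re

# One kRSUnicode entry: radical number, optional apostrophes marking a
# simplified form of the radical, a period, then the residual stroke count.
RS_ENTRY = re.compile(r"(\d+)('*)\.(-?\d+)")


def parse_krsunicode(value):
    """Split a kRSUnicode value into (radical, simplified_marks, residual) tuples."""
    entries = []
    for token in value.split():  # multiple entries are space-separated
        match = RS_ENTRY.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized kRSUnicode entry: {token!r}")
        radical, marks, residual = match.groups()
        entries.append((int(radical), len(marks), int(residual)))
    return entries


def residual_from_total(total_strokes, radical_strokes):
    """Residual strokes implied by a total stroke count (simple heuristic)."""
    return total_strokes - radical_strokes


# Assumed inputs for illustration: radical 37 has 3 strokes, U+5954 has 8 in total.
current = parse_krsunicode("37.6")[0]
implied = residual_from_total(8, 3)
print(current, implied)  # (37, 0, 6) 5
```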
Date/Time: Wed Jan 26 10:38:02 CST 2022
Name: Halbast Abdullah
Report Type: Other Document Submission
Opt Subject: Kurdish language problems with the Arabic Script
Hi, I wanted to comment on the Arabic script in Unicode. Central Kurdish uses the Arabic script, and there is a problem: we have words that contain للە, and as you can see, it is automatically rendered as (Allah) in Arabic. Kurdish words such as (گوللە، کەللە، کوللە), which have nothing to do with (Allah), are displayed incorrectly because of this automatic change; these words should be written without the shadda and the small alef. One solution would be to let us choose whether للە is written with the shadda and alef or not. We, by which I mean the Central Kurdish language community, would appreciate it if you could review this and fix it. Thanks!
Date/Time: Mon Feb 7 15:25:04 CST 2022
Name: Elango Cheran
Report Type: Error Report
Opt Subject: ScriptExtensions.txt
I am a speaker of Tamil, and I notice that the data for the Script_Extensions property marks both danda and double danda (U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA) as having the `Taml` script code in the extensions. I have consulted with people in the community, and none of us have ever observed, or are aware of, any use cases (modern or otherwise) for these characters in Tamil. If indeed there are no documented usages, then the above association in ScriptExtensions.txt would be a bug.
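What "marked as having the `Taml` script code in the extensions" means at the data-file level can be shown with a small parser for ScriptExtensions.txt data lines (a code point or range, a semicolon, space-separated script codes, then an optional `#` comment). This is a minimal sketch; the function names are hypothetical, and the sample data line is abbreviated and hypothetical (the real file lists many more script codes for these characters).

```python
def parse_scx_line(line):
    """Parse a ScriptExtensions.txt data line into (first, last, script codes).

    Returns None for blank and comment-only lines.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    code_points, scripts = (part.strip() for part in data.split(";"))
    first, _, last = code_points.partition("..")
    return int(first, 16), int(last or first, 16), scripts.split()


def code_points_with_script(lines, script):
    """List every code point whose Script_Extensions value includes `script`."""
    hits = []
    for line in lines:
        parsed = parse_scx_line(line)
        if parsed is not None and script in parsed[2]:
            first, last, _ = parsed
            hits.extend(range(first, last + 1))
    return hits


# Hypothetical, abbreviated input for illustration only.
sample = [
    "# comment line",
    "",
    "0964..0965 ; Beng Deva Taml # Po [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA",
]
print([f"U+{cp:04X}" for cp in code_points_with_script(sample, "Taml")])
```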
Date/Time: Thu Mar 3 19:17:44 CST 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Unhelpful advice about U+0F35 and U+0F37
Re the Tibetan marks U+0F35 and U+0F37, chapter 13 says “If they are treated as normal combining marks, they can be entered into the text following the vowel signs in a stack”. Should they be treated as normal combining marks? If not, where should they appear in a stack? The standard should clearly specify how to use these code points, and not give such diffident advice.
Date/Time: Thu Mar 3 19:44:29 CST 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: How to encode multiple Tibetan vowels at the same height?
Section 13.4 discusses Tibetan stacks with multiple vowel signs. Usually, when there are multiple vowel signs above the base, they are rendered from bottom to top. In what order should they be encoded when they are rendered side by side? In particular, which of <U+0F68, U+0FA0, U+0F80, U+0F72> and <U+0F68, U+0FA0, U+0F72, U+0F80> is the right encoding for the stack with U+0F80 to the left of U+0F72?
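Normalization does not settle this question. Both U+0F72 and U+0F80 have canonical combining class 130, and canonical reordering only swaps adjacent marks whose nonzero classes are out of order, so the two candidate sequences are not canonically equivalent and the choice has to come from the standard's guidance. A minimal check with Python's `unicodedata` module (the variable names are illustrative):

```python
import unicodedata

# The two candidate encodings from the report.
seq_a = "\u0F68\u0FA0\u0F80\u0F72"  # U+0F80 stored before U+0F72
seq_b = "\u0F68\u0FA0\u0F72\u0F80"  # U+0F72 stored before U+0F80

for ch in ("\u0F72", "\u0F80"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")

# Equal nonzero combining classes are never reordered, so NFD keeps both
# orders as they are and the strings remain distinct.
same = unicodedata.normalize("NFD", seq_a) == unicodedata.normalize("NFD", seq_b)
print("canonically equivalent:", same)  # canonically equivalent: False
```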
Date/Time: Sun Mar 13 16:30:37 CDT 2022
Name: David Corbett
Report Type: Error Report
Opt Subject: IndicPositionalCategory.txt
The Kayah Li vowel signs U+A926..U+A92A should have Indic_Positional_Category = Top.
Date/Time: Wed Mar 16 13:13:48 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Order of Indic cantillation marks
What should the relative order be between above- and below-base marks with Indic_Syllabic_Category=Cantillation_Mark and post-base marks like visarga? Microsoft’s Indic shaper expects such Vedic marks at the end of the cluster, but Microsoft’s USE expects them to be mixed in with other marks, meaning they precede post-base marks.
Date/Time: Sun Apr 10 09:14:28 CDT 2022
Name: Tuğrul Çavdar
Report Type: Other Question
Opt Subject: Regarding to “2021-04-21 Application for Adding Letters to Old Turkish Alphabet — Gökbey Uluç”
Dear Unicode Consortium,
Regarding “2021-04-21 Application for Adding Letters to Old Turkish Alphabet — Gökbey Uluç”: https://www.unicode.org/L2/L2021/21081-old-turkish-add.pdf
The Old Turkic alphabet does not have letters corresponding to the consonants F, H, V, J, and C, because none of these sounds existed in the era when Old Turkic was used (before A. D. 840). Also, the vowels “O/U” are written with the same letter, 𐰆, and “Ö/Ü” likewise share one letter, 𐰇. The current letters of the Old Turkic table defined in https://unicode.org/charts/PDF/U10C00.pdf are correct.
Gökbey has fabricated new letters for F, H, V, J, and C on his own in order to write today’s Turkish with Old Turkic letters. He also uses “10C0A 𐰊 OLD TURKIC LETTER YENISEI AB” for the vowel “O”, and the “uş/ush” letter (one of the two letters in his proposal) for the vowel “Ö”. He uses “10C06 𐰆 OLD TURKIC LETTER ORKHON O/U” for the vowel “U” only, and “10C07 𐰇 OLD TURKIC LETTER ORKHON OE/UE” for the vowel “Ü” only. His fabricated alphabet is http://2.bp.blogspot.com/-RGZcnZW6bos/VISzObbuwpI/AAAAAAAACeM/0mnoucBfr30/s1600/cagdas-turk-damgalari.png (from his blog: http://kokturukce.blogspot.com/2011/04/yeni-damgalar-yeni-yaz-duzeni.html)
The reason he has proposed these two letters is to use them for “Ö” and “AH” instead of for “UŞ/USH” and “IÇ/ICH”. That is, he plans to use his fabricated alphabet on digital platforms.
There are many variations of Old Turkic letters, as can be seen in: http://www.tamga.org/2014/12/farkl-dillerdeki-kitablarda-kokturuk.html
It is impossible to produce codes for all variations. The current letters in the Old Turkic table of Unicode.Org are correct, and the direction of the current iç/ich letter, 𐰱, is correct.
For your information.
Yours sincerely,
Assoc. Prof. Tuğrul Çavdar, Ph. D.
Karadeniz Technical University
Trabzon, Turkey
Date/Time: Wed Jan 19 03:46:02 CST 2022
Name: Reini Urban
Report Type: Error Report
Opt Subject: tr31-latest
TR31 Security Bugs (UCD Versions 1-14)
======================================
1. U+FF00..U+FFEF not as ID
---------------------------
Most of the U+FF00..U+FFEF Fullwidth and Halfwidth letters incorrectly have the `ID_Start` or `ID_Continue` property, respectively (and likewise the XID properties). They should not, because they are confusable with the normal characters in the base planes. E.g. the fullwidth LATIN A-Z are indistinguishable from A..Z, and the fullwidth LATIN a-z from a..z; likewise for the halfwidth Katakana ヲ..ッ and ア..ン, and the halfwidth Hangul ᅠ..ᄒ, ᅡ..ᅦ, ᅧ..ᅬ, ᅭ..ᅲ, ᅳ..ᅵ letters.
This is a security risk, especially for TR39. TR39 provides Identifier Type properties to exclude insecure identifiers, but I cannot find any type property other than `Not_XID` to assign to these U+FF21..U+FFDC IDs. Thus the `ID_Start`/`ID_Continue` property should be deleted for all of them. If they are not identifiable, they should not be marked as such.
Since XIDs are guaranteed stable and nobody cares yet about TR39, I would accept a new TR39 Identifier Type property Confusable, or just setting the Not_XID property there for these. But really, defects, especially security defects, should be fixed.
2. Medial letters in `ID_Start`, not `ID_Continue`
--------------------------------------------------
DerivedCoreProperties lists all of the Arabic and Thai MEDIAL letters, which are part of identifiers, in `ID_Start`, not in `ID_Continue`. Only the combining marks are in `ID_Continue`. Thus all Unicode-aware parsers incorrectly accept all MEDIAL letters in the start position. They should only be allowed in the `ID_Continue` position, and parsers should disallow them in the end position of identifiers.
All the other medial letters (Myanmar, Canadian Aboriginal, Ahom, Dives Akuru) are not part of the Recommended Scripts, so they do not affect TR39 security. But since almost nobody but Java, cperl and Rust honors TR39, this still affects most parsers.
Other medial exceptions are noted in TR31 at 2.4 Specific Character Adjustments, but the DerivedCoreProperties and TR39 Identifier tables, and thus all user parsers, are wrong. < https://www.unicode.org/reports/tr31/#Specific_Character_Adjustments >
Date/Time: Sat Feb 19 16:32:04 CST 2022
Name: Karl Williamson
Report Type: Error Report
Opt Subject: UCD
U+1F8A0: LEFTWARDS BOTTOM-SHADED WHITE ARROW 🢠 U+1F8A1: RIGHTWARDS BOTTOM SHADED WHITE ARROW 🢡 While not technically an error, the names of these symmetric characters are asymmetric. One has a HYPHEN-MINUS between BOTTOM and SHADED and the other has a SPACE. It would be helpful to add a Name Alias to one or the other.
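The asymmetry is easy to confirm with Python's `unicodedata`, assuming a Python whose Unicode data includes the Supplemental Arrows-C block. As a side observation, the loose matching rule UAX44-LM2 ignores case, whitespace, underscores and medial hyphens, so both spellings already match under loose matching; an alias would still help exact-match lookups. The `loose_key` helper below is a hypothetical, simplified version of that rule.

```python
import re
import unicodedata

left = unicodedata.name("\U0001F8A0")
right = unicodedata.name("\U0001F8A1")
print(left)   # LEFTWARDS BOTTOM-SHADED WHITE ARROW
print(right)  # RIGHTWARDS BOTTOM SHADED WHITE ARROW


def loose_key(name):
    """Simplified UAX44-LM2 key: ignore case, spaces, underscores and medial hyphens.

    The rule's single exception (U+1180 HANGUL JUNGSEONG O-E) is not handled here.
    """
    without_medial = re.sub(r"(?<=[0-9A-Za-z])-(?=[0-9A-Za-z])", "", name)
    return re.sub(r"[\s_]", "", without_medial).upper()


print(loose_key(left) == loose_key("LEFTWARDS BOTTOM SHADED WHITE ARROW"))  # True
```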
Date/Time: Thu Feb 24 03:26:53 CST 2022
Name: Martin J. Dürst
Report Type: Website Problem
Opt Subject: Case Charts
In the case charts at https://www.unicode.org/charts/case/, together with Yusuke Endoh, a fellow Ruby committer, I discovered a problem: It lists the lowercased version of U+0130 (İ) as U+0069 (i). This is a simple case mapping; the full case mapping is U+0069 U+0307 at https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt, line 69. The case charts don't say anything about simple vs. full case mappings; they should say something (it's unclear to me at the moment exactly what they should say, because it's unclear to me exactly what they do).
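The difference between the two mappings can be seen in Python, whose `str.lower()` applies the full lowercase mapping, including the unconditional entries from SpecialCasing.txt. Python does not expose the simple (single code point) mapping from UnicodeData.txt, so it is hard-coded below from the value cited in the report; `SIMPLE_LOWERCASE` is a hypothetical stand-in, not a library table.

```python
import unicodedata

capital_i_dot = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Full lowercase mapping: may produce more than one code point.
full = capital_i_dot.lower()
print([unicodedata.name(c) for c in full])  # ['LATIN SMALL LETTER I', 'COMBINING DOT ABOVE']

# Simple lowercase mapping, hard-coded here from the report.
SIMPLE_LOWERCASE = {0x0130: 0x0069}
print(f"U+{SIMPLE_LOWERCASE[ord(capital_i_dot)]:04X}")  # U+0069
```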
Date/Time: Mon Mar 28 18:33:50 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Vai line breaking
This is feedback on L2/22-080. Another script with line breaks between orthographic syllables is Vai. The description in chapter 19 indicates that most Vai letters should have lb=ID, and U+A60B and U+A60C should have lb=BA. The “h-” characters might be ID or BA.
Date/Time: Sun Mar 6 21:15:52 CST 2022
Name: Fake Unicode
Report Type: Error Report
Opt Subject: emoji-list.html & emoji-test.txt
Per [https://twitter.com/roozbehp/status/1500663503316602882] It would be better to categorize the emoji 🍄 1f344 MUSHROOM under subcategory "plant-other" rather than as "food-vegetable", since all vendors show it as an inedible poisonous toadstool [ref: https://emojipedia.org/mushroom/].
Date/Time: Sat Apr 9 07:14:54 CDT 2022
Name: Matthias Reitinger
Report Type: Error Report
Opt Subject: emoji-test.txt
The data file https://www.unicode.org/Public/emoji/14.0/emoji-test.txt (Date: 2021-08-26, 17:22:23 GMT) contains these 10 code point sequences with the status "unqualified":
1F441 FE0F 200D 1F5E8 ; unqualified # 👁️🗨 E2.0 eye in speech bubble
1F575 FE0F 200D 2642 ; unqualified # 🕵️♂ E4.0 man detective
1F575 FE0F 200D 2640 ; unqualified # 🕵️♀ E4.0 woman detective
1F3CC FE0F 200D 2642 ; unqualified # 🏌️♂ E4.0 man golfing
1F3CC FE0F 200D 2640 ; unqualified # 🏌️♀ E4.0 woman golfing
26F9 FE0F 200D 2642 ; unqualified # ⛹️♂ E4.0 man bouncing ball
26F9 FE0F 200D 2640 ; unqualified # ⛹️♀ E4.0 woman bouncing ball
1F3CB FE0F 200D 2642 ; unqualified # 🏋️♂ E4.0 man lifting weights
1F3CB FE0F 200D 2640 ; unqualified # 🏋️♀ E4.0 woman lifting weights
1F3F3 FE0F 200D 26A7 ; unqualified # 🏳️⚧ E13.0 transgender flag
I believe these code point sequences should be "minimally-qualified" instead. The Unicode® Technical Standard #51 Revision 21 <https://www.unicode.org/reports/tr51/tr51-21.html> defines these terms:
> ED-17a. qualified emoji character — An emoji character in a string that (a) has default emoji presentation or (b) is the first character in an emoji modifier sequence or (c) is not a default emoji presentation character, but is the first character in an emoji presentation sequence.
> ED-18. fully-qualified emoji — A qualified emoji character, or an emoji sequence in which each emoji character is qualified.
> ED-18a. minimally-qualified emoji — An emoji sequence in which the first character is qualified but the sequence is not fully qualified.
> ED-19. unqualified emoji — An emoji that is neither fully-qualified nor minimally qualified.
Each of the sequences in question is an emoji zwj sequence (ED-16) with two elements. The first element of each sequence is an emoji presentation sequence (ED-9a), where the emoji character is not a default emoji presentation character. Therefore the first character is a qualified emoji character according to ED-17a (c).
The second element of each sequence is a single emoji character that does not have default emoji presentation. It is therefore not a qualified emoji character according to ED-17a. So the emoji zwj sequence contains one qualified emoji character (the first emoji character) and one non-qualified emoji character (the second emoji character). According to ED-18 the sequence is not a fully-qualified emoji, because not every emoji is qualified. But the sequence is minimally-qualified according to ED-18a, as the first emoji character is qualified, but the sequence is not fully-qualified. Therefore the listed sequences should be marked as minimally-qualified in the emoji-test.txt data file.
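The reporter's argument can be checked mechanically by applying the quoted definitions literally. The sketch below is hypothetical (its function names are illustrative), covers only the parts of ED-17a through ED-19 quoted above, and passes an empty stand-in for the Emoji_Presentation property because, per the report, none of the emoji characters in these sequences has default emoji presentation. It reproduces the report's reasoning; it does not show how emoji-test.txt is actually generated.

```python
VS16 = 0xFE0F  # VARIATION SELECTOR-16 (emoji presentation selector)
ZWJ = 0x200D   # ZERO WIDTH JOINER
MODIFIERS = range(0x1F3FB, 0x1F400)  # emoji modifiers U+1F3FB..U+1F3FF


def qualified_flags(seq, default_emoji=frozenset()):
    """Return (code point, is_qualified) for each emoji character in seq.

    Applies ED-17a as quoted above. VS16 and ZWJ are skipped as non-emoji
    components. `default_emoji` stands in for the Emoji_Presentation property;
    modifiers are only treated as qualified if the caller lists them there.
    """
    flags = []
    for i, cp in enumerate(seq):
        if cp in (VS16, ZWJ):
            continue
        nxt = seq[i + 1] if i + 1 < len(seq) else None
        qualified = (
            cp in default_emoji                        # (a) default emoji presentation
            or (nxt is not None and nxt in MODIFIERS)  # (b) starts a modifier sequence
            or nxt == VS16                             # (c) starts a presentation sequence
        )
        flags.append((cp, qualified))
    return flags


def classify(seq, default_emoji=frozenset()):
    """Apply ED-18, ED-18a and ED-19 literally."""
    flags = qualified_flags(seq, default_emoji)
    if flags and all(q for _, q in flags):
        return "fully-qualified"
    if flags and flags[0][1]:
        return "minimally-qualified"
    return "unqualified"


def parse(hex_seq):
    return [int(h, 16) for h in hex_seq.split()]


for text in ("1F575 FE0F 200D 2642 FE0F", "1F575 FE0F 200D 2642", "1F575 200D 2642 FE0F"):
    print(f"{text:28} -> {classify(parse(text))}")
```

Under this literal reading the three lines classify as fully-qualified, minimally-qualified and unqualified respectively, whereas emoji-test.txt lists the middle form as unqualified; whether the data or the definitions should change is the question the report raises.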
Date/Time: Mon Jan 17 01:46:41 CST 2022
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Core Specification
Note: This report has already been addressed by the Editorial Committee in a 15.0 draft.
Section 24.1 of the Core Specification, Character Names List, describes the Dashed Box Convention: "Dashed Box Convention. There are a number of characters in the Unicode Standard which in normal text rendering have no visible display, or whose only effect is to modify the display of other characters in proximity to them." Since Unicode 6.0, the dashed box convention has also been applied to characters with Indic syllabic category Consonant_Preceding_Repha. Such characters are always rendered visibly; the dashed box is used to indicate that they require reordering to after the following base character.
Date/Time: Thu Jan 27 15:49:01 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UAX #24 and UAX #31
UAX #24 contains the mistakes “GREEK LETTER SMALL LETTER OMICRON” (instead of “GREEK SMALL LETTER OMICRON”) and “in provided in” (instead of “is provided in”). A period is missing after “can be classified by script”. UAX #31 contains “an definition” (instead of “a definition”) and possibly some misplaced spaces (search for “ ,” and “ .”).
Date/Time: Tue Feb 1 01:51:59 CST 2022
Name: Vikki McDonough
Report Type: Error Report
Opt Subject: Unicode 14.0 "Optical Character Recognition" code chart
In the code chart for the Optical Character Recognition block, the reference glyph for character U+2447, OCR AMOUNT OF CHECK, is misshapen. The vertical bar in the middle of the glyph should be centered vertically; if we take the lower-left rectangle as glyph-component A, the vertical bar as glyph-component B, and the upper-right rectangle as glyph-component C, and designate the height of the upper and lower edges of each component as hU([A/B/C]) and hL([A/B/C]), respectively, {hU(A)-hL(B)} should equal {hU(B)-hL(C)}. However, in the reference glyph for this character in the official Optical Character Recognition code chart, the vertical bar is too high up, and {hU(B)-hL(C)} is much greater than {hU(A)-hL(B)}. This error has been present since at least Unicode 3.0 (the earliest Unicode version for which an archived copy of the Optical Character Recognition code chart is retrievable from the Wayback Machine).
Code chart containing the error: https://www.unicode.org/charts/PDF/U2440.pdf ("Optical Character Recognition; Range: 2440–245F")
Archived Unicode 3.0 code chart demonstrating a lower bound on the length of time this error has been present: https://web.archive.org/web/20010603000706/http://www.unicode.org/charts/PDF/U2440.pdf
Example of an E-13B-based font showing the correct form of this glyph: https://commons.wikimedia.org/wiki/File:MICR_char.svg (high-resolution version: https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/MICR_char.svg/2560px-MICR_char.svg.png)
Date/Time: Sat Feb 12 13:44:53 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UTR #17
I suggest the following corrections in UTR #17:
"O'Reilley" → "O'Reilly"
"graphic character glyphic identifier" → "graphic character global identifier"
"Graphic Character Set Glyphic Identifier" → "Graphic Character Global Identifier"
"UTF32-LE" → "UTF-32LE"
"an single" → "a single"
"sets, where for example," → "sets where, for example,"
"UTF-16 ," → "UTF-16,"
"(“character set” )" → "(“character set”)"
"3.0,..." → "3.0, ..."
"CCS's" → "CCSes"
"UDC's" → "UDCs"
"UAX# 29" → "UAX #29"
"Compression. [BOCU]." → "Compression [BOCU]."
Date/Time: Sun Feb 13 10:55:14 CST 2022
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: UTR #23
UTR #23 contains the following minor mistakes: “An code” (instead of “A code”), “comparsion” (instead of “comparison”), “applies as” (instead of “apply as”), “an encoded characters” (instead of “an encoded character”), “properties the” (instead of “properties of the”), “Unicode Character database” (instead of “Unicode Character Database”), “For example 'Character Property', becomes” (instead of “For example, 'Character Property' becomes”). A space is missing in “results.Proceeding”. I also suggest changing “values, (other than the default value)” to “values (other than the default value),”. The comma here can be deleted: “accessed,”, “a property, with”, “way, is”, “input, is”, “for, is”.
Date/Time: Thu Mar 3 20:55:52 CST 2022
Name: David Corbett
Report Type: Error Report
Opt Subject: Chapter 9
The glyphs for positional forms of U+0886 ARABIC LETTER THIN YEH in chapter 9 look identical to those for U+064A ARABIC LETTER YEH. They should be thin.
Date/Time: Thu Mar 17 19:31:23 CDT 2022
Name: Martin J. Dürst
Report Type: Error Report
Opt Subject: Unicode version 14.0.0, section 5.4
This is not really an error, but a place where language could be improved. Section 5.4 of Unicode 14.0.0 (https://www.unicode.org/versions/Unicode14.0.0/ch05.pdf) contains the following:
```
Because the ranges are disjoint, each code unit in well-formed UTF-16 must meet one of only three possible conditions:
• A single non-surrogate code unit, representing a code point between 0 and D7FF₁₆ or between E000₁₆ and FFFF₁₆
• A leading surrogate, representing the first part of a surrogate pair
• A trailing surrogate, representing the second part of a surrogate pair
```
The wording here is a bit strange. "Condition" seems to require "It is ..." in each of the bulleted items. Either add "It is " to each bullet, or change the preceding text to say "it is one of the following three".
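The three conditions in the quoted passage correspond directly to code unit ranges. A minimal Python sketch that sorts each code unit of a UTF-16 string into one of them; `utf16_code_units` and `classify_code_unit` are illustrative helper names.

```python
def utf16_code_units(text):
    """Encode text as UTF-16 (big-endian, no BOM) and return its 16-bit code units."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def classify_code_unit(unit):
    """Return which of the three conditions a UTF-16 code unit meets."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError(f"not a 16-bit code unit: {unit:#x}")
    if 0xD800 <= unit <= 0xDBFF:
        return "leading surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "trailing surrogate"
    return "non-surrogate"  # by itself represents U+0000..U+D7FF or U+E000..U+FFFF


for unit in utf16_code_units("A\U0001F600"):
    print(f"{unit:04X}: {classify_code_unit(unit)}")
```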
Date/Time: Fri Mar 18 20:28:10 CDT 2022
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: 442
On the code charts for Cyrillic Extended-D some of the characters use the Greek letterforms (of Delta and Phi respectively) rather than the Cyrillic ones (of be and ef respectively). These are: 1E031, 1E042, 1E052 & 1E060. The latter two are just the subscript versions of the former two, with the same issue.
Date/Time: Sun Apr 10 08:59:51 CDT 2022
Name: David Corbett
Report Type: Other Document Submission
Opt Subject: Armenian left half ring
Section 7.6 “Armenian” says “There is no left half ring in Armenian. Unicode character U+0559 is not used. It appears that this character is a duplicate character, which was encoded to represent U+02BB MODIFIER LETTER TURNED COMMA, used in Armenian transliteration. U+02BB is preferred for this purpose.” Via https://en.wiktionary.org/wiki/%D5%99 I found http://www.nayiri.com/imagedBook.jsp?id=1&printPage=10 which shows a left half ring (or turned apostrophe) being used in the Armenian script in a book on Armenian dialects. Should this character be encoded as U+0559 or as U+02BB? The standard should explain which to use in the Armenian script, because the standard is currently wrong or at least misleading.
Date/Time: Mon Apr 11 17:49:00 CDT 2022
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: IndicSyllabicCategory.txt
The file IndicSyllabicCategory.txt has a category Brahmi_Joining_Number, which contains only the Brahmi numbers U+11052..U+11065. The documentation for that category in the same file says "similar to Number in that in can be used as vowel-holders like Consonant_Placeholder, but may also be joined by a Number_Joiner of the same script, e.g. in Brahmi". This contradicts the core specification, section 14.1, which says "the numerals U+11052 brahmi number one through U+11065 brahmi number one thousand and their ligatures formed with U+1107F brahmi number joiner are not used as vowel carriers".
(None at this time.)