This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Tue Jun 8 11:08:50 CDT 2021
Name: Mike FABIAN
Report Type: Error Report
Opt Subject: Bugs in Unihan_Variants.txt (Unicode 14 draft version)
I think Unihan_Variants.txt from https://www.unicode.org/Public/14.0.0/ucd/Unihan-14.0.0d4.zip still has bugs for the following characters: 乾 U+4E7E, 著 U+8457, 覆 U+8986 are used not only in traditional Chinese but also in simplified Chinese; 杰 U+6770, 系 U+7CFB, 只 U+53EA are used not only in simplified Chinese but also in traditional Chinese. The Chinese input method ibus-table (which I maintain) has options to show only simplified, only traditional, simplified first, traditional first, or both in no particular order. To figure out which characters are simplified, traditional, or both, ibus-table uses Unihan_Variants.txt. Therefore, I still apply these two patches to Unihan_Variants.txt:
https://github.com/mike-fabian/ibus-table/commit/c37452a7bf49ccfd0f2062a3e90085022cb3735b
https://github.com/mike-fabian/ibus-table/commit/6384e217f7283d12b1bdabbc10da4d2cf2e4f94a

$ git show c37452a7bf49ccfd0f2062a3e90085022cb3735b
commit c37452a7bf49ccfd0f2062a3e90085022cb3735b
Author: Mike FABIAN
Date: Tue Jun 8 13:58:22 2021 +0200

    Keep our fixes to Unihan_Variants which are not yet included upstream

diff --git a/tools/Unihan_Variants.txt b/tools/Unihan_Variants.txt
index 72be50f..a29e627 100644
--- a/tools/Unihan_Variants.txt
+++ b/tools/Unihan_Variants.txt
@@ -1015,7 +1015,7 @@ U+4E66 kTraditionalVariant U+66F8
 U+4E70 kTraditionalVariant U+8CB7
 U+4E71 kSemanticVariant U+4E82<kMatthews,kMeyerWempe
 U+4E71 kTraditionalVariant U+4E82
-U+4E7E kSimplifiedVariant U+5E72
+U+4E7E kSimplifiedVariant U+4E7E U+5E72
 U+4E7E kSpecializedSemanticVariant U+4E81<kFenn
 U+4E81 kSpecializedSemanticVariant U+4E7E<kFenn
 U+4E82 kSemanticVariant U+4E71<kMatthews,kMeyerWempe
@@ -3814,7 +3814,7 @@ U+6769 kTraditionalVariant U+69AA
 U+676E kSpoofingVariant U+67FF
 U+676F kSemanticVariant U+76C3<kLau,kMatthews,kMeyerWempe
 U+6770 kSemanticVariant U+5091<kMatthews
-U+6770 kTraditionalVariant U+5091
+U+6770 kTraditionalVariant U+6770 U+5091
 U+6771 kSemanticVariant U+4E1C<kFenn
 U+6771 kSimplifiedVariant U+4E1C
 U+6777 kSemanticVariant U+6733<kMatthews
@@ -5912,7 +5912,7 @@ U+7CF9 kSemanticVariant U+7CF8<kMatthews
 U+7CF9 kSimplifiedVariant U+7E9F
 U+7CFA kSemanticVariant U+7CFE<kMatthews,kMeyerWempe
 U+7CFA kSimplifiedVariant U+2B119
-U+7CFB kTraditionalVariant U+4FC2 U+7E6B
+U+7CFB kTraditionalVariant U+7CFB U+4FC2 U+7E6B
 U+7CFD kSimplifiedVariant U+30AFC
 U+7CFE kSemanticVariant U+7CFA<kMatthews,kMeyerWempe
 U+7CFE kSimplifiedVariant U+7EA0
@@ -6938,7 +6938,7 @@ U+8441 kSemanticVariant U+8591<kMatthews
 U+8449 kSimplifiedVariant U+53F6
 U+8452 kSimplifiedVariant U+836D
 U+8457 kSemanticVariant U+7740
-U+8457 kSimplifiedVariant U+7740
+U+8457 kSimplifiedVariant U+8457 U+7740
 U+8457 kSpecializedSemanticVariant U+7740<kFenn U+87AB<kFenn
 U+845A kSemanticVariant U+6939<kFenn
 U+845D kSimplifiedVariant U+2B20E
@@ -7475,7 +7475,7 @@ U+8975 kSimplifiedVariant U+2B307
 U+8978 kSimplifiedVariant U+2C877
 U+8979 kSimplifiedVariant U+30CFC
 U+897C kSimplifiedVariant U+30CF5
-U+8986 kSimplifiedVariant U+590D
+U+8986 kSimplifiedVariant U+590D U+8986
 U+8987 kSemanticVariant U+9738<kMeyerWempe
 U+898A kSemanticVariant U+7F88<kMatthews
 U+898B kSimplifiedVariant U+89C1
$

$ git show 6384e217f7283d12b1bdabbc10da4d2cf2e4f94a
commit 6384e217f7283d12b1bdabbc10da4d2cf2e4f94a
Author: Mike FABIAN
Date: Tue Jun 8 13:59:59 2021 +0200

    Fix bug in Unihan_Variants.txt, 只 U+53EA is both simplified *and* traditional Chinese

    Resolves: https://github.com/kaio/ibus-table/issues/74

diff --git a/engine/chinese_variants.py b/engine/chinese_variants.py
index e53e6d6..0b79eea 100644
--- a/engine/chinese_variants.py
+++ b/engine/chinese_variants.py
@@ -1116,7 +1116,7 @@ VARIANTS_TABLE = {
     u'叙': 1,
     u'叠': 1,
     u'叢': 2,
-    u'只': 1,
+    u'只': 3,
     u'台': 3,
     u'叶': 1,
     u'号': 1,
diff --git a/tools/Unihan_Variants.txt b/tools/Unihan_Variants.txt
index a29e627..ab84494 100644
--- a/tools/Unihan_Variants.txt
+++ b/tools/Unihan_Variants.txt
@@ -1695,7 +1695,7 @@ U+53E2 kSemanticVariant U+6A37<kFenn
 U+53E2 kSimplifiedVariant U+4E1B
 U+53E8 kSemanticVariant U+9955<kMeyerWempe
 U+53EA kSemanticVariant U+5B50<kLau
-U+53EA kTraditionalVariant U+96BB
+U+53EA kTraditionalVariant U+53EA U+96BB
 U+53EB kSemanticVariant U+544C<kLau,kMatthews,kMeyerWempe
 U+53F0 kSemanticVariant U+81FA<kHKGlyph,kLau
 U+53F0 kSimplifiedVariant U+53F0
diff --git a/tools/generate-chinese-variants.py b/tools/generate-chinese-variants.py
index 391b5ef..310d6ae 100755
--- a/tools/generate-chinese-variants.py
+++ b/tools/generate-chinese-variants.py
@@ -273,6 +273,7 @@ TEST_DATA = {
     u'系': 3, # U+7CFB
     u'乾': 3, # U+4E7E
     u'著': 3, # U+8457 Patch by Heiher <r@hev.cc>
+    u'只': 3, # U+53EA, see: https://github.com/kaio/ibus-table/issues/74
 }

 def test_detection(generated_script) -> int:
$
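The effect of adding a character's own code point to its variant list can be illustrated with a minimal sketch. This is not the actual ibus-table generator: the function name `classify` is hypothetical, and the meaning of the values 1, 2 and 3 (simplified, traditional, both) is inferred from the patch above rather than confirmed by the report.

```python
SIMPLIFIED = 1   # assumed meaning, inferred from the VARIANTS_TABLE values in the patch
TRADITIONAL = 2  # assumed meaning; 3 (both bits set) would then mean "both"


def classify(lines):
    """Return {character: flags} from Unihan_Variants.txt-style lines (hypothetical helper)."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith('U+'):
            continue  # skip comments, blank lines and malformed entries
        source, prop = fields[0], fields[1]
        # Drop source annotations such as "<kFenn" from the target code points.
        targets = [target.split('<')[0] for target in fields[2:]]
        if prop == 'kTraditionalVariant':
            own, other = SIMPLIFIED, TRADITIONAL
        elif prop == 'kSimplifiedVariant':
            own, other = TRADITIONAL, SIMPLIFIED
        else:
            continue
        char = chr(int(source[2:], 16))
        flags = table.get(char, 0) | own
        if source in targets:
            # A self-reference marks the character as valid in the other script too.
            flags |= other
        table[char] = flags
    return table


before = classify(['U+53EA kTraditionalVariant U+96BB'])
after = classify(['U+53EA kTraditionalVariant U+53EA U+96BB'])
print(before['只'], after['只'])
```

Under these assumptions, the unpatched entry classifies 只 as simplified only, while the patched entry classifies it as both. Characters with no variant entries at all are not covered by this sketch.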
Date/Time: Tue Jun 8 23:34:03 CDT 2021
Name: Yi Bai
Report Type: Error Report
Opt Subject: Glyph missing in code chart of CJK Unified Ideographs Extension A
Note: This glyph error will be corrected by the chart editors and will be posted.
In the code chart for CJK Unified Ideographs Extension A in Version 14.0 Beta, the glyph for 4DB9 with UTC source UTC-00120 is missing. The glyph can be found in the current 13.0 code chart. Please update the code chart accordingly. Thank you.
Date/Time: Thu Jun 10 09:18:03 CDT 2021
Name: Andrew Christopher West
Report Type: Public Review Issue
Opt Subject: PRI #433 Unicode 14.0.0 Beta
U+2B8D9 and U+2B8DA have the wrong radical and stroke count since the glyph changes for Unicode 13.0 simplified 來 to 来 in both cases (in retrospect this was a destabilizing glyph change, and encoding two new characters would have been a better solution). The code charts for Unicode 14.0 still show the old radical stroke count of 9.12, but as the new glyph forms no longer include the 'person' radical (9) they must be assigned to a new radical. I suggest 'wood' (75) as this is the radical for 来, which would give 75.9 for both characters.
Date/Time: Thu Jun 10 09:48:33 CDT 2021
Name: Andrew Christopher West
Report Type: Public Review Issue
Opt Subject: PRI #433 Unicode 14.0.0 Beta
Following glyph changes in Unicode 11.0, the following characters have the wrong radical and stroke counts:
U+2B1CD: 132.20 -- should be 132.19
U+2B584: 176.12 -- should be 176.11
U+2B8DE: 9.12 -- should be 9.11
U+2C7C3: 140.15 -- should be 140.14
Following glyph changes in Unicode 13.0, the following characters have the wrong radical and stroke counts:
U+2BD61: 44.10 -- should be 44.9
U+2BE4A: 59.14 -- should be 59.15
U+2BF9D: 64.18 -- should be 64.19
U+2C0B8: 75.7 -- should be 75.8
U+2C142: 75.15 -- should be 75.16
U+2C316: 91.17 -- should be 91.19
U+2C83A: 142.17 -- should be 142.18
U+2CC88: 182'.13 -- should be 182'.16
Date/Time: Fri Jun 11 12:50:02 CDT 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #433: Other_Lowercase Property for New Modifier Letters
The following characters in the new Latin Extended-F block should be given the property Other_Lowercase=True for consistency with other similar modifier letters already encoded:
U+10780 (MODIFIER LETTER SMALL CAPITAL AA)
U+10783..U+10785 (MODIFIER LETTER SMALL AE..MODIFIER LETTER SMALL B WITH HOOK)
U+10787..U+107B0 (MODIFIER LETTER SMALL DZ DIGRAPH..MODIFIER LETTER SMALL V WITH RIGHT HOOK)
U+107B2..U+107B5 (MODIFIER LETTER SMALL CAPITAL Y..MODIFIER LETTER BILABIAL CLICK)
U+107BA (MODIFIER LETTER SMALL S WITH CURL)
It is unclear whether the following characters should be classified as lowercase as well, since their base forms are Other_Letter rather than Lowercase_Letter:
U+10781..U+10782 (MODIFIER LETTER SUPERSCRIPT TRIANGULAR COLON..MODIFIER LETTER SUPERSCRIPT HALF TRIANGULAR COLON)
U+107B6..U+107B9 (MODIFIER LETTER DENTAL CLICK..MODIFIER LETTER RETROFLEX CLICK WITH RETROFLEX HOOK)
Date/Time: Mon Jun 14 09:03:19 CDT 2021
Name: Mike FABIAN
Report Type: Error Report
Opt Subject: Some more possible bugs in Unihan_Variants.txt
Recently I reported some problems in Unihan_Variants.txt from https://www.unicode.org/Public/14.0.0/ucd/Unihan-14.0.0d4.zip
I received this bug report about ibus-table wrongly classifying some characters as simplified only or traditional only: https://github.com/ibus/ibus/issues/2323
According to this bug report there may be a few more bugs concerning kTraditionalVariant and kSimplifiedVariant. In the above bug report, the user said that 着 U+7740 is used in Hong Kong, 云 U+4E91 is used both in Hong Kong and Taiwan, 裡 U+88E1 and 復 U+5FA9 are used everywhere (i.e. also in simplified Chinese), 采 U+91C7 is used in Hong Kong (the user wrote that he was not sure about Taiwan, but it is probably used in Taiwan as well, since it is listed in http://dict.revised.moe.edu.tw/cgi-bin/cbdic/gsweb.cgi ), 吓 U+5413 is used in Cantonese, and 揾 U+63FE is used in Hong Kong. The user wrote that he doesn’t know where 尸 is used, but it is one of the radicals used on a Cangjie keyboard, so it seems to be used in traditional Chinese, at least as a radical.
I fixed it like this: https://github.com/mike-fabian/ibus-table/commit/5ed1cc16b398e0161e63ef35421d94166caf56c0
I.e. I applied the following changes to Unihan_Variants.txt:
-U+4E91 kTraditionalVariant U+96F2
+U+4E91 kTraditionalVariant U+4E91 U+96F2
-U+5413 kTraditionalVariant U+5687
+U+5413 kTraditionalVariant U+5413 U+5687
-U+5C38 kTraditionalVariant U+5C4D
+U+5C38 kTraditionalVariant U+5C38 U+5C4D
-U+5FA9 kSimplifiedVariant U+590D
+U+5FA9 kSimplifiedVariant U+590D U+5FA9
-U+63FE kTraditionalVariant U+6435
+U+63FE kTraditionalVariant U+63FE U+6435
-U+7740 kTraditionalVariant U+8457
+U+7740 kTraditionalVariant U+7740 U+8457
-U+88E1 kSimplifiedVariant U+91CC
+U+88E1 kSimplifiedVariant U+88E1 U+91CC
-U+91C7 kTraditionalVariant U+57F0 U+63A1
+U+91C7 kTraditionalVariant U+57F0 U+63A1 U+91C7
Date/Time: Tue Jun 15 06:29:07 CDT 2021
Name: M
Report Type: Other Question, Problem, or Feedback
Opt Subject: FEEDBACK ABOUT UNIHAN DATABASE
1. Feedback: [乹][U+4E79] is a variant of [乾][U+4E7E] and [亁][U+4E81] according to the variant list (第一批异体字整理表), page 4: https://upload.wikimedia.org/wikipedia/commons/2/29/%E7%AC%AC%E4%B8%80%E6%89%B9%E5%BC%82%E4%BD%93%E5%AD%97%E6%95%B4%E7%90%86%E8%A1%A8.pdf
2. Question: [卿][U+537F] has kRSUnicode 26.9 and kTotalStrokes 10. This doesn't make sense. It should be either kRSUnicode 26.8 with kTotalStrokes 10, or kRSUnicode 26.9 with kTotalStrokes 11.
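The arithmetic behind the second point can be sketched as follows: the total stroke count is expected to equal the stroke count of the radical plus the residual stroke count given in kRSUnicode. The helper name `expected_total` and the one-entry table are hypothetical, and the value of 2 strokes for radical 26 is the value implied by the two alternatives proposed above.

```python
# Stroke counts per radical number; only the entry needed for this example.
# Assumption: radical 26 is written with 2 strokes, as implied by the report.
RADICAL_STROKES = {26: 2}


def expected_total(rs_unicode, radical_strokes=RADICAL_STROKES):
    """Return radical strokes plus residual strokes for a kRSUnicode value like '26.9'."""
    radical, residual = rs_unicode.split('.')
    radical = radical.rstrip("'")  # an apostrophe marks a simplified radical form
    return radical_strokes[int(radical)] + int(residual)


for rs, total in [('26.9', 10), ('26.8', 10), ('26.9', 11)]:
    status = 'consistent' if expected_total(rs) == total else 'inconsistent'
    print(rs, total, status)
```

A mismatch found this way is only a prompt for review, since the two fields may legitimately follow different glyph conventions.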
Date/Time: Thu Jun 17 07:59:42 CDT 2021
Name: Ken Lunde
Report Type: Error Report
Opt Subject: 21 CJK Unified Ideographs are missing kTotalStrokes property values
The following 21 CJK Unified Ideographs, in the range U+9FD6 through U+9FEA that were added in Unicode Version 10.0, are missing kTotalStrokes property values, and the suggested property values are provided below:
U+9FD6 kTotalStrokes 7
U+9FD7 kTotalStrokes 9
U+9FD8 kTotalStrokes 12
U+9FD9 kTotalStrokes 13
U+9FDA kTotalStrokes 16
U+9FDB kTotalStrokes 24
U+9FDC kTotalStrokes 22
U+9FDD kTotalStrokes 27
U+9FDE kTotalStrokes 20
U+9FDF kTotalStrokes 12
U+9FE0 kTotalStrokes 21
U+9FE1 kTotalStrokes 33
U+9FE2 kTotalStrokes 14
U+9FE3 kTotalStrokes 15
U+9FE4 kTotalStrokes 18
U+9FE5 kTotalStrokes 22
U+9FE6 kTotalStrokes 23
U+9FE7 kTotalStrokes 25
U+9FE8 kTotalStrokes 27
U+9FE9 kTotalStrokes 29
U+9FEA kTotalStrokes 17
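A gap of this kind can be found mechanically. The following is a minimal sketch, assuming Unihan-style lines of the form "U+XXXX property value"; the function name `missing_property` is hypothetical and is not part of any Unicode tooling.

```python
def missing_property(lines, prop, first, last):
    """List code points in first..last (inclusive) that have no entry for prop."""
    present = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith('U+') and fields[1] == prop:
            present.add(int(fields[0][2:], 16))
    return [f'U+{cp:04X}' for cp in range(first, last + 1) if cp not in present]


# With no data at all, every code point in the reported range is missing.
print(len(missing_property([], 'kTotalStrokes', 0x9FD6, 0x9FEA)))
```

The range U+9FD6..U+9FEA contains 21 code points, which matches the count given in the report.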
Date/Time: Fri Jun 18 01:50:26 CDT 2021
Name: Lim Hian-tong
Report Type: Public Review Issue
Opt Subject: Issues related to Kana Extended-B (Public Review Issue #433)
This is feedback on Unicode 14.0.0 Beta. I refer to Public Review Issue #433. I am writing to request amendments to the code chart for Kana Extended-B, as shown in the current beta draft of the Unicode Standard, Version 14.0. The descriptions of U+1AFF0 and U+1AFF8 (“also used for tone six”) should be removed for the following reasons.
The current descriptions come from document L2/20-209R (titled “Final proposal to encode Taiwanese kana in the UCS”) by Fredrick R. Brennan. In the document, the sentence stating “In modern Hokkien, tone six is equal to tone two” is inconsistent with the source text (Chiung), which says “It has been observed that tone 6 had merged with tone 2 or tone 7” instead. What’s more, what Chiung has written is also a misquote from Ang Ui-jin’s “The tonal study of Taiwanese,” which compares Minnan tones with Middle Chinese ones. Ang does not claim that “tone 6 (of the Minnan language) has merged with tone 2 or tone 7.”
The complicated tone situation originates from the fact that Hokkien consists of two major dialects, namely Quanzhou and Zhangzhou, with different phonologies. The two major dialects, along with dialects descended from them, are spoken in the PRC, in Taiwan and across Southeast Asia. Quanzhou speakers make a clear distinction between tones 6 and 7 (up until today), while Zhangzhou speakers merge them and assign the merged tone as “tone 7,” removing “tone 6” from their phonology. Every single word with tone 6 in Quanzhou is pronounced with tone 7 in Zhangzhou due to the merger. This can be observed on the Facebook page “Taigikho,” where all words with tone 6 (marked with a caron) are given as variants of tone 7 (marked with a macron). Detailed explanations of Quanzhou phonology can be found in various publications in the PRC, including dictionaries, chorographies and periodicals.
However, certain scholars who only speak the Zhangzhou dialect attempt to fill the gap of the non-existent “tone 6” with what they guess the “assumed historical tone 6” was in the past. This has resulted in a popular misconception that the “historical tone 6” in Zhangzhou somehow became tone 2. Since the Quanzhou tonal system is less common and is not widely studied in Taiwan, Taiwanese publications tend to adopt the linguistically incorrect theory. Aside from simply stating the idea, these publications are not able to list actual examples to support the misconception, because it is impossible to do so. Of the printed materials in Taiwan, only scattered studies involving Quanzhou phonology contain correct linguistic information regarding tones. Correct descriptions and usage of tone 6 can also be found in the official “Dictionary of Frequently-Used Taiwan Minnan” by the ROC’s Ministry of Education and other online resources by Taiwanese researchers who are familiar with dialectal differences.
In short, tone 2 has absolutely nothing to do with tone 6. “Also used for tone six” should be considered an inappropriate description for both “katakana letter Minnan tone-2” and “katakana letter Minnan nasalized tone-2.” By removing such descriptions, the Unicode documentation will be much less likely to cause confusion, misunderstandings and disputes.
Date/Time: Fri Jun 18 15:49:09 CDT 2021
Name: Paul Masson
Report Type: Error Report
Opt Subject: kPhonetic for U+96B1
This character appears in group 1483 on p.152 of Casey. The field is missing in the database and needs to be added.
Date/Time: Sat Jun 26 09:07:52 CDT 2021
Name: Ken Lunde
Report Type: Public Review Issue
Opt Subject: PRI #433 (Unicode Version 14.0.0 Beta) feedback
The kIRG_VSource property value for U+20307 𠌇 should be changed from V4-4131 to VN-20307. IRG Working Set 2017 (aka Extension H) serial number 00138 will use V4-4131 as its kIRG_VSource property value, and Vietnam confirmed that it is correct, and that the kIRG_VSource property value for U+20307 𠌇 should be changed to VN-20307. See: https://hc.jsecs.org/irg/ws2017/app/index.php?id=00138
Date/Time: Sun Jun 27 05:59:21 CDT 2021
Name: M
Report Type: Other Question, Problem, or Feedback
Opt Subject: Suggestion for Unihan Database
You can add "stroke order" information to the Unihan Database, e.g. as kStroke. It is represented by the numbers 1-5, for the 5 basic strokes in Chinese, and is called 笔顺号码 (stroke order numbers) in Chinese.
㇐:㇑:㇒:㇏:㇖
横:竖:撇:捺:折
héng:shù:piē:nà:zhé
1:2:3:4:5
1: 横 >> ㇐㇀ (from left to right, from bottom left to top right)
2: 竖 >> ㇑㇚ (from top to bottom)
3: 撇 >> ㇒㇓ (from top right to bottom left)
4: 捺 >> ㇏㇝㇔ (from top left to bottom right)
5: 折 >> ㇖㇠㇡㇕㇇㇄㇂ (anything else that folds, except ㇚, which is included in the 2nd type of stroke)
Let's pick U+7B14 笔 [bǐ] as an example: it is made of ㇒㇐㇔㇒㇐㇔㇒㇐㇐㇟ [撇横捺撇横捺撇横横折] in stroke order, so the kStroke value of this character would be 3143143115. You probably know all this already. This would help find characters by stroke order quickly. Thanks
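The proposed encoding can be sketched in a few lines. This is only an illustration of the suggestion above: kStroke is the hypothetical property name from the report, not an existing Unihan field, and `stroke_code` is a hypothetical helper.

```python
# Digit assigned to each of the five basic stroke classes, as listed in the report.
STROKE_DIGIT = {'横': '1', '竖': '2', '撇': '3', '捺': '4', '折': '5'}


def stroke_code(stroke_names):
    """Convert a sequence of stroke class names into a digit string."""
    return ''.join(STROKE_DIGIT[name] for name in stroke_names)


# 笔 (U+7B14), using the stroke sequence given in the report.
print(stroke_code('撇横捺撇横捺撇横横折'))  # prints 3143143115
```

Because the result is a plain digit string, a prefix search over such values would find all characters that begin with the same strokes.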
Date/Time: Mon Jun 28 18:50:05 CDT 2021
Name: Peter Constable [MSFT]
Report Type: Public Review Issue
Opt Subject: Emoji 14 counts
The counts for new emoji in v14.0 have some inconsistencies:
1) The table at the bottom of the Emoji Recently Added, v14.0β page (https://www.unicode.org/emoji/charts-14.0/emoji-released.html) indicates 37 new emoji characters, which matches what is stated on the BETA Unicode 14.0.0 page (https://www.unicode.org/versions/beta-14.0.0.html), and the draft Unicode 14.0.0 summary page (https://www.unicode.org/versions/Unicode14.0.0/). However, there are 38 rows on the Emoji Recently Added page for non-zwj-sequence emoji. The discrepancy appears to be that row 15 is listing U+1F91D (and five corresponding modifier sequences), which is not new to v14, but rather was added in Emoji v3.0 / Unicode 9.0.
2) The table at the bottom of the Emoji Recently Added page cites 55 (non-zwj) emoji modifier sequences and a total count of new emoji / emoji sequences of 112. However, the emoji-test.txt data file at https://www.unicode.org/Public/emoji/14.0 cites only 107 items for v14.0, and the emoji-sequences.txt cites only 55 (non-zwj) emoji modifier sequences. This discrepancy may be due to modifier sequences for 1F91D being incorrectly counted as part of v14.0: there are five modifier sequence entries for 1F91D in the emoji-sequences.txt file cited as being from Emoji v3.0.
Date/Time: Mon Jun 28 16:34:26 CDT 2021
Name: Peter Constable [MSFT]
Report Type: Public Review Issue
Opt Subject: Emoji 14 beta
Unicode and UTC do a decent job during the beta for a new Unicode edition of helping reviewers see what new characters are being added to the next version. For example, one can readily browse through the following trail:
Open PRIs: https://www.unicode.org/review/
PRI 433, Unicode 14 beta: https://www.unicode.org/review/pri433/
Unicode 14 beta: https://www.unicode.org/versions/beta-14.0.0.html
Unicode 14 summary: https://www.unicode.org/versions/Unicode14.0.0/
delta code charts: https://www.unicode.org/charts/PDF/Unicode-14.0/
For emoji additions (atomic characters or RGI sequences), it's much harder to find similar delta information. The Unicode 14 summary page has a link to the Emoji Counts page (https://www.unicode.org/emoji/charts-14.0/emoji-counts.html), but this is cumulative to the latest (beta) version, not a delta. The Unicode 14 summary page also has a link to the Emoji Recently Added, v14.0β page (https://unicode.org/emoji/charts-14.0/emoji-released.html), which appears to be the delta in question, though (at least at first glance) it is unclear how to interpret some of the information. In particular:
- The Unicode 14 summary page cites 37 new emoji, and the table at the bottom of the Emoji Recently Added page cites 37 atomic emoji characters, yet the preceding chart has 39 rows. A trained eye might notice that row 16 has a ZWJ sequence, but that still leaves 38 other rows that appear to list atomic characters. It's unclear if the count of "37" is off, or if this page is listing emoji additions beyond 14.0, or some other issue. (Given that rows 15 and 16 both list a short name "handshake", one might guess that is the source of the extra count, except that row 16 is the ZWJ sequence, so already accounted for.)
(Some other issues with the Emoji Recently Added page: Unlike delta code charts for The Unicode Standard, this page gives candidate "reference numbers" Xnnnnn, not code points. Also, the sample images in rows 15 and 16 appear to be wrong.)
One might happen upon the Emoji List, v14.0 page (https://www.unicode.org/emoji/charts-14.0/emoji-list.html) and search for occurrences of "⊛" to find recent additions, but there are 38 instances, not 37. Since "recently-added" is not exactly the same as "added in v.14", things are still unclear. Or, one might happen upon the Draft Emoji Candidates page (https://www.unicode.org/emoji/future/emoji-candidates.html), which appears to have the same list of emoji as in the Emoji Recently Added page, but doesn't mention v14.0 at all.
Turning to other sources, one could go to PU UTS #51 (https://www.unicode.org/reports/tr51/tr51-20.html) to find the emoji data and, after correcting for the fact that the ".../latest/..." URL lands at v13.1 data, navigate to the emoji 14.0 data folder (https://www.unicode.org/Public/emoji/14.0/), then look in the data files for E14.0 additions. The emoji-zwj-sequences.txt file has 20 occurrences of "E14.0", which matches the count in the table at the bottom of the Emoji Recently Added page; but the emoji-sequences.txt file has 61 occurrences of "E14.0", which doesn't correspond to counts given elsewhere; of course, that's because some rows have ranges, and if one adds up the range counts for Basic_Emoji, that does add up to 37. But this isn't the easiest way to point reviewers to the new emoji sequences for 14.0, and the 37 vs. 38 discrepancy remains.
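The difference between counting occurrences of "E14.0" and counting the emoji they cover can be sketched as follows. The function name `count_entries` is hypothetical, and the sample lines are made-up entries in the general layout of emoji-sequences.txt (code points, type field, description, then a comment with the version), using private-use code points so they are not mistaken for real data.

```python
def count_entries(lines, version='E14.0', type_field='Basic_Emoji'):
    """Return (matching lines, code points or sequences covered, with ranges expanded)."""
    occurrences = 0
    covered = 0
    for line in lines:
        data, _, comment = line.partition('#')
        fields = [field.strip() for field in data.split(';')]
        if len(fields) < 2 or fields[1] != type_field:
            continue
        if version not in comment.split():
            continue
        occurrences += 1
        if '..' in fields[0]:
            # A range such as "E000..E002" stands for several code points.
            first, last = fields[0].split('..')
            covered += int(last, 16) - int(first, 16) + 1
        else:
            covered += 1
    return occurrences, covered


# Illustrative sample lines only, not taken from the real data file.
sample = [
    'E000..E002 ; Basic_Emoji ; sample range # E14.0 [3] (...)',
    'E010 ; Basic_Emoji ; sample single # E14.0 [1] (...)',
    'E020 ; Basic_Emoji ; older sample # E13.0 [1] (...)',
]
print(count_entries(sample))
```

For the sample, two lines mention the version but four code points are covered, which is the kind of gap between "occurrences" and "count" described above.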
Date/Time: Tue Jun 29 21:21:24 CDT 2021
Name: Ryusei Yamaguchi
Report Type: Public Review Issue
Opt Subject: PRI #433 Unicode 14.0.0 Beta
Unihan_IRGSources.txt from https://www.unicode.org/Public/14.0.0/ucd/Unihan-14.0.0d4.zip has some bugs: the kTotalStrokes values for the following characters are not the exact stroke counts.
UCS,char,current,correct
U+21FE8,𡿨,3,1
U+248E5,𤣥,5,4
U+2634D,𦍍,6,5
U+264D0,𦓐,6,5
U+26612,𦘒,6,5
U+27607,𧘇,5,4
U+2795B,𧥛,7,6
U+2795C,𧥜,7,6
U+27C27,𧰧,7,6
U+27C28,𧰨,7,6
U+28210,𨈐,7,5
U+28211,𨈑,7,6
Date/Time: Fri Jul 2 03:17:10 CDT 2021
Name: M
Report Type: Other Question, Problem, or Feedback
Opt Subject: Missing Data in Unihan Database
Found some missing simplification data in Unihan_Variants:
U+44D6 kTraditionalVariant U+85ED
U+4E86 kTraditionalVariant U+77AD
U+4F19 kTraditionalVariant U+5925
U+501F kTraditionalVariant U+85C9
U+51AC kTraditionalVariant U+9F15
U+5343 kTraditionalVariant U+97C6
U+535C kTraditionalVariant U+8514
U+5377 kTraditionalVariant U+6372
U+5401 kTraditionalVariant U+7C72
U+5408 kTraditionalVariant U+95A4
U+56DE kTraditionalVariant U+8FF4
U+59DC kTraditionalVariant U+8591
U+5BB6 kTraditionalVariant U+50A2
U+5CC3 kTraditionalVariant U+5DA8
U+5EBC kTraditionalVariant U+5ECE
U+624D kTraditionalVariant U+7E94
U+6298 kTraditionalVariant U+647A
U+65CB kTraditionalVariant U+93C7
U+6731 kTraditionalVariant U+7843
U+7076 kTraditionalVariant U+7AC8
U+79CB kTraditionalVariant U+97A6
U+8499 kTraditionalVariant U+61DE U+6FDB U+77C7
U+8511 kTraditionalVariant U+884A
U+9709 kTraditionalVariant U+9EF4
U+85ED kSimplifiedVariant U+44D6
U+77AD kSimplifiedVariant U+4E86
U+5925 kSimplifiedVariant U+4F19
U+85C9 kSimplifiedVariant U+501F
U+9F15 kSimplifiedVariant U+51AC
U+97C6 kSimplifiedVariant U+5343
U+8514 kSimplifiedVariant U+535C
U+6372 kSimplifiedVariant U+5377
U+7C72 kSimplifiedVariant U+5401
U+95A4 kSimplifiedVariant U+5408
U+8FF4 kSimplifiedVariant U+56DE
U+8591 kSimplifiedVariant U+59DC
U+50A2 kSimplifiedVariant U+5BB6
U+5DA8 kSimplifiedVariant U+5CC3
U+5ECE kSimplifiedVariant U+5EBC
U+7E94 kSimplifiedVariant U+624D
U+647A kSimplifiedVariant U+6298
U+93C7 kSimplifiedVariant U+65CB
U+7843 kSimplifiedVariant U+6731
U+7AC8 kSimplifiedVariant U+7076
U+97A6 kSimplifiedVariant U+79CB
U+61DE kSimplifiedVariant U+8499
U+6FDB kSimplifiedVariant U+8499
U+77C7 kSimplifiedVariant U+8499
U+884A kSimplifiedVariant U+8511
U+9EF4 kSimplifiedVariant U+9709
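The second half of the list above is the inverse of the first half, so it can be derived rather than typed by hand. Here is a minimal sketch, with a hypothetical helper name `inverse_entries`; it emits one line per pair and does not merge several simplified forms that share a traditional form.

```python
def inverse_entries(lines):
    """Derive kSimplifiedVariant lines from kTraditionalVariant lines."""
    derived = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[1] != 'kTraditionalVariant':
            continue
        source, targets = fields[0], fields[2:]
        for target in targets:
            if target != source:  # a self-reference needs no inverse entry
                derived.append(f'{target} kSimplifiedVariant {source}')
    return derived


for entry in inverse_entries(['U+8499 kTraditionalVariant U+61DE U+6FDB U+77C7']):
    print(entry)
```

For U+8499 this yields the three kSimplifiedVariant entries listed in the report for U+61DE, U+6FDB and U+77C7.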
Date: Mon, 5 Jul 2021 12:19:48 -0400
Name: Daniel Yacob
Subject: 3 Name Defects in Ethiopic Extended-B Tables
I was just working with the table for the Ethiopic Extended-B range, published under the U14 Beta delta listing here: https://www.unicode.org/charts/PDF/Unicode-14.0/
I found that a few names were off; I believe the error originates from the UniBook output that I submitted earlier this year. The defects are:
1E7E9 ETHIOPIC SYLLABLE HWI
1E7EA ETHIOPIC SYLLABLE HWEE
1E7EB ETHIOPIC SYLLABLE HWE
In each case the name base "H" should have been "HH". The corrected names:
1E7E9 ETHIOPIC SYLLABLE HHWI
1E7EA ETHIOPIC SYLLABLE HHWEE
1E7EB ETHIOPIC SYLLABLE HHWE
I apologize for this. The names are correct in our proposal L2/21-037 (https://www.unicode.org/L2/L2021/21037-gurage-adds.pdf) and I think the difference simply stems from a typographical error that I made while working with UniBook.
Thank you,
-Daniel
Date/Time: Thu Jul 8 20:01:46 CDT 2021
Name: philip r brenan
Report Type: Error Report
Opt Subject: ORNATE LEFT PARENTHESIS should be Ps ?
FD3E;ORNATE LEFT PARENTHESIS;Pe;0;ON;;;;;N;;;;;
FD3F;ORNATE RIGHT PARENTHESIS;Ps;0;ON;;;;;N;;;;;
Possibly the Pe and Ps are the wrong way around?
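A check of this kind can be sketched as a scan over UnicodeData.txt-style lines, where the third semicolon-separated field is the General_Category. The function name `suspicious_brackets` is hypothetical, and a flagged entry is only a prompt for review, since such an assignment may be deliberate.

```python
def suspicious_brackets(lines):
    """Flag names containing LEFT with category Pe, or RIGHT with category Ps."""
    flagged = []
    for line in lines:
        fields = line.strip().split(';')
        if len(fields) < 3:
            continue
        code_point, name, category = fields[0], fields[1], fields[2]
        words = name.split()
        if ('LEFT' in words and category == 'Pe') or ('RIGHT' in words and category == 'Ps'):
            flagged.append((code_point, name, category))
    return flagged


# The two lines quoted in the report.
quoted = [
    'FD3E;ORNATE LEFT PARENTHESIS;Pe;0;ON;;;;;N;;;;;',
    'FD3F;ORNATE RIGHT PARENTHESIS;Ps;0;ON;;;;;N;;;;;',
]
for entry in suspicious_brackets(quoted):
    print(entry)
```

Both quoted lines are flagged by this heuristic, which is the pattern the report asks about.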
Date/Time: Fri Jul 9 18:46:29 CDT 2021
Name: Martin J. Dürst
Report Type: Error Report
Opt Subject: Data files: Emoji Version Mismatch
[This talks about version 13.0, but is very relevant to version 14.0 (now in beta), too.] I'm currently working on updating Ruby from Emoji 13.0 to Emoji 13.1 (see https://bugs.ruby-lang.org/issues/18029). That works for the files in https://www.unicode.org/Public/emoji/13.1/, which all say they are for version 13.1. But it doesn't work for the files moved to https://www.unicode.org/Public/13.0.0/ucd/emoji/, because these files say "# Version: 13.0". Ruby keeps and provides both a Unicode version and an Emoji version (available in Ruby via RbConfig::CONFIG['UNICODE_VERSION'] and RbConfig::CONFIG['UNICODE_EMOJI_VERSION']). But neither of them matches 13.0. The files moved under https://www.unicode.org/Public/13.0.0/ucd/emoji/ really should indicate the Unicode version, not the Emoji version, because they are updated in sync with Unicode versions, and not updated when only Emoji versions get updated.
Date/Time: Mon Jul 12 02:37:11 CDT 2021
Name: Martin J. Dürst
Report Type: Public Review Issue
Opt Subject: Issue 433: Unicode Version 14.0.0 public review: Results from testing on Ruby
This is to report that I have not found any bugs or issues in the Unicode 14.0.0 public beta when temporarily upgrading the programming language Ruby to Unicode 14.0.0. This does not mean that any new characters or properties, or changed property values have been checked for appropriateness. It just means that as far as they are used, the data files and the test data files (e.g. for normalization) provided for the new version 14.0.0 are consistent as far as such consistency is checked when testing the relevant facilities in Ruby. In case you are interested in further details, please feel free to contact me.
Date/Time: Mon Jul 12 17:38:53 CDT 2021
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: UAX44, UTR23 and "string property"
The term "string property" is potentially ambiguous: it might mean a property over the domain of strings, or a property with a co-domain of strings, or both. UAX #44 appears to use "string property" to mean a property with a co-domain of strings. E.g., "String properties are typically mappings from a Unicode code point to another Unicode code point or sequence of Unicode code points..." PU UTR #23 introduces the notion of properties of strings (strings as domain), and avoids the term "string property", using instead "property applied to strings" or "property of strings". In the case of properties with co-domain of strings, it uses clear wording, "string-valued properties". This is helpful and good. PU UTR #23 also calls out the terminology issue that exists in UAX #44: "Note: Properties classed in [UCDDoc] as type "String" are string-valued properties." PU UAX #44, however, does not provide similar clarification and disambiguation. It should, particularly given that Unicode standards closely associated with The Unicode Standard will include properties of strings, and one could argue that UCD itself has properties with a domain of string (e.g., StandardizedVariants.txt as a mapping from an enumerated set of strings to boolean True).
Date/Time: Mon Jul 12 18:53:01 CDT 2021
Contact: dwanders@sonic.net
Name: Debbie Anderson
Report Type: Public Review Issue
Opt Subject: Glyph error U+FD44
I found an error in Arabic Pres Forms-A: the glyphs for FD43 and FD44 are the same. FD44 is incorrect. (See https://www.unicode.org/L2/L2019/19289r-arabic-honorifics.pdf)
Date/Time: Tue Jul 13 16:02:19 CDT 2021
Name: Kent Karlsson
Report Type: Error Report
Opt Subject: BidiMirroring.txt
∉ ∌ # NOT AN ELEMENT OF
∌ ∉ # DOES NOT CONTAIN AS MEMBER
These should get the annotation [BEST FIT].
Date/Time: Wed Jul 14 05:08:18 CDT 2021
Name: Kent Karlsson
Report Type: Public Review Issue
Opt Subject: NamesList.txt
Proposed additional comments to NamesList.txt (marked with "proposed new comment" on each proposed addition):
263D FIRST QUARTER MOON
    = alchemical symbol for silver
    x (first quarter moon symbol - 1F313)
    * a crescent, not the first quarter    proposed new comment
263E LAST QUARTER MOON
    = alchemical symbol for silver
    x (power sleep symbol - 23FE)
    x (last quarter moon symbol - 1F317)
    x (crescent moon - 1F319)
    * a crescent, not the last quarter    proposed new comment
1F311 NEW MOON SYMBOL
    x (black circle - 25CF)
1F312 WAXING CRESCENT MOON SYMBOL
    * waning crescent moon in the southern hemisphere    proposed new comment
1F313 FIRST QUARTER MOON SYMBOL
    = half moon
    x (circle with left half black - 25D0)
    x (first quarter moon - 263D)
    * last quarter moon in the southern hemisphere    proposed new comment
1F314 WAXING GIBBOUS MOON SYMBOL
    = waxing moon
    * waning gibbous moon in the southern hemisphere    proposed new comment
1F315 FULL MOON SYMBOL
    x (white circle - 25CB)
1F316 WANING GIBBOUS MOON SYMBOL
    * waxing gibbous moon in the southern hemisphere    proposed new comment
1F317 LAST QUARTER MOON SYMBOL
    x (circle with right half black - 25D1)
    x (last quarter moon - 263E)
    * first quarter moon in the southern hemisphere    proposed new comment
1F318 WANING CRESCENT MOON SYMBOL
    * waxing crescent moon in the southern hemisphere    proposed new comment
Date/Time: Wed Jul 14 05:11:02 CDT 2021
Name: Kent Karlsson
Report Type: Public Review Issue
Opt Subject: emoji-variation-sequences.txt
Proposed additions to emoji-variation-sequences.txt. Apparently the emoji style is default, but in calendars it would usually be the text style. Note that FULL MOON SYMBOL is already covered. One might add the crescent and gibbous ones, but they are not common in calendars.
1F311 FE0E ; text style; # (6.0) NEW MOON SYMBOL
1F311 FE0F ; emoji style; # (6.0) NEW MOON SYMBOL
1F313 FE0E ; text style; # (6.0) FIRST QUARTER MOON SYMBOL
1F313 FE0F ; emoji style; # (6.0) FIRST QUARTER MOON SYMBOL
1F317 FE0E ; text style; # (6.0) LAST QUARTER MOON SYMBOL
1F317 FE0F ; emoji style; # (6.0) LAST QUARTER MOON SYMBOL
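As a minimal sketch of what the proposed entries mean in practice, a variation sequence is the base character followed by U+FE0E (text style) or U+FE0F (emoji style). The helper names below are hypothetical, and the generated line layout simply mirrors the entries as quoted above; whether a renderer honors the selector depends on the sequence being defined and on font support.

```python
VS15 = '\uFE0E'  # VARIATION SELECTOR-15, requests text presentation
VS16 = '\uFE0F'  # VARIATION SELECTOR-16, requests emoji presentation


def text_style(code_point):
    """Return the base character followed by the text-style selector."""
    return chr(code_point) + VS15


def variation_lines(code_point, name, version='6.0'):
    """Build the pair of data lines for one character, in the layout quoted above."""
    return [
        f'{code_point:04X} FE0E ; text style; # ({version}) {name}',
        f'{code_point:04X} FE0F ; emoji style; # ({version}) {name}',
    ]


for line in variation_lines(0x1F311, 'NEW MOON SYMBOL'):
    print(line)
```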
Date/Time: Wed Jul 14 14:35:53 CDT 2021
Name: Kent Karlsson
Report Type: Public Review Issue
Opt Subject: Emoji handedness
Looking at https://www.unicode.org/emoji/charts-14.0/full-emoji-modifiers.html, there seems to be a preference for the right hand (also across vendors). There are some, though not many, "hands" that are apparently left hands. For some, the handedness is fixed by the name of the emoji, but for most it is not. Is there any policy regarding handedness? If so, what is it? Or are there any plans for "handedness modifiers"? Some hand gestures, though not so common, are meaningful only with a particular hand. Someone may send a left-hand wave (say), but the receiver may get a right-hand one. Often it might not matter, but sometimes it could; if nothing else, the sender may have a personal handedness preference not only in real life but also for emoji.