This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Sun Aug 9 01:10:36 CDT 2020
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: UNIHAN proposed update feedback
1) In the descriptions of many fields, book names are not italicized. For instance, in the kCihait field description, the name of the book "Cihai" is not italicized.

Field | Book name not italicized
------------------------------------------
kCihait | Cihai
kDaeJaweon | Dae Jaweon
kGSR | Grammata Serica Recensa
kHanYu | Hanyu Da Zidian
kIRGDaeJaweon | Dae Jaweon
kIRGDaiKanwaZiten | Dai Kanwa Ziten
kIRGHanyuDaZidian | Hanyu Da Zidian
kMorohashi | Dai Kanwa Ziten
kNelson | The Modern Reader’s Japanese-English Character Dictionary
kSBGY | Song Ben Guang Yun (the exact spelling of the name must be reviewed)
kTGHZ2013 | Tōngyòng Guīfàn Hànzì Zìdiǎn
kXHC1983 | Xiàndài Hànyǔ Cídiǎn

2) Somewhere, it must be stated that the fields that start with "GB" correspond to the "Guobiao standards" of Mainland China (preferably in the corresponding field descriptions).

3) The kGB7 field description does not make clear that its source consists of two lists rather than one. Some minor edits should clear it up:

Old: The "General Purpose Hanzi List for Modern Chinese Language, and General List of Simplified Hanzi" mapping for this character in ku/ten form.
New: The "General Purpose Hanzi List for Modern Chinese Language," and the "General List of Simplified Hanzi" mapping for this character in ku/ten form.

4) The kTang field contains an anomaly in the description: ".... An asterisk indicates that the word or morpheme represented in toto or in part by the given character with the given reading occurs more than four times ..." The word "toto" seems to be a mistake for the word "full".
Feedback above this line was reviewed during UTC #165.
Date/Time: Tue May 25 20:07:51 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: A few improvements to the field descriptions of UAX#38
1. Expand the description of kCCCII: Extra information is needed, such as the meaning of the initials, the creators, and the age. I propose that it read:

Description | The mapping for this character in hexadecimal, in the "Chinese Character Code for Information Interchange" (CCCII). Created by the "Chinese Character Analysis Group" (CCAG), with its latest version coming out in 1987. Earlier versions of CCCII served as the base for the EACC (see kEACC), so many entries are identical between fields.

The terms between quotes indicate the use of italics. My source is the Wikipedia article (https://en.wikipedia.org/wiki/Chinese_Character_Code_for_Information_Interchange), which in turn cites the book 'CJKV Information Processing'. I lack access to that book, so I do not know the primary source, but if it can be found, it would be important to cite. The code scheme may have its origin in Taiwan (ROC). Finally, in the field description of kEACC, the complete name (East Asian Character Code for Bibliographic Use) should be set in italics, so it is clear that this is what the initials stand for.

2. Specify that the romanization used by kJapaneseKun is 'Hepburn'. I do not know whether the same applies to kJapaneseOn.

3. Make all appearances of the word 'pinyin' consistently and correctly spelled as 'pīnyīn' (except, of course, in the names of the fields).

4. Include the number of entries for each field: This would add a new row between 'Syntax' and 'Description'. The purpose would be to estimate the size of the field, as well as its relative coverage of ideographs.
Date/Time: Tue May 25 21:13:11 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Lack of documentation relating to other legacy East Asian encodings
Using this article as my source (https://en.wikipedia.org/wiki/Extended_Unix_Code), I list some encoding schemes and character sets that are not mentioned in UAX#38. This is merely an FYI observation, so it should not affect the text of the annex for now. Furthermore, I am not sure whether all elements of the list necessarily contain ideographs or can be considered identical to other fields. In the first case they can be dismissed, but in the latter, the equivalence must be clearly documented. If there is another document that clarifies the relation between different standards, it should be cited in the doc.

The list in question is: EUC-JP (Extended Unix Japan), Shift JIS, DEC Kanji (by Digital Equipment Corporation), HP-15, HP-16, IKIS (by Data General), MacJapanese (MacOS), Windows-932 (IBM-943), KEIS (by Hitachi), EUC-KR (Extended Unix Korean/Wansung), UHC (Unified Hangul Code/Windows949/Extended Wansung), HangulTalk (MacOS) and EUC-TW (Extended Unix Taiwan).

Big5 is only mentioned but not properly explained, and the GB fields are not correctly attributed to Guobiao Standards.
Date/Time: Fri May 28 09:29:04 CDT 2021
Name: Michel Mariani
Report Type: Error Report
Opt Subject: Corrections for kTotalStrokes
After reporting issues on the Unihan mailing list, I am submitting the following corrections for the kTotalStrokes property:

U+28668 𨙨 kTotalStrokes 7
U+2F9DD 𠣞 kTotalStrokes 9

> It does not surprise me that there are some puzzlers lurking in the
> kTotalStrokes property. The correction for U+28E0F 𨸏 was submitted
> as public feedback by Jaemin Chung on 2020-09-03, and the correction
> was applied to the Unihan database.
> If you don't mind, please submit the following corrections via the
> Contact Form so that we have a paper trail:
> [...]
> We should be able to get those corrections applied in time for
> Unicode Version 14.0.
Date/Time: Fri Jun 11 23:46:52 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Anomalies in the spelling and format of Unihan field descriptions
I have noticed some anomalies in the spelling or format in different field descriptions, particularly "kIRG_GSource".

kIRG_GSource:

Misspelled book name | Corrected book name
-----------------------------------------
ZhongHua ZiHai | Zhonghua Zihai
Chinese Encyclopedia | Encyclopedia of China (name also needs to be italicized)
Ci Hai | Cihai (if the "Ci Hai" spelling is preferred, then it should be used consistently everywhere)
Ci Yuan | Ciyuan
Hanyu Dacidian | Hanyu Da Cidian
Hanyu Dazidian | Hanyu Da Zidian
Hanyu Fangyan Dacidian | Hanyu Fangyan Da Cidian
Chinese Ancient Ethnic Characters Research | Research on Ancient Chinese Characters (unless it doesn't refer to the book of the same name, it should be italicized)

Chinese book titles without pinyin or translations (if added, they should be italicized):

Chinese title | pinyin [translation]
---------------------------------------------
汉语大字典(第二版) | Hanyu Da Zidian (second edition)
漢文佛典疑難俗字彙釋與研究 | Hànwén Fódiǎn Yínán Sú Zìhuì Shì Yǔ Yánjiū [Explanation and Research on Difficult and Vulgar Words in Chinese Buddhist Classics]
龍龕手鑑 | Longkan Shoujian [The Handy Mirror in the Dragon Shrine or The Dragon Shrine/Niche Handbook]

Names of books not italicized: Siku Quanshu, Yinzhou Jinwen Jicheng Yinde, Standard Telegraph Codebook (revised) (the last one should also precede the Chinese name for consistency).

Also, the name of the publisher "Zhuang Liao Songs Research" appears without spaces between words.

kHDZRadBreak: In the sentence "Indicates that 《漢語大字典》 Hanyu Da Zidian has a radical break beginning at this character’s position." place the pinyin first, and italicize it to be consistent. Similar suggestions apply to the descriptions of kHanyuPinlu and kHanyuPinyin.

kIRG_KPSource: Reword the sentence: "... There may, therefore, be erroneous data in the values for this field." to say: "... Therefore, there may be erroneous data in the values for this field."

kIRG_SSource: Italicize or place quotes on the name "Taishō Shinshū Daizōkyō".

kIRG_VSource: Italicize or place quotes on the name "Kho Chữ Hán Nôm Mã Hoá".
Date/Time: Thu Jul 8 11:13:37 CDT 2021
Name: Ken Lunde
Report Type: Public Review Issue
Opt Subject: PRI #421 Feedback
There are a small number of anomalies in earlier versions of the Unihan database, and it may be useful to document them in UAX #38, mainly in Section 5, "History":

1) The Version 2.0.0 Unihan database file, Unihan-1.txt, is truncated in the middle of the records for U+8BC1 证:

https://www.unicode.org/Public/2.0-Update/Unihan-1.txt

While this is already documented in Section 5 of UAX #38, it may be helpful to add that the CD that is included with the Unicode Version 2.0 book has the same issue, specifically that the files at {DOS,MAC,UNIX}/MAPPINGS/EASTASIA/UNIHAN.TXT are truncated at the same position. This would save those who have the Unicode Version 2.0 book from having to check the CD on their own (like I did).

2) The Version 3.0.0 Unihan database file, Unihan-3.txt, includes 3,898 records for the undocumented kJHJ property:

https://www.unicode.org/Public/3.0-Update/Unihan-3.txt

I suggest that appropriate entries be added to the table in Section 4.2 of UAX #38, specifically the following:

Version 3.0 row: Add "kJHJ" to the "Fields Added" column
Version 3.1 row: Add "kJHJ" to the "Fields Dropped" column

It may also be useful to document this property in Section 5 for the benefit of those who parse older versions of the Unihan database.

3) The Version 3.1.1 Unihan database file, Unihan-3.1.1.txt, includes the following anomalous record at line 246,442:

U+64AC 297

See: https://www.unicode.org/Public/3.1-Update1/Unihan-3.1.1.txt

It may be useful to document this in Section 5 for the benefit of those who parse older versions of the Unihan database.

4) The Versions 2.0.0, 2.1.2, 3.0.0, and 3.1.0 Unihan database files are not encoded in UTF-8:

https://www.unicode.org/Public/2.0-Update/Unihan-1.txt
https://www.unicode.org/Public/2.1-Update/Unihan-2.txt
https://www.unicode.org/Public/3.0-Update/Unihan-3.txt
https://www.unicode.org/Public/3.1-Update/Unihan-3.1.txt

It may be useful to document this in Section 5 for the benefit of those who parse older versions of the Unihan database. That is all.
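For readers who parse older Unihan files, the anomalies listed above (a record with no field name, and files that are not UTF-8) suggest reading defensively. The following is only a minimal sketch, not a description of any official tooling. It assumes that records are tab-separated triples of code point, field name, and value, and that lines beginning with "#" are comments. The function name parse_unihan, the field name kExampleField, and all sample bytes are hypothetical and made up for illustration. Lines that fail strict UTF-8 decoding are reported rather than guessed at, because the feedback above does not say which encoding the old files use.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]
Problem = Tuple[int, str]


def parse_unihan(lines: Iterable[bytes]) -> Tuple[List[Record], List[Problem]]:
    """Collect well-formed records and report lines that cannot be used.

    Assumed format (not confirmed by the feedback above):
    code point, field name, and value separated by tabs.
    """
    records: List[Record] = []
    problems: List[Problem] = []
    for lineno, raw in enumerate(lines, start=1):
        try:
            text = raw.decode("utf-8").rstrip("\r\n")
        except UnicodeDecodeError:
            # Do not guess the legacy encoding; just report the line.
            problems.append((lineno, "not valid UTF-8"))
            continue
        if not text or text.startswith("#"):
            continue  # blank line or comment
        parts = text.split("\t")
        well_formed = (
            len(parts) == 3
            and parts[0].startswith("U+")
            and parts[1].startswith("k")
        )
        if not well_formed:
            # Catches records such as one with a code point and a bare value.
            problems.append((lineno, "malformed record"))
            continue
        records.append((parts[0], parts[1], parts[2]))
    return records, problems


# Made-up sample input; none of these bytes come from a real Unihan file.
sample = [
    b"# comment line\n",
    b"U+4E00\tkExampleField\tvalue\n",
    b"U+64AC\t297\n",                 # no field name, like the record above
    b"U+4E01\tkExampleField\t\xff\n",  # byte that is invalid in UTF-8
]

records, problems = parse_unihan(sample)
for lineno, reason in problems:
    print(f"line {lineno}: {reason}")
```

Reporting the line number with each problem makes it straightforward to compare a parser's findings against documented anomalies such as the one at line 246,442 of Unihan-3.1.1.txt.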
Date/Time: Thu Jul 15 13:24:42 CDT 2021
Name: Ben Scarborough
Report Type: Public Review Issue
Opt Subject: Proposed change for UAX #38
In the current proposed update for UAX #38, the syntax for the kIRG_VSource property is:

V[0-4N]-[023F]?[0-9A-F]{4}

To keep it in line with how regexes are laid out for other IRG source properties, a more accurate regex would be:

V[0-4]-[0-9A-F]{4} | VN-[023F][0-9A-F]{4}

because the V[0-4] sources are always 4 hex digits and the VN sources are always 5.
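The difference between the two kIRG_VSource patterns can be checked mechanically. The sketch below is an illustration only: the sample strings are invented to exercise the patterns and are not taken from the Unihan database, the helper name accepts is hypothetical, and the proposed pattern is written without the spaces around "|" on the assumption that those spaces are layout rather than part of the syntax. Whole-value matching via fullmatch is also an assumption about how the syntax is meant to be applied.

```python
import re

# Pattern in the current proposed update.
OLD = re.compile(r"V[0-4N]-[023F]?[0-9A-F]{4}")
# Pattern proposed in the feedback, spaces around "|" removed.
NEW = re.compile(r"V[0-4]-[0-9A-F]{4}|VN-[023F][0-9A-F]{4}")


def accepts(pattern: "re.Pattern[str]", value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


# Invented sample values, chosen only to show where the patterns differ.
samples = [
    "V0-3021",   # V[0-4] source with 4 hex digits
    "VN-2A6D6",  # VN source with 5 hex digits
    "VN-3021",   # VN source with only 4 hex digits
    "V1-2A6D6",  # V[0-4] source with 5 hex digits
]

for value in samples:
    print(value, "old:", accepts(OLD, value), "new:", accepts(NEW, value))
```

Under these assumptions, both patterns accept the first two samples, while only the older pattern accepts the last two, which is the looseness the feedback points out.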