The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of July 20, 2020, since the previous cumulative document was issued prior to UTC #163 (April 2020).
The links below go directly to open PRIs and to feedback documents for them, as of July 20, 2020.
Issue  Name  Feedback Link
421  Proposed Update UAX #38, Unicode Han Database (Unihan)  (feedback)  No feedback at this time
420  Proposed Update UAX #45, U-source Ideographs  (feedback)  No feedback at this time
419  Proposed Update UAX #44, Unicode Character Database  (feedback)  No feedback at this time
417  Proposed Update UAX #29, Unicode Text Segmentation  (feedback)  No feedback at this time
416  Proposed Update UAX #14, Unicode Line Breaking Algorithm  (feedback)  No feedback at this time
415  Proposed Update UTR #23, The Unicode Character Property Model  (feedback)  No feedback at this time
408  QID Emoji  (feedback)  Last feedback June 4, 2020
The links below go to locations in this document for feedback.
Feedback routed to Unihan ad hoc for evaluation
Feedback routed to Script ad hoc for evaluation
Feedback routed to UCD and Algorithms ad hoc for evaluation
Feedback routed to Emoji SC for evaluation
Feedback routed to Editorial Committee for evaluation
Other Reports
Date/Time: Fri Jun 12 08:20:02 CDT 2020
Name: Ken Lunde
Report Type: Error Report
Opt Subject: Unihan Database changes
The following are suggested changes to the Unihan Database, including justifications for each:

U+6589 斉, whose current radical is 67 (67.4), is the Japanese simplified form of U+9F4A 齊, whose radical is 210 (210.0). The PRC simplified form of U+9F4A 齊, U+9F50 齐, is also assigned Radical #210 (210'.0), along with Radical 67 (67.2). I propose that 210.0 be added to the existing kRSUnicode property value of U+6589 斉:

  U+6589  kRSUnicode  67.4 210.0

U+6B6F 歯, whose current radical is 77 (77.8), is the Japanese simplified form of U+9F52 齒, whose radical is 211 (211.0). The PRC simplified form of U+9F52 齒, U+9F7F 齿, is also assigned Radical #211 (211'.0). I propose that 211.0 be added to the existing kRSUnicode property value of U+6B6F 歯:

  U+6B6F  kRSUnicode  77.8 211.0

In addition, U+2B81A 𫠚 (Extension D) uses the Japanese simplified form of U+9F52 齒 as a component, not the PRC simplified form, U+9F7F 齿, so its kRSUnicode value (211'.5) should not include the single quote that indicates a PRC simplified form of the radical. I propose that the single quote be removed from the kRSUnicode property value of U+2B81A 𫠚:

  U+2B81A  kRSUnicode  211.5

U+7ADC 竜, whose current radical is 117 (117.5), is the Japanese simplified form of U+9F8D 龍, whose radical is 212 (212.0). The PRC simplified form of U+9F8D 龍, U+9F99 龙, is also assigned Radical #212 (212'.0). I propose that 212.0 be added to the existing kRSUnicode property value of U+7ADC 竜:

  U+7ADC  kRSUnicode  117.5 212.0

In addition, the following ideographs use U+7ADC 竜 as a component, and I propose that Radical #212, along with the appropriate number of residual strokes, be added to their existing kRSUnicode property values (the characters are shown):

  U+21676 𡙶  kRSUnicode  37.11 212.4
  U+23BE1 𣯡  kRSUnicode  82.10 212.4
  U+2412F 𤄯  kRSUnicode  85.18 212.11
  U+25269 𥉩  kRSUnicode  109.10 212.5
  U+25A9D 𥪝  kRSUnicode  117.9 212.4
  U+25A9E 𥪞  kRSUnicode  117.9 212.4
  U+2A95B 𪥛  kRSUnicode  37.10 212.3
  U+2AC6F 𪱯  kRSUnicode  74.17 212.11
  U+2ADF9 𪷹  kRSUnicode  85.15 212.8
  U+2AF5E 𪽞  kRSUnicode  102.10 212.5
  U+2AFC1 𪿁  kRSUnicode  109.14 212.9
  U+2B3FD 𫏽  kRSUnicode  159.10 212.7
  U+2C099 𬂙  kRSUnicode  74.17 212.11
  U+2C514 𬔔  kRSUnicode  116.13 212.8
  U+2E13F 𮄿  kRSUnicode  117.25 212.20

U+203A4 𠎤, whose current radical is 9 (9.12), is a variant form of U+9FA0 龠, whose radical is 214 (214.0). I propose that 214.-3 be added to the existing kRSUnicode property value of U+203A4 𠎤:

  U+203A4  kRSUnicode  9.12 214.-3

U+2B809 𫠉 and U+2B813 𫠓 (Extension D) are variant forms of U+99AC 馬 and U+9CE5 鳥, respectively, which have three fewer strokes. I propose that the residual number of strokes as specified in their kRSUnicode property values be changed from 0 to -3, and that their kTotalStrokes property values be corrected to reflect the actual number of strokes, which is three fewer than their existing kTotalStrokes property values of 10 and 11, respectively:

  U+2B809  kRSUnicode  187.-3
  U+2B809  kTotalStrokes  7
  U+2B813  kRSUnicode  196.-3
  U+2B813  kTotalStrokes  8

U+2CF04 𬼄 (Extension F), whose current radical is 4 (4.3), is related to U+2CF01 𬼁 (also Extension F), whose radical is also 4 (4.1), but both ideographs share the same kTotalStrokes property value (2), which is not possible when considering their stroke composition. In addition, U+2CF04 𬼄 is composed of the following three strokes: U+31D1 ㇑, U+31D6 ㇖, and U+31E1 ㇡. This suggests two (2) residual strokes, not three (3). I propose that the kRSUnicode property value of U+2CF04 𬼄 be changed from 4.3 to 4.2 to match the number of actual residual strokes, and that its kTotalStrokes property value be changed from 2 to 3, to match the number of strokes in the radical (1) plus residual strokes (2):

  2CF04  kRSUnicode  4.2
  2CF04  kTotalStrokes  3

U+2CF09 𬼉 (Extension F), whose current radical is 4 (4.5), is a variant form of U+7F36 缶, whose radical is 121 (121.0), and seems to be missing the first stroke. I propose that 121.-1 be added to the existing kRSUnicode property value of U+2CF09 𬼉:

  U+2CF09  kRSUnicode  4.5 121.-1

That is all.
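Note (editorial illustration, not part of the submission): for readers unfamiliar with the kRSUnicode notation used throughout this report, the following is a minimal Python sketch of how such values break down into radical number, simplified-radical mark, and residual stroke count, following the syntax described in UAX #38; the negative residual counts proposed above would also parse under this sketch.

    import re

    # Sketch only: radical number, optional apostrophe marking a simplified
    # radical, a period, and a (possibly negative) residual stroke count.
    RS_FIELD = re.compile(r"^(\d{1,3})('?)\.(-?\d+)$")

    def parse_krsunicode(value):
        """Split a kRSUnicode value such as '67.4 210.0' into
        (radical, is_simplified, residual_strokes) tuples."""
        entries = []
        for field in value.split():
            m = RS_FIELD.match(field)
            if not m:
                raise ValueError(f"unrecognized kRSUnicode field: {field!r}")
            radical, prime, residual = m.groups()
            entries.append((int(radical), bool(prime), int(residual)))
        return entries

    print(parse_krsunicode("67.4 210.0"))   # proposed for U+6589: [(67, False, 4), (210, False, 0)]
    print(parse_krsunicode("211'.5"))       # current for U+2B81A: [(211, True, 5)]
    print(parse_krsunicode("9.12 214.-3"))  # proposed for U+203A4, with a negative residual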
Date/Time: Mon Jun 15 20:30:55 CDT 2020
Name: Jim Breen
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: Proposed Unihan Database additions
I would like to propose the following additions to the Unihan Database for U+7A3D (稽) and U+25874 (𥡴). The purpose of the additions is to establish the relationship between them, and to provide Japanese-oriented information for U+25874, which is currently missing from the Database. I have appended some notes relating to the proposed additions.

  U+7A3D  kZVariant  U+25874<kMorohashi:T
  U+7A3D  kSemanticVariant  U+25874<kMorohashi:T
  U+25874  kIRGDaiKanwaZiten  25240
  U+25874  kMorohashi  25240
  U+25874  kNelson  3304
  U+25874  kJapaneseKun  TODOMERU KANGAERU
  U+25874  kJapaneseOn  KEI
  U+25874  kZVariant  U+7A3D<kMorohashi:TZ
  U+25874  kSemanticVariant  U+7A3D<kMorohashi:TZ

When I was first studying Japanese in the 1980s, about the only kanji dictionary available to us was the venerable Nelson "Japanese-English Character Dictionary". One of the kanji in Nelson which has been raised with me recently is 稽 (no. 3304), which is now one of the 常用漢字 (common use kanji) taught in Japanese schools. Nelson did not use that glyph for the kanji in his dictionary; he used the closely related 𥡴 glyph. This presented a slight problem when we began to develop electronic versions in the early 1990s, as 𥡴 was not in the main JIS standard (JIS X 0208-1983/1990) [see Note 1 below]. The solution was to use 稽 instead; after all, it is the "correct" kanji. (Morohashi's 大漢和辞典 has a full entry for 稽 (no. 25218) and an abbreviated entry for 𥡴 (no. 25240) pointing out that it is a variant of 稽.)

When the New Nelson was published in 1997, the editor, John Haig, kept 𥡴 as the "correct" glyph (no. 4174) and included 稽 (no. 4163) as "nonstandard for 𥡴". Spahn and Hadamitzky in their 1996 "The Kanji Dictionary" similarly base their entry on the 𥡴 glyph (index 5d11.3) and list 稽 as an alternative.

The 𥡴 form is not currently in any JIS kanji standard, but it is in Unicode (U+25874). The Unihan data indicates it has been based on Taiwanese sources. There is currently no reference to Morohashi or any other Japanese source, and no mention of its association with 稽 (U+7A3D), the usual Japanese readings (ケイ, かんがえる, とどめる), or the meanings usually associated with it in Japan.

Note 1. The predecessor to JIS X 0208, JIS C 6226, which was published in 1978, had the 16-stroke 𥡴 glyph in the code point now occupied by 稽. This was changed when it was replaced by JIS X 0208-1983.
Date/Time: Thu Apr 23 13:30:44 CDT 2020
Name: Markus W Scherer
Report Type: Error Report
Opt Subject: uppercase of U+0587 ARMENIAN SMALL LIGATURE ECH YIWN
Maybe for the Script Ad Hoc? We have received a bug report claiming that the uppercase form of U+0587 և is wrong. SpecialCasing.txt has:

  # <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
  0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN

This means that the ligature small ech-yiwn uppercases to ԵՒ = capital ech + yiwn = 0535+0552. The report says that it should uppercase to ԵՎ = capital ech + vew = 0535+054E. I have asked for an authoritative reference and will report when I receive something. In the meantime, I found this: https://en.wikipedia.org/wiki/Armenian_alphabet#endnote_h

“The ligature և has no majuscule form; when capitalized it is written as two letters Եւ (classical) or Եվ (reformed).”

Can someone confirm this? If true, should we change SpecialCasing.txt to use the "reformed" uppercasing? Should implementers (e.g., ICU) offer both versions? Under what conditions? Please advise.
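Note (editorial illustration, not part of the report): the current behavior can be checked with any implementation that follows SpecialCasing.txt; for example, assuming a Python build whose str.upper() applies the Unicode full case mappings, the classical mapping quoted above is what comes back.

    # Assumes a Python whose str.upper() follows the SpecialCasing.txt full mappings.
    ligature = "\u0587"                # ARMENIAN SMALL LIGATURE ECH YIWN
    upper = ligature.upper()
    print([f"U+{ord(c):04X}" for c in upper])   # ['U+0535', 'U+0552'] -> ԵՒ (classical)

    # The "reformed" alternative mentioned in the report would instead be:
    reformed = "\u0535\u054E"          # Ե + Վ (capital ech + vew)
    print(upper == reformed)           # False under the current data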
Date/Time: Tue Jun 16 04:38:17 CDT 2020
Name: Sandra Lippert
Report Type: Feedback on an Encoding Proposal
Opt Subject: capital H with line below etc.
Dear Sirs and Madams, I hope I chose the correct category for this - I did not find "proposing an encoding". I am an Egyptologist, and while I am very glad that in the last years almost all of the special glyphs we need for transliterating ancient Egyptian have been added to Unicode, I am very much puzzled that there is still no capital H with a line below in Unicode, even though the corresponding lowercase letter (U+1E96) exists. This was clearly an oversight, but why has it not been fixed since? It cannot be that no one ever pointed it out: in my search for answers, I came upon a discussion thread from 18 years ago ( https://unicode.unicode.narkive.com/8rfiWRgg/capital-letter-h-with-line-below ) where this problem was already mentioned, but nothing seems to have been done about it since. There, it was suggested that one combine U+0048 and U+0331, but this works only in a very limited number of fonts, because the combining macron below is sometimes too large or too narrow for capital H and is often shifted to one side instead of being centered correctly.

And while we are at it: the glyphs for capital and lowercase h with ^ underneath (necessary for transliterating Demotic texts) are also absent from Unicode, and again, adding a combining circumflex below (U+032D) does not work in a lot of fonts because it is not centered correctly. Sometimes it works in the regular font but "slips off" to one side as soon as one switches to italics, which is standard for Egyptological transliteration. This is not a very fancy letter either, and its "cousin", Ṱ/ṱ (U+1E70 / U+1E71), also used in transliterating Demotic, is already present, so it would be very helpful if it were finally encoded as well.

Thank you in advance for considering my request. I am looking forward to hearing from you, kind regards, Sandra Lippert Directrice de recherche CNRS, Paris (UMR 8546-AOrOc)
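Note (editorial illustration, not part of the report): the combining-sequence workaround discussed above behaves asymmetrically under normalization, since the lowercase letter has a precomposed code point but the capital does not, so the capital sequence always stays decomposed and depends entirely on the font's mark positioning. A small Python check:

    import unicodedata

    lower_seq = "h\u0331"   # h + COMBINING MACRON BELOW
    upper_seq = "H\u0331"   # H + COMBINING MACRON BELOW

    # The lowercase sequence composes to U+1E96 LATIN SMALL LETTER H WITH LINE BELOW...
    print(unicodedata.normalize("NFC", lower_seq) == "\u1E96")   # True
    # ...but the capital sequence has no precomposed form and stays as two code points.
    print(list(unicodedata.normalize("NFC", upper_seq)))         # ['H', '\u0331']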
Date/Time: Thu Jul 2 15:46:27 CDT 2020
Name: Kent Karlsson
Report Type: Error Report
Opt Subject: KHMER CONSONANT SIGN COENG DA should look like KHMER LETTER DA, not like KHMER LETTER TA
Regarding: http://www.unicode.org/versions/Unicode13.0.0/ch16.pdf, Table 16-8, Khmer Subscript Consonant Signs. This table gives, for 17D2 178A khmer consonant sign coeng da, a glyph that is identical to that of 17D2 178F khmer consonant sign coeng ta.

Actually, COENG DA did have, and should still have, a (range of) glyph derived from the (range of) glyph for KHMER LETTER DA. The current "recommendation" (if that is what that table is) means that neither the author nor the reader of a text can tell which of the two (COENG DA or COENG TA) is used, as both look like COENG TA. Further, one cannot represent (with that "recommendation") texts that really do have a COENG DA that looks similar to a DA. COENG DA really did have its own glyph based on the glyph for DA.

Having a separate (preferably DA-shape based) glyph for COENG DA will both make it possible for authors and readers to see (without checking the character code) whether a COENG DA or a COENG TA is used, and also make historical as well as modern spellings using COENG DA possible. (Introducing a "KHMER ARCHAIC COENG DA" or similar, which has been floated as a possibility, is not a good idea. It does not solve the first problem, and would be a strange and unnecessary "solution" to the second problem.)

I got two references from Richard Wordingham, both showing a "DA-shaped" COENG DA:
* http://aefek.free.fr/iso_album/antelme_bis.pdf (pp. 25 and 26)
* http://www.khmerfonts.info/fontinfo.php?font=1507

So the use of a "COENG TA" glyph where one used to use "COENG DA" should be seen as a spelling change, not a "glyph merger" or whatever. Changing (correcting) fonts to use a "DA"-like glyph for "COENG DA" may reveal some (in modern view) spelling errors, but that is as it should be.

Conclusion: in Table 16-8, change the glyph in the line for 17D2 178A khmer consonant sign coeng da to a subscript glyph based on the glyph for KHMER LETTER DA.
Date/Time: Fri Apr 24 17:59:22 CDT 2020
Contact: fantasai@inkedblade.net
Name: Elika J. Etemad
Report Type: Error Report
Opt Subject: UTR50 orientation of Bopomofo tone marks
Hello UTC, I'm writing regarding the tone marks used in bopomofo:

  02C9 MODIFIER LETTER MACRON
  02CA MODIFIER LETTER ACUTE ACCENT
  02C7 CARON
  02CB MODIFIER LETTER GRAVE ACCENT
  02D9 DOT ABOVE

These are currently registered as R in UTR50, but they should probably be adjusted to U, consistent with the rest of the Bopomofo letters. (They're a bit more widely used than just within Bopomofo, but UTR50 is primarily targeted at CJK context, and within this context these modifier letters are much more likely to be used as Bopomofo tone marks than otherwise.) See discussion thread at https://lists.w3.org/Archives/Public/www-style/2015Aug/0315.html for more context. Thanks~ ~fantasai
Date/Time: Tue May 12 20:46:39 CDT 2020
Name: Manish Goregaokar
Report Type: Error Report
Opt Subject: IdentifierType of Ainu Katakana characters
In IdentifierType.txt:

  31F0..31FF ; Technical # 3.2 [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO

These are from the Katakana Phonetic Extensions block, which exists for writing the Ainu language. Ainu is apparently written using both the Latin and Katakana scripts, the Katakana orthography using these extensions. According to UTS 39 Table 1 [1], "Technical" is "Specialized usage: technical, liturgical, etc.", which doesn't seem to fit code points that are actively used in a primary script for a language. Should we be changing this to Recommended?

[1]: https://www.unicode.org/reports/tr39/#Identifier_Status_and_Type
Date/Time: Wed May 20 01:31:04 CDT 2020
Name: Trevor
Report Type: Error Report
Opt Subject: IDNA test case error
Hello, I believe I have found 2 tests in https://www.unicode.org/Public/idna/13.0.0/IdnaTestV2.txt whose expected results are not possible to represent when using the ToASCII operation with Transitional_Processing = true, CheckJoiners = false, and VerifyDnsLength = false. This relates to tests whose source string is U+200C or U+200D. The U+200C and U+200D get mapped to an empty string due to the use of Transitional Processing and, as a result, the expected output is an empty string. However, it is not possible to represent an empty string as the expected output for toAsciiT, because an empty string means that toAsciiT "adopts" toAsciiN's value, which in this case is either 'xn--1ug' or 'xn--0ug'. Tests in question (source string escaped for readability):

  \u200D; ; [C2]; xn--1ug; ; ; [A4_2] #
  \u200C; ; [C1]; xn--0ug; ; ; [A4_2] #
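Note (editorial sketch, not part of the report): the fallback convention being described is that an empty column in IdnaTestV2.txt inherits the value of the column it falls back to, which is why an intentionally empty toAsciiT result for these two lines cannot be written down. Column names follow the file's header; the parsing code itself is only an assumption for illustration.

    def parse_test_line(line):
        """Parse one IdnaTestV2.txt line into (source, toAsciiN, toAsciiT),
        applying the file's documented empty-column fallbacks."""
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        (source, to_unicode, _tu_status,
         to_ascii_n, _n_status, to_ascii_t, _t_status) = fields
        if to_unicode == "":
            to_unicode = source          # toUnicode falls back to source
        if to_ascii_n == "":
            to_ascii_n = to_unicode      # toAsciiN falls back to toUnicode
        if to_ascii_t == "":
            to_ascii_t = to_ascii_n      # toAsciiT falls back to toAsciiN
        return source, to_ascii_n, to_ascii_t

    # One of the two lines cited above (source shown escaped, as in the report):
    line = r"\u200D; ; [C2]; xn--1ug; ; ; [A4_2] #"
    print(parse_test_line(line))
    # ('\\u200D', 'xn--1ug', 'xn--1ug') -- toAsciiT cannot be made the empty string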
Date/Time: Tue Jun 2 06:23:29 CDT 2020
Name: Bahman Eslami
Report Type: Error Report
Opt Subject: ARABIC DATE SEPARATOR class error
Hello, The error is that the character ARABIC DATE SEPARATOR (U+060D) is classified as Bidi Category "AL", which implies strong right-to-left direction. This makes it impossible to apply kerning between Arabic script numbers and the ADS. Please take a look at the following issue on GitHub: https://github.com/googlefonts/ufo2ft/issues/384 I think the bidirectional type of the ADS should be LTR or Neutral. Thanks, Bahman
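Note (editorial illustration, not part of the report): the classification being objected to can be checked against the UCD data shipped with Python's unicodedata module, assuming a build carrying recent Unicode data.

    import unicodedata

    ads = "\u060D"                               # ARABIC DATE SEPARATOR
    print(unicodedata.name(ads))                 # ARABIC DATE SEPARATOR
    print(unicodedata.bidirectional(ads))        # 'AL' -- the strong RTL class reported above
    print(unicodedata.bidirectional("\u0661"))   # 'AN' -- ARABIC-INDIC DIGIT ONE, for comparison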
Date/Time: Sat May 30 19:34:01 CDT 2020
Name: Elika J. Etemad
Report Type: Error Report
Opt Subject: Vertical Text in UAX9 Mostly Irrelevant
The rules in UAX9 6.2 Vertical Text http://unicode.org/reports/tr9/#Vertical_Text are presented as if this is what implementations are expected to do, but in practice most of them don't: RTL text is rendered bottom-to-top instead. The section should be removed, or rewritten as an example of something that *could* be done with UAX9's algorithms (but isn't necessarily).
Date/Time: Fri May 29 16:49:03 CDT 2020
Name: Trevor
Report Type: Error Report
Opt Subject: UTS#46 tests and URL delimiters
Hello, There are a number of tests [1] that contain labels with a U+003F "?" question mark code point, where the test expects the label containing the question mark to remain in its Unicode form when performing the toASCII [2] operation on the domain. As far as I can tell, there is nothing in the UTS #46 specification that prevents the label from being converted into an ASCII label. The toASCII [2] operation converts all labels to ASCII unless Punycode returns an error. Going through the Punycode spec, Punycode's encode [3] algorithm does not reject U+003F "?" question marks, and as a result labels containing them get converted to ASCII, contrary to the test expectations. I presume that the tests are trying to say that any label containing common URL delimiters such as ":/@.?#[]" shouldn't be converted to an ASCII label, but I'm not really sure what the expected results are supposed to be. I suppose you could add a check for such common URL delimiters and skip Punycode-encoding labels that contain one, assuming the test expectations are correct.
[1] https://www.unicode.org/Public/idna/13.0.0/IdnaTestV2.txt
[2] https://www.unicode.org/reports/tr46/#ToASCII
[3] https://tools.ietf.org/html/rfc3492#section-6.3
- Trevor
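Note (editorial illustration, not part of the report): RFC 3492 encoding copies ASCII code points, including "?", straight through as basic code points, so the encoding step itself never fails on them. Python's standard "punycode" codec implements that RFC:

    # A label containing both a URL delimiter and a non-ASCII character:
    label = "b?\u00FCcher"                 # 'b?ücher'
    encoded = label.encode("punycode")     # does not raise
    print(encoded)                         # ASCII '?' is carried through as a basic code point
    print(b"?" in encoded)                 # True; UTS #46 toASCII would then prefix 'xn--'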
Date/Time: Mon Jun 22 16:19:50 CDT 2020
Name: fantasai
Report Type: Error Report
Opt Subject: UAX14 quotation marks vs ID
The UAX14 rules concerning QU are too strict and don't work for Chinese by default, because they rely on the presence of spaces to give a reasonable default. This can probably be solved by allowing breaks between ID + Pi and between Pf + ID. See https://github.com/w3c/clreq/issues/245 for more info.
Date/Time: Fri Jun 26 19:06:23 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Missing Indic shaping properties for U+0300 and U+0301
The Unicode Standard 13.0, page 466, recommends the characters U+0300 combining grave accent and U+0301 combining acute accent for use with the Devanagari script. However, these characters do not have Indic syllabic categories defined for them, so it’s not clear how they would be used and where they would fit into Devanagari syllables.
Date/Time: Tue Jul 7 17:43:56 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Missing Indic shaping properties for Devanagari and Vedic characters
A number of Devanagari and Vedic characters are missing Indic syllabic or positional category definitions in the Unicode 13.0 data:
– 0950, 0971, A8F4..A8F7, A8FB, A8FD don’t have an Indic syllabic category (as letters they don’t need a positional category).
– 1CE2..1CE8, 1CED don’t have syllabic categories (they do have positional categories).
– 1CF8..1CF9 don’t have a positional category (they do have a syllabic category).
For the first set, I can imagine that some of the characters don’t participate in forming Devanagari syllables, and therefore the default value Other for the syllabic category is actually correct. If that’s the case, however, I think it would be preferable to explicitly provide the value, both to make clear that it’s intentional and to remind users of the data that this value can occur with Brahmic scripts (the specification of the Universal Shaping Engine currently does not handle this case).
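Note (editorial sketch, not part of the report): one way such gaps can be located is to parse IndicSyllabicCategory.txt from the UCD and list which of the cited code points have no explicit entry (and therefore default to Other). The local file path below is an assumption; the same parser would work for IndicPositionalCategory.txt.

    def load_categories(path="IndicSyllabicCategory.txt"):
        """Map code point -> category from a UCD-style 'range ; value' file."""
        assigned = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.split("#", 1)[0].strip()
                if not line:
                    continue
                fields = [p.strip() for p in line.split(";")]
                rng, category = fields[0], fields[1]
                lo, _, hi = rng.partition("..")
                for cp in range(int(lo, 16), int(hi or lo, 16) + 1):
                    assigned[cp] = category
        return assigned

    cited = [0x0950, 0x0971, *range(0xA8F4, 0xA8F8), 0xA8FB, 0xA8FD,
             *range(0x1CE2, 0x1CE9), 0x1CED, 0x1CF8, 0x1CF9]
    categories = load_categories()
    for cp in cited:
        print(f"U+{cp:04X}  {categories.get(cp, 'Other (no explicit entry)')}")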
Date/Time: Wed May 20 13:49:29 CDT 2020
Name: Yitz Gale
Report Type: Error Report
Opt Subject: Emoji - multiple skin tones for handshake
In the current Emoji standard, version 13.0, section 2.6 Multi-Person Groupings explicitly mentions U+1F91D HANDSHAKE as an emoji that depicts more than one person interacting and could be implemented with a choice of skin tones. However, in section 2.6.2 Multi-Person Skin Tones, there is no mention of how to specify two different skin tones for U+1F91D HANDSHAKE. It is not clear at all how to do that. As a result, vendors have not implemented this in their emoji sets. In my opinion, this particular combination - multiple skin tones in a handshake - is especially important to include, because it would enable people to express naturally, in the course of conversations, feelings of inclusiveness and peace among diverse groups. Below are a few suggestions of how we might specify multiple skin tones in a handshake. I don't find any of them particularly satisfying. You might pick one of these, or perhaps do something else. But please, do standardize a way to represent this, mention it explicitly in the standard, and encourage vendors to include it in their emoji sets. Thanks!

  1F91D 1F3FB 200D 1F91D 1F3FD (HANDSHAKE, LIGHT SKIN TONE, ZWJ, HANDSHAKE, MEDIUM SKIN TONE)
  270B 1F3FB 200D 1F91D 200D 270B 1F3FD (HAND, LIGHT SKIN TONE, ZWJ, HANDSHAKE, ZWJ, HAND, MEDIUM SKIN TONE)
  270B 1F3FB 200D 270B 1F3FD (HAND, LIGHT SKIN TONE, ZWJ, HAND, MEDIUM SKIN TONE)
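Note (editorial illustration, not part of the report): the three candidate sequences written out as code point strings in Python; these are the reporter's suggestions, not registered emoji ZWJ sequences.

    candidates = {
        "handshake + handshake": "\U0001F91D\U0001F3FB\u200D\U0001F91D\U0001F3FD",
        "hand + handshake + hand": "\u270B\U0001F3FB\u200D\U0001F91D\u200D\u270B\U0001F3FD",
        "hand + hand": "\u270B\U0001F3FB\u200D\u270B\U0001F3FD",
    }
    for name, seq in candidates.items():
        print(name, " ".join(f"{ord(c):04X}" for c in seq))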
Date/Time: Wed May 20 14:18:10 CDT 2020
Name: Brian Hendery
Report Type: Other Question, Problem, or Feedback
Opt Subject: Handshake emoji
Hey there, In section 2.6 Multi-Person Groupings, it mentions that HANDSHAKE depicts multiple persons and so should allow multiple skin tones. But in section 2.6.2 Multi-Person Skin Tones, there are no instructions how to do that for HANDSHAKE. Hoping you can take a look at this! Cheers, Brian
Date/Time: Tue May 5 07:45:51 CDT 2020
Name: David Corbett
Report Type: Error Report
Opt Subject: Leading zeros in code point labels
Section 4.8 of TUS says “code point labels are constructed by using a lowercase prefix derived from the code point type, followed by a hyphen-minus and then a 4- to 6-digit hexadecimal representation of the code point.” The convention is obviously to use as few leading zeros as possible, but is that required by definition? For example, could control-0009 be referred to as control-000009? It is important to clarify this because code point labels are part of the character name namespace.
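Note (editorial sketch, not part of the report): a minimal rendering of the label construction described in Section 4.8, under the conventional reading that 4 to 6 uppercase hex digits are used with no extra leading zeros; whether additional leading zeros would also be conformant is exactly the question being asked.

    def code_point_label(prefix, cp):
        """Construct a code point label per TUS Section 4.8, e.g. 'control-0009'.
        Uses a minimum of 4 hex digits; larger code points get 5 or 6 naturally."""
        return f"{prefix}-{cp:04X}"

    print(code_point_label("control", 0x0009))          # control-0009
    print(code_point_label("noncharacter", 0xFDD0))     # noncharacter-FDD0
    print(code_point_label("noncharacter", 0x10FFFE))   # noncharacter-10FFFE (6 digits)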
Date/Time: Fri May 29 20:42:13 CDT 2020
Name: Yoshidumi
Report Type: Error Report
Opt Subject: Simple Typo in UTR-25 Document
Hello. I found a simple typo in the UTR-25 document <https://www.unicode.org/reports/tr25/>. In the MathML example on page 36, the <mover> element’s closing angle bracket is written as “)”, but it should be “>”. Sorry for bothering you with such a detailed point...
Date/Time: Thu Jun 25 22:04:45 CDT 2020
Name: Norbert Lindenberg
Report Type: Error Report
Opt Subject: Missing definitions for nukta, bindu, svara
Section 12.1 Devanagari of the Unicode Standard 13.0, page 460, refers to the character types nuktas, bindus, and svaras. None of these terms is defined on that page, on any previous page in the section, on the pages referenced for them in the Standard’s index, or in the Unicode Glossary. “Nukta” here presumably means the one Devanagari character whose name includes “nukta”. “Bindu” is later, on page 466, explained as “One class of these marks, known as bindus, is represented by U+0901 devanagari sign candrabindu and U+0902 devanagari sign anusvara.” That seems to be an incomplete definition, as the Unicode data file IndicSyllabicCategory.txt identifies three additional Devanagari bindu characters. “Svara” is mentioned in the Unicode Glossary as a synonym for “vowel”, and in IndicSyllabicCategory.txt in the context of cantillation marks. The mention in the glossary doesn’t fit the usage on page 460, and it’s not clear whether the cantillation marks are meant, and whether there are other svara characters.
Date/Time: Wed Jul 8 13:55:33 CDT 2020
Name: Dirkjan Ochtman
Report Type: Error Report
Opt Subject: UTS #46 (rev 25): incorrect TLD in example
Hi there, In UTS #46, rev 25, section 4.5, the third row appears to say that "xn--blo-7ka.de" should be converted to "bloß.com". I guess the latter value should read "bloß.de" instead (or the encoded value should be changed). Kind regards, Dirkjan Ochtman
Date/Time: Wed Jul 8 17:29:48 CDT 2020
Name: Stanislaw Goldstein
Report Type: Error Report
Opt Subject: Sections 2.8 and 2.9 not upgraded
The fact that a few thousand characters were added to Plane 3 (CJK Extension G) was overlooked in the sections on Unicode allocation (Sections 2.8 and 2.9):
1) The Tertiary Ideographic Plane should be mentioned on page 44 of the Standard, just before the paragraph on the Supplementary Special-purpose Plane.
2) Figure 2.13 should be changed to show part of Plane 3 in dark grey.
3) Plane 3 should be added on page 51, below Plane 2; otherwise, the statement "All other planes are reserved; there are no characters assigned in them." at the beginning of the last paragraph on page 51 is wrong.
I have not checked the rest of the Standard for information regarding the TIP; there may be other errors of this type there.
Date/Time: Fri Jul 10 11:01:15 CDT 2020
Name: Paul Hardy
Report Type: Error Report
Opt Subject: 0x28 and 0x29 Flipped in APL-ISO-IR-68.TXT
Note: This report has been fully resolved by the Editorial Committee, and an updated data file has been posted.
Greetings, I suspect that the ISO-IR-68 code points 0x28 and 0x29 have names that are flipped in the file https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/APL-ISO-IR-68.TXT. That file names 0x28 as LOGICAL AND and 0x29 as LOGICAL OR. The ISO-IR-68 standard file (see https://www.itscj.ipsj.or.jp/iso-ir/068.pdf) names 0x28 as DOWN CARET and 0x29 as UP CARET, which correspond to logical or and logical and, respectively.

As corroboration, see the alternate APL character set for the DECwriter II and DECwriter LA120 described below. The DECwriter LA120 (http://bitsavers.informatik.uni-stuttgart.de/pdf/dec/terminal/la120/EK-LA120-UG-003_LA120_Users_Guide_Jun79.pdf, p. 83) shows the DOWN CARET at octal code point 050 (which is hexadecimal 0x28), and the UP CARET at octal code point 051 (which is hexadecimal 0x29). Likewise, the DECwriter II manual shows the same ordering (see http://www.bitsavers.org/www.computer.museum.uq.edu.au/pdf/EK-LA3635-OP-002%20LA35%20&%2036%20DECwriter%20II%20User's%20Manual.pdf, p. 1-16, Table b).

Further corroboration appears in Kenneth Iverson's _A Programming Language_, John Wiley & Sons, New York: 1962, Library of Congress Catalog Card Number 62-15180. Section 1.4 (p. 11) describes the DOWN CARET as "or" and the UP CARET as "and". This interpretation is repeated in the "Summary of Notation" appendix in section S.4 "Elementary Operations", p. 267. I can send pictures of those pages if you would like. This is corroborated still further by the Unicode code points U+2227 LOGICAL AND (shown with an UP CARET-type glyph) and U+2228 LOGICAL OR (shown with a DOWN CARET-type glyph).

The error from APL-ISO-IR-68.TXT appears to have also propagated to Wikipedia; see the table in the "Character set" section at https://en.wikipedia.org/wiki/ISO-IR-68. Thank you, Paul Hardy
(None at this time.)