This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Fri Dec 06 21:31:48 CST 2024
ReportID: ID20241206213148
Name: Dennis Tan
Report Type: Public Review Issue
Opt Subject: 509
In section 3 (Link Detection Algorithm), subsection "Initiation", the document uses https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains as its reference for "top-level domains". It might be better to use a more authoritative source that is updated regularly, e.g., https://www.iana.org/domains/root/db. The wiki page doesn't even list the IDN top-level domains.
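Here is a sketch of what consuming IANA data might look like. IANA also publishes the root zone as a plain-text file (`tlds-alpha-by-domain.txt`). That file has a `#` comment header and one uppercase TLD per line, with IDN TLDs given as `XN--` A-labels. The hypothetical `parse_tld_list` helper below assumes that format. For each IDN TLD it also adds the decoded U-label, using Python's built-in `punycode` codec. It is illustrative only and not part of UTS #58.

```python
def parse_tld_list(text: str) -> set[str]:
    """Parse a plain-text TLD list: '#' starts a comment line, and each other
    non-empty line holds one TLD in ASCII (IDN TLDs as 'XN--' A-labels)."""
    tlds: set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        label = line.lower()
        tlds.add(label)
        if label.startswith("xn--"):
            # Also record the Unicode form (U-label) of an IDN TLD by decoding
            # the Punycode part that follows the "xn--" prefix.
            tlds.add(label[4:].encode("ascii").decode("punycode"))
    return tlds


sample = "# sample header line\nCOM\nORG\nXN--FIQS8S\n"
print(sorted(parse_tld_list(sample)))
```

Keeping both the A-label and the U-label lets a linkifier match a TLD whether the text contains the ASCII or the native-script form.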
Date/Time: Wed Dec 18 02:41:23 CST 2024
ReportID: ID20241218024123
Name: Hank Nussbacher
Report Type: Public Review Issue
Opt Subject: 509
In section 7 (https://www.unicode.org/reports/tr58/#test-data), might I suggest that you include test data for linkification of bidirectional content, such as Arabic or Hebrew text?
Date/Time: Mon Jan 20 08:04:59 CST 2025
ReportID: ID20250120080459
Name: Arnt Gulbrandsen
Report Type: Public Review Issue
Opt Subject: 509
Hi, I have compared the UTS58 draft with a few linkifiers. One omission I noticed is that one of those linkifiers tolerates and ignores U+00AD (SOFT HYPHEN). Its commit message is terser than terse, but it hints that someone sends text of the form "foo example.com/foo/bar bar" with a soft hyphen after the full stop and/or the slashes. It's not clear to me that this is worth bothering with. Your call.
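If a linkifier did choose to tolerate U+00AD, one minimal approach is to strip soft hyphens before detection. It can then keep an index map, so that a link found in the cleaned text can be mapped back to its span in the original. This approach is an assumption, not taken from the draft or from the linkifier mentioned above, and `strip_soft_hyphens` is a hypothetical helper.

```python
SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text: str) -> tuple[str, list[int]]:
    """Remove U+00AD SOFT HYPHEN from text.

    Returns the cleaned text plus index_map, where index_map[j] is the index
    in the original text of the character at index j of the cleaned text.
    """
    kept: list[str] = []
    index_map: list[int] = []
    for i, ch in enumerate(text):
        if ch == SOFT_HYPHEN:
            continue  # drop the soft hyphen, but remember where others came from
        kept.append(ch)
        index_map.append(i)
    return "".join(kept), index_map


original = "foo example.\u00adcom/foo/\u00adbar bar"
cleaned, index_map = strip_soft_hyphens(original)
# A link detected at cleaned[start:end] covers
# original[index_map[start]:index_map[end - 1] + 1].
start = cleaned.index("example")
end = cleaned.index(" bar")
link_in_original = original[index_map[start]:index_map[end - 1] + 1]
```

The href would be built from the cleaned text, while the markup wraps the original span, soft hyphens included.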
Date/Time: Mon Feb 10 11:17:13 CST 2025
ReportID: ID20250210111713
Name: Jules Bertholet
Report Type: Public Review Issue
Opt Subject: 509
In addition to being used ⸢like⸣ ⸤this⸥, the half brackets, and possibly also the half parentheses, can be used ⸢like⸥ ⸤this⸣ (imitating the East Asian corner brackets). The Link_Paired_Opener property should be array-valued to reflect this.
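To illustrate the suggestion, here is a hypothetical array-valued version of Link_Paired_Opener in which each closing bracket lists every opener it may pair with. The table covers only the ASCII brackets and the four half brackets (U+2E22..U+2E25). The names `LINK_PAIRED_OPENERS` and `closes` are illustrative and not part of the draft.

```python
# Hypothetical array-valued Link_Paired_Opener: each closer maps to every
# opener it may pair with (only a few brackets are listed here).
LINK_PAIRED_OPENERS: dict[str, tuple[str, ...]] = {
    ")": ("(",),
    "]": ("[",),
    "}": ("{",),
    "\u2E23": ("\u2E22", "\u2E24"),  # closes TOP LEFT (usual) or BOTTOM LEFT HALF BRACKET
    "\u2E25": ("\u2E24", "\u2E22"),  # closes BOTTOM LEFT (usual) or TOP LEFT HALF BRACKET
}


def closes(opener: str, closer: str) -> bool:
    """True if closer may terminate a bracket opened by opener."""
    return opener in LINK_PAIRED_OPENERS.get(closer, ())


for pair in ("\u2E22\u2E23", "\u2E24\u2E25", "\u2E22\u2E25", "\u2E24\u2E23", "(\u2E23"):
    print(pair, closes(pair[0], pair[1]))
```

A termination loop would then test membership rather than equality when it pops the opener stack.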
Date/Time: Wed Mar 26 05:59:34 CDT 2025
ReportID: ID20250326055934
Name: Ebrahim Byagowi
Report Type: Public Review Issue
Opt Subject: 509
I'd like to offer a quick drive-by comment about https://www.unicode.org/reports/tr58/tr58-2.html. I think the lack of a standard recommendation on how URLs should actually be displayed has caused https://issuetracker.google.com/issues/40665886, which essentially breaks Persian text in URL bars and misrenders emoji skin tones in ZWJ sequences in Chrome, as described on the tracker. The same situation exists in Safari, although URLs are mostly hidden there. If one tries to edit a URL containing ZWNJ, things go wrong: the already-encoded ZWNJ in the URL gets encoded a second time, as in https://phabricator.wikimedia.org/F58924232. (Unfortunately this isn't always reproducible in Safari, but it is annoying enough, and it has the same root cause: ZWNJ being displayed in its encoded form.) I understand that you may consider these browser bugs. But after seeing https://www.unicode.org/reports/tr58/tr58-2.html, and after the lengthy discussion in Chromium's bug tracker (https://issuetracker.google.com/issues/40665886) that it took to make sure the developers understood what was going on, I felt that an official recommendation could make things go more smoothly.
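The Safari symptom described above is the classic double-encoding failure. The sketch below uses only Python's `urllib.parse.quote` and `unquote`. It shows that re-encoding an already percent-encoded ZWNJ (`%E2%80%8C`) turns each `%` into `%25`. Decoding for display and then re-encoding from the decoded form round-trips cleanly. `display_path` and `reencode_path` are hypothetical names, and this is not a recommendation from UTS #58.

```python
from urllib.parse import quote, unquote

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, common inside Persian words


def display_path(encoded_path: str) -> str:
    """Percent-decode a path for display so ZWNJ shows as a character.

    Simplification: this also decodes reserved escapes such as %2F, which a
    real implementation would need to keep encoded.
    """
    return unquote(encoded_path)


def reencode_path(shown: str) -> str:
    """Re-encode from the decoded (display) form, never from an encoded one."""
    return quote(shown, safe="/")


encoded = quote("/wiki/a" + ZWNJ + "b")  # '/wiki/a%E2%80%8Cb'
double_encoded = quote(encoded)          # bug: every '%' becomes '%25'
round_trip = reencode_path(display_path(encoded))
print(encoded, double_encoded, round_trip, sep="\n")
```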
Date/Time: Fri Mar 28 11:53:15 CDT 2025
ReportID: ID20250328115315
Name: cketti
Report Type: Public Review Issue
Opt Subject: 509
Step 4.7 of the termination algorithm currently reads "If LT == Open", but it should read "If LT == Close". (Step 4.6 already handles "LT == Open".)
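For reference, here is a simplified sketch of a bracket-aware termination loop with the Close branch in place. It is not the normative UTS #58 text, and the step numbering is not reproduced. The Link_Term values Include, Soft, Hard, Open, and Close are assumed here. The property lookup (`link_term`), the tiny character tables, and the nesting limit are all hypothetical stand-ins.

```python
from enum import Enum


class LinkTerm(Enum):
    INCLUDE = "Include"
    SOFT = "Soft"
    HARD = "Hard"
    OPEN = "Open"
    CLOSE = "Close"


# Hypothetical, drastically reduced stand-ins for the real property data.
PAIRED_OPENER = {")": "(", "]": "[", "}": "{"}
SOFT_CHARS = set(".,;:!?'\"")
MAX_DEPTH = 125  # assumed nesting limit for this sketch


def link_term(ch: str) -> LinkTerm:
    if ch.isspace():
        return LinkTerm.HARD
    if ch in PAIRED_OPENER.values():
        return LinkTerm.OPEN
    if ch in PAIRED_OPENER:
        return LinkTerm.CLOSE
    if ch in SOFT_CHARS:
        return LinkTerm.SOFT
    return LinkTerm.INCLUDE


def find_link_end(text: str, start: int) -> int:
    """Return the end offset of a link beginning at start."""
    last_safe = start
    open_stack: list[str] = []
    for i in range(start, len(text)):
        lt = link_term(text[i])
        if lt is LinkTerm.INCLUDE:
            last_safe = i + 1
        elif lt is LinkTerm.SOFT:
            pass  # kept only if a later Include/Close extends last_safe
        elif lt is LinkTerm.HARD:
            break
        elif lt is LinkTerm.OPEN:
            if len(open_stack) >= MAX_DEPTH:
                break
            open_stack.append(text[i])  # an opener alone does not extend last_safe
        elif lt is LinkTerm.CLOSE:  # the branch step 4.7 should test for
            if not open_stack or open_stack.pop() != PAIRED_OPENER[text[i]]:
                break  # unmatched or mismatched closer ends the link
            last_safe = i + 1
    return last_safe


sample = "see (example.com/a_(b)) now"
begin = sample.index("example")
print(sample[begin:find_link_end(sample, begin)])  # prints example.com/a_(b)
```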
Date/Time: Mon Apr 07 06:31:16 CDT 2025
ReportID: ID20250407063116
Name: Arnt Gulbrandsen
Report Type: Public Review Issue
Opt Subject: 509
Hi, I ran across a bug today that I think points out a relevant problem in UTS58: a user expected 普遍适用测试。我爱你 to be linkified as <a href="https://普遍适用测试.我爱你">普遍适用测试。我爱你</a> (note that the dot changes). Chrome and some other web browsers map "。" to "." in domains when you hit enter after typing/pasting into the address bar. I do feel that at least U+06D4 and U+3002 ought to be mapped to the ASCII dot in UTS58, since it's such a common mistake. ("。" and "." are even on the same key on the Chinese keyboards I've seen.) I mention U+06D4 and U+3002 because I've seen those mistakenly used in "domain names" in the course of my work. U+FF61 and others might also be used mistakenly in theory, but I haven't seen that.
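Here is a minimal sketch of the proposed mapping, applied only to the host portion. The first three characters in the table are the alternative label separators that IDNA2003 (RFC 3490) already treats as dots. U+06D4 is the addition proposed above. `build_anchor` is a hypothetical helper that naively takes everything before the first `/` as the host. It does not address the ambiguity of U+3002 also ending a Chinese sentence.

```python
import html

# Dot-like characters mapped to U+002E FULL STOP in the host part only.
DOT_MAP = str.maketrans({
    "\u3002": ".",  # IDEOGRAPHIC FULL STOP
    "\uFF0E": ".",  # FULLWIDTH FULL STOP
    "\uFF61": ".",  # HALFWIDTH IDEOGRAPHIC FULL STOP
    "\u06D4": ".",  # ARABIC FULL STOP (proposed in the report above)
})


def build_anchor(link_text: str) -> str:
    """Hypothetical helper: map dot variants in the host, keep the visible text as-is.

    Naive host detection: everything before the first '/' is the host.
    """
    host, slash, rest = link_text.partition("/")
    href = "https://" + host.translate(DOT_MAP) + slash + rest
    return f'<a href="{html.escape(href)}">{html.escape(link_text)}</a>'


print(build_anchor("普遍适用测试。我爱你"))
# prints <a href="https://普遍适用测试.我爱你">普遍适用测试。我爱你</a>
```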