Accumulated Feedback on PRI #509

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Fri Dec 06 21:31:48 CST 2024
ReportID: ID20241206213148
Name: Dennis Tan
Report Type: Public Review Issue
Opt Subject: 509


In section 3 (Link Detection Algorithm), subsection "Initiation", the document uses the following 
reference for "top-level domains" https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains. 
Perhaps it would be better to use a more authoritative source that is updated regularly, 
e.g., https://www.iana.org/domains/root/db. The wiki page doesn't even list the IDN top-level domains.
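To illustrate using the IANA data instead, here is a minimal Python sketch. It assumes the plain-text TLD list that IANA publishes alongside the root zone database, which has one uppercase TLD per line, "#" comment lines, and IDN TLDs in their "XN--" A-label form. The function names and the tiny sample are hypothetical. Python's built-in "idna" codec implements IDNA 2003, not UTS #46, so it is only a stand-in here.

```python
def parse_tld_list(text: str) -> set[str]:
    """Parse an IANA-style TLD list into a set of lowercase ASCII labels."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def is_known_tld(label: str, tlds: set[str]) -> bool:
    """Check a Unicode or ASCII label against the parsed list."""
    try:
        # IDN TLDs are listed as A-labels ("xn--..."), so convert the label first.
        ascii_label = label.encode("idna").decode("ascii").lower()
    except UnicodeError:
        return False
    return ascii_label in tlds


# Tiny hand-written excerpt in the same shape as the IANA file (not the full list).
SAMPLE = """# comment header line
COM
ORG
NET
"""

tlds = parse_tld_list(SAMPLE)
print(is_known_tld("com", tlds))      # True
print(is_known_tld("example", tlds))  # False
```

Because the IANA file lists IDN TLDs by their A-labels, a Unicode label such as 我爱你 must be converted before the lookup, which is why the conversion happens inside is_known_tld.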

Date/Time: Wed Dec 18 02:41:23 CST 2024
ReportID: ID20241218024123
Name: Hank Nussbacher
Report Type: Public Review Issue
Opt Subject: 509


In section 7 (https://www.unicode.org/reports/tr58/#test-data), might I suggest that you include 
test data for bidirectional content for linkification, such as Arabic or Hebrew?

Date/Time: Mon Jan 20 08:04:59 CST 2025
ReportID: ID20250120080459
Name: Arnt Gulbrandsen
Report Type: Public Review Issue
Opt Subject: 509


Hi,

I have compared the UTS58 draft with a few linkifiers.

One omission I saw is that another linkifier tolerates and ignores U+00AD (soft hyphen). The commit 
message is terser than terse, but hints that someone sends text of the form "foo example.com/foo/bar 
bar" with a soft hyphen after the full stop and/or slashes.

It's not clear to me that this is worth bothering with. Your call.
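Here is a minimal Python sketch of the "tolerate and ignore" behavior, assuming that soft hyphens are simply removed before matching. The regular expression is a deliberately naive, hypothetical stand-in for a real link detector and is not the UTS #58 algorithm. It only shows how an unexpected U+00AD after the full stop or a slash can defeat matching.

```python
import re
from typing import Optional

SOFT_HYPHEN = "\u00AD"

# Hypothetical, deliberately naive pattern: ASCII labels joined by dots,
# optionally followed by slash-separated path segments.
NAIVE_LINK = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+(?:/[A-Za-z0-9._~-]*)*")


def find_link(text: str, ignore_soft_hyphens: bool = True) -> Optional[str]:
    """Return the first naive link match, optionally ignoring U+00AD."""
    if ignore_soft_hyphens:
        text = text.replace(SOFT_HYPHEN, "")
    match = NAIVE_LINK.search(text)
    return match.group(0) if match else None


sample = "foo example.\u00ADcom/\u00ADfoo/\u00ADbar bar"
print(find_link(sample))                             # example.com/foo/bar
print(find_link(sample, ignore_soft_hyphens=False))  # None
```

Removing characters shifts offsets. A real linkifier would therefore need to map match positions back to the original text before inserting markup.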

Date/Time: Mon Feb 10 11:17:13 CST 2025
ReportID: ID20250210111713
Name: Jules Bertholet
Report Type: Public Review Issue
Opt Subject: 509


In addition to being used ⸢like⸣ ⸤this⸥,
the half brackets, and possibly also the half parentheses,
can also be used ⸢like⸥ ⸤this⸣
(imitating the East Asian corner brackets).
The Link_Paired_Opener property should be array-valued to reflect this.
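To make the suggestion concrete, here is a minimal Python sketch of bracket matching in which each closer maps to a set of acceptable openers instead of a single one. The table name LINK_PAIRED_OPENERS and its contents are hypothetical. It covers only the half brackets U+2E22..U+2E25 plus ASCII parentheses and square brackets, and it leaves out the half parentheses.

```python
# Hypothetical array-valued pairing table: each closer lists every opener it may close.
LINK_PAIRED_OPENERS = {
    "\u2E23": {"\u2E22", "\u2E24"},  # TOP RIGHT HALF BRACKET closes top-left or bottom-left
    "\u2E25": {"\u2E24", "\u2E22"},  # BOTTOM RIGHT HALF BRACKET closes bottom-left or top-left
    ")": {"("},
    "]": {"["},
}
OPENERS = {"(", "[", "\u2E22", "\u2E24"}


def brackets_balance(text: str) -> bool:
    """True if every closer matches one of its allowed openers and nothing is left open."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in LINK_PAIRED_OPENERS:
            if not stack or stack.pop() not in LINK_PAIRED_OPENERS[ch]:
                return False
    return not stack


print(brackets_balance("\u2E22like\u2E23"))  # True: the usual pairing
print(brackets_balance("\u2E22like\u2E25"))  # True: the corner-bracket-style pairing
```

With a single-valued property, one of the two usages above would necessarily fail to match. A set per closer accepts both.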

Date/Time: Wed Mar 26 05:59:34 CDT 2025
ReportID: ID20250326055934
Name: Ebrahim Byagowi
Report Type: Public Review Issue
Opt Subject: 509


I'd like to provide a quick drive-by comment about https://www.unicode.org/reports/tr58/tr58-2.html

I think the lack of a standard recommendation on how URLs should actually be displayed has caused 
https://issuetracker.google.com/issues/40665886, essentially breaking Persian text in URL 
bars and misrendering emoji skin tones that use ZWJ in Chrome, as described on the tracker.

The same situation exists in Safari. URLs are mostly hidden there, but if one tries 
to edit a URL containing ZWNJ, things go wrong: the already-encoded ZWNJ in the URL gets double-encoded, 
as in https://phabricator.wikimedia.org/F58924232. (Unfortunately this isn't always reproducible 
in Safari, but it is annoying enough and has the same root cause: ZWNJ being displayed as its escaped code.)

I understand that you may consider these browser bugs, but after seeing 
https://www.unicode.org/reports/tr58/tr58-2.html and the lengthy discussion I had in Chromium's bug 
tracker to make sure the developers understood what was going on (https://issuetracker.google.com/issues/40665886), 
I felt that if there were an official recommendation, things could go more smoothly.
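Here is a minimal Python sketch of the double-encoding failure described above, using the standard library's urllib.parse.quote and unquote. The helper reencode_for_navigation is hypothetical and is not taken from any browser. It decodes before re-encoding so that an edit round trip does not escape the "%" signs a second time.

```python
from urllib.parse import quote, unquote

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER, common in Persian words

encoded = quote(ZWNJ)       # '%E2%80%8C': the UTF-8 bytes, escaped once
doubled = quote(encoded)    # '%25E2%2580%258C': each '%' escaped again
shown = unquote(encoded)    # the real ZWNJ, suitable for display


def reencode_for_navigation(edited: str) -> str:
    """Hypothetical helper: bring user-edited text to a single level of escaping."""
    return quote(unquote(edited))


print(doubled)
print(reencode_for_navigation(encoded) == encoded)  # True
print(reencode_for_navigation(ZWNJ) == encoded)     # True
```

The quote(unquote(x)) round trip is not lossless for every input. For example, an escaped "%2F" that was meant to stay escaped comes back as "/". The sketch therefore only illustrates the idea of showing the decoded character while keeping a single level of encoding.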

Date/Time: Fri Mar 28 11:53:15 CDT 2025
ReportID: ID20250328115315
Name: cketti
Report Type: Public Review Issue
Opt Subject: 509


Step 4.7 of the termination algorithm currently reads "If LT == Open", but it should read "If LT == Close". (Step 4.6 already handles "LT == Open".)
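Here is a minimal Python sketch of how the Open and Close branches of a bracket-aware termination loop divide the work, consistent with the correction above: the Open branch pushes, and the Close branch pops and checks the pairing. The Open and Close values come from the comment. The other values (Include, Soft, Hard), the tiny link_term table, PAIRED_OPENER, and the handling of Soft characters are simplifying assumptions, not the text of UTS #58.

```python
from enum import Enum


class LT(Enum):
    INCLUDE = "Include"
    HARD = "Hard"
    SOFT = "Soft"
    OPEN = "Open"
    CLOSE = "Close"


# Hypothetical, heavily reduced property tables for illustration only.
PAIRED_OPENER = {")": "(", "]": "[", "}": "{"}


def link_term(ch: str) -> LT:
    if ch.isspace():
        return LT.HARD
    if ch in "([{":
        return LT.OPEN
    if ch in ")]}":
        return LT.CLOSE
    if ch in ".,;:!?":
        return LT.SOFT
    return LT.INCLUDE


def find_link_end(text: str, start: int) -> int:
    """Return the exclusive end offset of a link that starts at `start`."""
    stack: list[str] = []
    limit = start
    for i in range(start, len(text)):
        ch = text[i]
        lt = link_term(ch)
        if lt is LT.HARD:
            break
        if lt is LT.SOFT:
            continue  # kept only if a later character moves the limit past it
        if lt is LT.OPEN:  # the branch step 4.6 covers
            stack.append(ch)
        elif lt is LT.CLOSE:  # the branch the comment says step 4.7 should test
            if not stack or stack.pop() != PAIRED_OPENER[ch]:
                break
        limit = i + 1
    return limit


text = "see example.com/a_(b) now"
print(text[4:find_link_end(text, 4)])  # example.com/a_(b)
```

If the Close branch instead tested for Open, a closing parenthesis would never be checked against its opener. The balanced "(b)" above and an unbalanced trailing ")" would then be treated alike.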

Date/Time: Mon Apr 07 06:31:16 CDT 2025
ReportID: ID20250407063116
Name: Arnt Gulbrandsen
Report Type: Public Review Issue
Opt Subject: 509


Hi,

I ran across a bug today that I think points out a relevant problem in UTS58: a user expected 普遍适用测试。我爱你 to be linkified as
<a href="https://普遍适用测试.我爱你">普遍适用测试。我爱你</a> (note the changed dot).

Chrome and some other web browsers map "。" to "." in domains when you hit enter after typing/pasting into the address bar. I do feel 
that at least U+06D4 and U+3002 ought to be mapped to the ASCII dot in UTS58 since it's such a common mistake. ("。" and "." are even 
on the same key on the Chinese keyboards I've seen.)

I mention U+06D4 and U+3002 because I've seen those mistakenly used in "domain names" in the course of my work. U+FF61 and others 
might also be used mistakenly in theory, but I haven't seen that.
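Here is a minimal Python sketch of the suggested mapping. It assumes the mapping is applied only to the host part of a detected link and only for the two characters named above (U+3002 IDEOGRAPHIC FULL STOP and U+06D4 ARABIC FULL STOP). The function names are hypothetical. The displayed link text keeps the original character, while the href uses the ASCII full stop, matching the expected markup in the report.

```python
import html

# The two characters the report says are commonly mistyped as the label separator.
MISTYPED_DOTS = str.maketrans({"\u3002": ".", "\u06D4": "."})


def normalize_host_dots(host: str) -> str:
    """Map the mistyped full stops in a host name to U+002E FULL STOP."""
    return host.translate(MISTYPED_DOTS)


def linkify_host(host_text: str) -> str:
    """Build an anchor whose href uses ASCII dots but whose text is unchanged."""
    href = "https://" + normalize_host_dots(host_text)
    return f'<a href="{html.escape(href)}">{html.escape(host_text)}</a>'


print(linkify_host("普遍适用测试。我爱你"))
# <a href="https://普遍适用测试.我爱你">普遍适用测试。我爱你</a>
```

U+FF61, which the report mentions as a theoretical case, could be added to the same translation table.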