Accumulated Feedback on PRI #313

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date: January 15, 2016
Source: Mark Davis

The following question in response to UTS #39 is included in the feedback for UTC consideration.

A question from Gervase, May 2015:

xidmodifications.txt is the UTS #39-recommended way to determine which characters should and should not be allowed in, e.g., IDNs. One of the big things we are aiming for with IDN is stability and consistency. Are you basically saying that xidmodifications.txt is still a work in progress, and is not suitably stable for this use?

Mark Davis response May 2015:

Because the data is changing over time as we find out more about character usage, xidmodifications.txt does not have stability guarantees (such as "Once a character is Recommended, it will not become Restricted in future versions"). For lookup, the data is aimed more at flagging possibly questionable characters, thus serving as one factor (among perhaps many, like using the "Safe Browsing" service) in determining whether the user should be notified in some way. For registration, flagged characters can result in a "soft no", that is, require the user to appeal a denial with more information.

It would be nice to have a completely certain world, but as you know, the status of URLs can change over time (as with Safe Browsing). For characters whose status changes to Restricted, implementations that want to maintain backwards compatibility can use a grandfathering mechanism. The text following http://unicode.org/reports/tr39/proposed.html#Identifier_Status_and_Type also describes how the Recommended set could be customized. (One proposal we are considering for Unicode 9.0 is to make the Type value in the data file a set of values, which would provide more information for such customization.)

Certain Type values will remain backward compatible, at least those up through Not_XID. See http://unicode.org/reports/tr39/proposed.html#Identifier_Status_and_Type. The Exclusion value is extremely unlikely to change, although in theory a dead language written in one of those scripts could be revived. The others are less certain.

Consistency is a trickier topic: it is hard to know what it means here. It may well be that one non-spacing mark in a script is Restricted while the others are not, but that can simply reflect the fact that this particular character is obsolete and the others are not.
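The grandfathering and "soft no" handling described in this response could look roughly like the following. This is a hypothetical sketch, not anything specified by UTS #39: the class RegistrationPolicy, its Decision values, and the two caller-supplied sets are illustrative names, and no actual xidmodifications.txt data is embedded. A real registry would load the Restricted set from whichever data file version it tracks.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry policy sketch; names and outcomes are illustrative only.
final class RegistrationPolicy {

    // Possible outcomes of a registration check.
    enum Decision { ACCEPT, ACCEPT_GRANDFATHERED, SOFT_NO }

    private final Set<Integer> restricted;    // code points currently Restricted (caller-supplied)
    private final Set<String> grandfathered;  // identifiers registered before a status change

    RegistrationPolicy(Set<Integer> restricted, Set<String> grandfathered) {
        this.restricted = new HashSet<>(restricted);
        this.grandfathered = new HashSet<>(grandfathered);
    }

    // Lookup side: only a flag, meant to be one factor among several
    // when deciding whether to notify the user.
    boolean isFlagged(String identifier) {
        return identifier.codePoints().anyMatch(restricted::contains);
    }

    // Registration side: restricted characters lead to a "soft no" (appealable),
    // unless the identifier was already registered before the data changed.
    Decision checkRegistration(String identifier) {
        if (!isFlagged(identifier)) {
            return Decision.ACCEPT;
        }
        if (grandfathered.contains(identifier)) {
            return Decision.ACCEPT_GRANDFATHERED;
        }
        return Decision.SOFT_NO;
    }
}
```

The point of the design is that a later data version can move a character to Restricted without breaking identifiers that already exist, while new registrations containing it go down the appeal path rather than being rejected outright.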


Feedback above this line was considered at the February 2016 UTC meeting.

Date/Time: Fri Feb 12 11:29:40 CST 2016
Name: David Patterson
Report Type: Error Report
Opt Subject: Mixed script detection example in TR39

Unicode® Technical Standard #39

UNICODE SECURITY MECHANISMS

Minor correction to the example code in section 5.1.

I believe the code in the method "isMultiScript(String identifier)" is
returning the opposite logical result.

In my tests the all-Latin string 'foo' returns true and the mixed string
'foo中文' returns false. Should the method be renamed "isSingleScript"?

Many thanks for a very useful set of reports. It's been an excellent source of
reference.

Dave Patterson
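The behavior the report expects from a method named isMultiScript can be sketched as follows. This is a minimal illustration, not the code from section 5.1 of UTS #39: the class name MixedScriptCheck is hypothetical, and the sketch uses the standard java.lang.Character.UnicodeScript enum, which exposes only the Script property. It does not implement Script_Extensions or the resolved script set computation that UTS #39 describes, so, for example, it would report a legitimate Japanese mix of Han and Hiragana as multi-script.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified mixed-script check using the Script property only (no Script_Extensions).
final class MixedScriptCheck {

    // Returns true if the identifier contains characters from more than one specific script.
    static boolean isMultiScript(String identifier) {
        Set<Character.UnicodeScript> found = EnumSet.noneOf(Character.UnicodeScript.class);
        identifier.codePoints().forEach(cp -> {
            Character.UnicodeScript sc = Character.UnicodeScript.of(cp);
            // Common (digits, most punctuation) and Inherited (most combining marks)
            // can appear with any script, so they do not count toward the total.
            if (sc != Character.UnicodeScript.COMMON && sc != Character.UnicodeScript.INHERITED) {
                found.add(sc);
            }
        });
        return found.size() > 1;
    }

    // The complementary predicate; under this simplified model an identifier made
    // only of Common/Inherited characters counts as single-script.
    static boolean isSingleScript(String identifier) {
        return !isMultiScript(identifier);
    }
}
```

With this version, 'foo' yields false and 'foo中文' yields true from isMultiScript, and isSingleScript gives the opposite answers, which matches the naming the report suggests.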