This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Fri Aug 19 07:09:31 CDT 2016
Name: Jonathan Warden
Report Type: Other Question, Problem, or Feedback
Opt Subject: Suggestions for Immutable Identifiers
I have a few suggestions regarding UAX31-R2 (Immutable Identifiers):

1. Clarify that these recommendations:
   - allow unassigned characters for which properties such as normal form, script, etc. are unknown, which means that identifiers:
     - can't be compared for NFC/NFKC/case-insensitive equality
     - can't be restricted per the TR39 recommendations
   - are meant for those cases (like XML) that can't update across versions of Unicode, and don't require information about normal form, script, etc.
   - should note that disallowing unassigned characters is recommended as a best practice *for cases that do require this information*.

2. Point out the option of using a whitelist of allowed characters from a specific version of Unicode and never upgrading (such as the characters in IdentifierStatus.txt under http://www.unicode.org//Public/security/9.0.0). I write about this in my blog here: http://jonathanwarden.com/2016/08/18/immutable-unicode-identifiers/

3. Finally, the recommendation of allowing "any non-empty string of characters that contains no character having any of the following property values" would allow identifiers to start with (and contain only) digits. Another recommendation might be to use Default Identifiers, but to define <Continue> as "no character having any of the following property values", and <Start> as <Continue> minus characters with General_Category values M, N, or Pc.
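The difference described in suggestion 3 can be illustrated with a rough Python sketch. It is not the UAX #31 definition: the function names are hypothetical, and because Python's `unicodedata` module does not expose Pattern_Syntax, only the ASCII portion of that property is approximated here. The remaining exclusions (Pattern_White_Space, the Private_Use, Surrogate, and Control categories, and noncharacters) are checked directly.

```python
import string
import unicodedata

# Pattern_White_Space is a small, stable set of 11 code points.
PATTERN_WHITE_SPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20,
                       0x85, 0x200E, 0x200F, 0x2028, 0x2029}

# ASSUMPTION: only the ASCII part of Pattern_Syntax is modelled here.
# The full property covers many more code points; "_" is not in it.
ASCII_PATTERN_SYNTAX = set(string.punctuation) - {"_"}


def is_noncharacter(cp):
    # U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_continue(ch):
    """<Continue>: no excluded property value (approximated, see above)."""
    cp = ord(ch)
    if cp in PATTERN_WHITE_SPACE or ch in ASCII_PATTERN_SYNTAX:
        return False
    if unicodedata.category(ch) in {"Co", "Cs", "Cc"}:
        return False
    return not is_noncharacter(cp)


def is_start(ch):
    """<Start>: <Continue> minus General_Category M*, N*, and Pc."""
    cat = unicodedata.category(ch)
    return is_continue(ch) and cat[0] not in "MN" and cat != "Pc"


def is_immutable_identifier(s):
    # Quoted rule: any non-empty string with no excluded character.
    return len(s) > 0 and all(is_continue(c) for c in s)


def is_suggested_identifier(s):
    # Suggested variant: <Start> followed by zero or more <Continue>.
    return len(s) > 0 and is_start(s[0]) and all(is_continue(c) for c in s[1:])
```

Under these assumptions, `is_immutable_identifier("123")` is true while `is_suggested_identifier("123")` is false, which is the digits-only case the suggestion is meant to exclude. Unassigned code points (General_Category Cn) pass both checks, consistent with suggestion 1.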
From: Patrik Fältström
Subject: Re: Prep for Unicode 10.0, liaison contact
Date: Wed, 29 Mar 2017 10:54:33 +0200
... I have mechanically checked the 10.0.0 derived attribute values, compared them with the 9.0.0 defined attribute values according to the IDNA2008 algorithm, and have not found any issues.

What I am concerned about, though, is the continued communication that UTS#46 is something that can be used in applications, when in reality that creates confusion regarding which code points can be used in identifiers like domain names. Specifically, normal users do not understand the various flags that one must define (to give the same and predictable result), and UTS#46 does not only recommend a certain mapping step (which IDNA2003 includes, but IDNA2008 does not). Finally, according to my reading, UTS#46 and UAX#31 have different sets of allowed characters, which creates further confusion, for example when one looks at what normal people believe are "emojis".

I would like to encourage the Unicode Consortium to be clearer about its intentions for the future recommended use of UTS#46 and UAX#31 in the context of the IDNA2008 algorithm.

Patrik Fältström
IETF Liaison to Unicode Consortium
From: Mark Davis
Date: Thu, 6 Apr 2017 17:09:01 +0200
Subject: Re: Prep for Unicode 10.0, liaison contact
I agree with the suggestion to clarify the meaning and "default" values of the different flags used in http://www.unicode.org/reports/tr46/proposed.html#ToASCII and http://www.unicode.org/reports/tr46/proposed.html#ToUnicode

As to UTS#46 and UAX#31, it was never a goal to make them align and they never have aligned. The primary goal for UAX#31 is to extend identifiers such as used in programming languages to Unicode (and UAX#31 defines several different kinds of identifiers). The primary goal for UTS#46 is to provide a solution for implementations that want to maintain backwards compatibility with IDNA2003, while extending the repertoire to modern Unicode versions based on the IDNA2003 principles.

Of course, any implementation can always apply additional filters on top of UTS#46, including restricting to UAX#31 default identifiers, restricting to the IDNA2008 repertoire, applying tests such as in UTS#39 for mixed scripts, or applying ICANN rules.

For IDNA2008, the data files in fact provide information about what IDNA2008 would allow, and also reference certain conditions in IDNA2008, such as ContextJ. (UTS#46 does project forward to the current Unicode release, based on the IDNA2008 principles, since the version of Unicode supported by IDNA2008 is old.)

Mark