Philippe Verdy verdy_p at wanadoo.fr
Thu Jun 5 19:09:30 CDT 2014

Warning!

This definition of allowed identifiers carries severe security risks: it does
not support any kind of normalization or canonical equivalence, and the
language lexer/parser cannot apply normalization while guaranteeing that the
results stay stable over the set of unassigned characters, which may be
assigned later.
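
To illustrate the missing canonical equivalence, here is a minimal sketch in
Python (my choice for illustration, not something Swift provides). The
identifier "café" is a hypothetical example; both spellings use only code
points that the quoted grammar accepts (U+00E9 as identifier-head, U+0301 as
identifier-character).

```python
import unicodedata

# Two spellings of the same visible identifier "café" (hypothetical example).
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# A lexer that compares raw code points sees two different names.
same_code_points = precomposed == decomposed

# Under NFC normalization both spellings become the same string.
same_after_nfc = (unicodedata.normalize("NFC", precomposed)
                  == unicodedata.normalize("NFC", decomposed))

print(same_code_points, same_after_nfc)  # prints: False True
```

Without normalization, these would be two distinct bindings that look
identical on screen.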

This could cause unexpected collisions later: bindings that initially could
not collide may do so under new normalizations (notably if unassigned code
points get assigned to combining characters with a non-zero combining class,
or to base characters with combining class 0 that are forbidden from
recombining, i.e. disallowed in the standard normalization forms).
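
A small sketch of why this is fragile, again in Python and only as an
illustration. It assumes that U+40000 (inside the accepted range U+40000 to
U+4FFFD) is still unassigned in the Unicode data shipped with your Python,
which holds for every Unicode version published so far. Today such a code
point has default properties, so normalization leaves it alone; nothing
guarantees that this stays true once it is assigned.

```python
import unicodedata

# Assumption: U+40000 is unassigned in the Unicode data bundled with Python.
cp = "\U00040000"

category = unicodedata.category(cp)            # "Cn" means unassigned
combining_class = unicodedata.combining(cp)    # default combining class is 0
decomposition = unicodedata.decomposition(cp)  # empty: no decomposition mapping

# With these default properties, every normalization form is a no-op today.
identifier = "x" + cp
unchanged_today = all(
    unicodedata.normalize(form, identifier) == identifier
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)

print(category, combining_class, repr(decomposition), unchanged_today)
```

If a later Unicode version assigns such a code point a non-zero combining
class or a decomposition, the same identifier could normalize differently.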

No programming language should allow unassigned characters in identifiers;
they should be checked and marked as invalid. (Note: this check can work in
the language's compiler, but it will not work in a repository of source code,
where the only possible check is to parse all source files in the repository
to make sure that there is no unassigned code point anywhere in their source
text. The source repository should enforce this by clearly defining the UCS
version it accepts for source files but, as far as I know, no commonly used
source repository performs this check; it can only be done by extracting all
sources from it with some bot script that detects unassigned code points in
these sources.)
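
Such a bot script could look roughly like the following Python sketch. The
function name find_unassigned and the sample source text are hypothetical;
the check relies on the Unicode version bundled with the Python interpreter
(reported by unicodedata.unidata_version), so the repository would have to
pin that version as the one it accepts.

```python
import unicodedata

def find_unassigned(text):
    """Return (line, column, code point) for each code point with category Cn.

    Category Cn covers unassigned code points (and noncharacters) in the
    Unicode version bundled with this Python interpreter.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cn":
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits

# Hypothetical source text: the second line hides U+40000 in an identifier.
sample = "let a = 1\nlet b\U00040000 = 2\n"

print("Unicode data version:", unicodedata.unidata_version)
print(find_unassigned(sample))  # prints: [(2, 6, 'U+40000')]
```

Running this over every file extracted from the repository would flag the
offending positions; it does not replace a check in the compiler itself.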

The alternative of not allowing any normalization of identifiers is not
safe when source code editors may easily renormalize the identifiers, or
when these sources may be edited by different users using different input

2014-06-05 17:27 GMT+02:00 "Martin v. Löwis" <martin at v.loewis.de>:

> On 04.06.14 11:28, Andre Schappo wrote:
> > The restrictions seem a little like IDNA2008. Anyone have links to
> > info giving a detailed explanation/tabulation of allowed and non
> > allowed Unicode chars for Swift Variable and Constant names?
> The language reference is at
> https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html
> For reference, the definition of identifier-character is (read each
> line as an alternative)
> identifier-character → Digit 0 through 9
> identifier-character → U+0300–U+036F, U+1DC0–U+1DFF, U+20D0–U+20FF, or
> U+FE20–U+FE2F
> identifier-character → identifier-head
> where identifier-head is
> identifier-head → Upper- or lowercase letter A through Z
> identifier-head → U+00A8, U+00AA, U+00AD, U+00AF, U+00B2–U+00B5, or
> U+00B7–U+00BA
> identifier-head → U+00BC–U+00BE, U+00C0–U+00D6, U+00D8–U+00F6, or
> U+00F8–U+00FF
> identifier-head → U+0100–U+02FF, U+0370–U+167F, U+1681–U+180D, or
> U+180F–U+1DBF
> identifier-head → U+1E00–U+1FFF
> identifier-head → U+200B–U+200D, U+202A–U+202E, U+203F–U+2040, U+2054,
> or U+2060–U+206F
> identifier-head → U+2070–U+20CF, U+2100–U+218F, U+2460–U+24FF, or
> U+2776–U+2793
> identifier-head → U+2C00–U+2DFF or U+2E80–U+2FFF
> identifier-head → U+3004–U+3007, U+3021–U+302F, U+3031–U+303F, or
> U+3040–U+D7FF
> identifier-head → U+F900–U+FD3D, U+FD40–U+FDCF, U+FDF0–U+FE1F, or
> U+FE30–U+FE44
> identifier-head → U+FE47–U+FFFD
> identifier-head → U+10000–U+1FFFD, U+20000–U+2FFFD, U+30000–U+3FFFD, or
> U+40000–U+4FFFD
> identifier-head → U+50000–U+5FFFD, U+60000–U+6FFFD, U+70000–U+7FFFD, or
> U+80000–U+8FFFD
> identifier-head → U+90000–U+9FFFD, U+A0000–U+AFFFD, U+B0000–U+BFFFD, or
> U+C0000–U+CFFFD
> identifier-head → U+D0000–U+DFFFD or U+E0000–U+EFFFD
> As the construction principle for this list, they say
> "Identifiers begin with an upper case or lower case letter A through Z,
> an underscore (_), a noncombining alphanumeric Unicode character in the
> Basic Multilingual Plane, or a character outside the Basic Multilingual
> Plane that isn’t in a Private Use Area. After the first character, digits
> and combining Unicode characters are also allowed."
> Regards,
> Martin