Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

From: Karl Williamson <public_at_khwilliamson.com>
Date: Wed, 31 Aug 2011 16:57:23 -0600

On 08/30/2011 06:27 PM, Philippe Verdy wrote:
> After looking at the effective reason why this PRI #202 emerged (a
> request from Perl authors), exposed in UTC document number
> "L2/2011/11281", I think now that even *all* these aliases were not
> needed.
>
> The bug emerged in Perl only because a character named "BELL" was
> added, entering in conflict with the *custom* (non standardized) value
> alias that Perl used to reference the control.
>
> I think the problem has been taken by the wrong end. Really, the UCS
> namespace of characters has *never* been designed to allow any custom
> alias. In other words, what Perl did, by adding those custom aliases,
> was clearly not conforming to the standard.
>
> What Perl should have used is not reusing the same property to
> reference both the standard names (or aliases) and its own custom
> aliases (even if those aliases are needed and widely known).
>

Unicode 6.0 broke UTS #18, which since 1999 has suggested that BELL be
the name used in regular expressions for U+0007. In 2003, this was
strengthened to "should" be used. The breakage occurred by requiring
that BELL instead be the name for a different code point. By breaking
UTS #18, all implementations of it, including Perl's, were broken,
causing real harm to real code and real people. For this reason, Perl
has not completely adopted 6.0.

Further, UTS #18 encourages implementations to do exactly what Perl did:
"The ISO names for the control characters may be unfamiliar, ... so it
is recommended that they be supplemented with other aliases. For
example, for U+0009 the implementation could accept the official name
CHARACTER TABULATION, and also the aliases HORIZONTAL TABULATION, HT,
and TAB."

See http://www.unicode.org/reports/tr18/#Name_Properties

The genesis of this proposal was to prevent the Unicode Consortium from
making this kind of mistake again. The language in UTS #18 mentioning
the TAB variants also dates to 2003. I think this example makes it
clear why more than one alias may be needed per code point.
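
The point about several aliases per code point, and about how a new
official name can collide with one of them, can be sketched in a few
lines of Python. This is only an illustration, not Perl's actual
implementation: the two tables and the functions lookup and add_alias
are hypothetical names, the tables are a tiny excerpt, and U+1F514 as
the code point that received the name BELL in 6.0 is filled in from the
Unicode 6.0 names list rather than from the text above.

```python
# Official character names (a tiny, hypothetical excerpt of the names list).
OFFICIAL_NAMES = {
    "CHARACTER TABULATION": 0x0009,
    "BELL": 0x1F514,  # assumption: the code point named BELL as of Unicode 6.0
}

# Extra aliases for U+0009, as suggested in the UTS #18 text quoted above.
ALIASES = {
    "HORIZONTAL TABULATION": 0x0009,
    "HT": 0x0009,
    "TAB": 0x0009,
}


def lookup(name):
    """Resolve an official name or an alias to a code point."""
    key = name.upper()
    if key in OFFICIAL_NAMES:
        return OFFICIAL_NAMES[key]
    if key in ALIASES:
        return ALIASES[key]
    raise KeyError(name)


def add_alias(name, code_point):
    """Register an alias, refusing one that already names another code point."""
    key = name.upper()
    existing = OFFICIAL_NAMES.get(key, ALIASES.get(key))
    if existing is not None and existing != code_point:
        raise ValueError(
            "%s already names U+%04X, cannot alias U+%04X"
            % (key, existing, code_point)
        )
    ALIASES[key] = code_point
```

With these tables, lookup("TAB"), lookup("HT") and
lookup("CHARACTER TABULATION") all resolve to U+0009, while
add_alias("BELL", 0x0007) is rejected because the name is already taken
by a different code point. Names and aliases sharing one namespace is
what makes a stable, published alias list useful.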

Of course, PRI #202 is not the only mechanism possible to achieve the
needed goal of preventing another mishap like BELL. But the consensus
in the discussion about it was that it was the easiest route to get there.
Received on Wed Aug 31 2011 - 18:03:16 CDT
