Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0 from Ken Whistler on 2011-08-26 (Unicode Mail List Archive)

From: Ken Whistler <kenw_at_sybase.com>
Date: Fri, 26 Aug 2011 17:28:34 -0700

On 8/26/2011 5:01 PM, Philippe Verdy wrote:
>> "we could as well include..." are dangerous words here. Going encyclopedic
>> is *completely* at odds with the normative intention of NameAliases.txt.
> Your statement then contradicts what PRI 202 says:
> "the intent is to add various standard and de facto aliases for
> control characters, which have no names defined for them in the
> Unicode Standard, as well as various character abbreviations which are
> in widespread use."

No, it does not, because you have conveniently omitted the next paragraph of
the PRI, which explains the context of use:

"Because NameAliases.txt is used as part of the input which enforces
name uniqueness for the Unicode character namespace, adding aliases for
control codes and commonly used abbreviations for characters would
prevent accidental name collisions in the future for character "name"
matches in implementations such as regular expressions."
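
To make the uniqueness point concrete, here is a rough sketch of the kind of check that paragraph implies. It is not the actual tooling used to maintain the UCD. The function name build_namespace, the simple uppercase comparison, and the sample entries are all assumptions made up for illustration.

```python
def build_namespace(entries):
    """Map every name or alias to exactly one code point, rejecting clashes.

    `entries` is an iterable of (code_point, name) pairs. Names are compared
    after uppercasing only, which is a simplification of real name matching.
    """
    namespace = {}
    for code_point, name in entries:
        key = name.upper()
        owner = namespace.get(key)
        if owner is not None and owner != code_point:
            raise ValueError(
                f"{key!r} already names U+{owner:04X}, "
                f"cannot also name U+{code_point:04X}"
            )
        namespace[key] = code_point
    return namespace


# Illustrative entries only: aliases reserved for control codes.
reserved = [(0x0007, "ALERT"), (0x0007, "BEL"), (0x000A, "LINE FEED")]
namespace = build_namespace(reserved)

# A hypothetical later proposal that tries to reuse a reserved alias
# is rejected before it can enter the namespace.
try:
    build_namespace(reserved + [(0x1F514, "BEL")])
except ValueError as err:
    print("collision:", err)
```

Once an alias is in the input, any later attempt to give the same string to a different code point fails the check, which is the whole purpose of adding these entries.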

>
> It explicitly links the Unicode standard with others, at least by
> reference.

No, it does not.

> If these aliases are to be ALL unique in the UCS namespace,
> this means that it will permanently link those standards to the UCS.

No, it will not. Only ISO 6429, which is *already* de facto linked to
the UCS for aliases for C0 and C1 control codes.

>
> Maybe it will be good for other standards that are now stable (or
> frozen and kept for historical reasons, this is the case of the
> standard Postscript namespace, frozen now in the AGL and in the
> PostScript's "standardEncoding", for use in TrueType, OpenType, and
> PDF).

Well, conceivably it could be "good" for some other standard, but it would
certainly not be good for the Unicode Standard to pollute the unique
namespace with an encyclopedic listing of "names" of arbitrary entities.

>
> Yes I admit that the Postscript namespace is a bit different: it is
> glyph-based rather than character-based, which also means that several
> UCS characters may map by default to the same glyph name.

And I think we can stop right there. The problems are manifest.

> Then why do you think, in the PRI 202 that some standards would have
> their character names becoming part of the UCS namespace ?

Because by *definition* adding an entry to NameAliases.txt adds it to
the Unicode namespace. That is how the file is designed.

> They could
> remain as well informative, and we could have another informative
> datafile (in the "MAPPINGS" subdirectory) to reference those standards
> only informatively, without introducing them in the UCD...

That is out of scope for this PRI, which is specifically about additions
to NameAliases.txt, to prevent the possibility of future name collisions
such as U+1F514 BELL with the ISO 6429 control function name "BELL".

>
> For example the proposed addition of ISO 6429 names don't have to be a
> normative part of the UCD, they could remain informational as well,
> defined outside of it.

No, they need to become a normative part of the Unicode namespace. That
is *precisely* the problem that the PRI is addressing.

> They are not (and should not be) needed to
> conformingly implement the UCS and Unicode algorithms, unless the
> Unicode standard really wants to permanently bind the ISO 6429
> standard, possibly against the intent of the authors of this standard.

It has *nothing* to do with the intent of the authors of ISO 6429. It has
to do with the implementation requirements of users of the Unicode Standard,
and in particular for regex. Perl and other regex users do not want a name
match in a Unicode regex expression to be ambiguous.
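
As a small illustration of why a by-name match must resolve to exactly one character, here is a sketch using Python's standard unicodedata module rather than Perl. It assumes an interpreter whose Unicode database is version 6.0 or later, so that U+1F514 is present.

```python
import unicodedata

# A lookup by name can return only one character, so the string "BELL"
# can belong to only one code point in the namespace.
bell = unicodedata.lookup("BELL")

# The control code U+0007 has no formal character name of its own, so
# name() falls back to the default supplied here.
control_name = unicodedata.name("\x07", None)

print(f"U+{ord(bell):04X}", control_name)
```

If the same string were also accepted for the control code, the lookup would have no single correct answer.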

> Was there such formal request from the ISO standard maintainers, and
> an agreed policy ?

It has nothing to do with ISO standard maintainers.

And yes, there was a formal request to do something about this problem,
but it came from one of the maintainers of Perl.

--Ken
Received on Fri Aug 26 2011 - 19:30:12 CDT
