From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Sep 12 2007 - 23:53:39 CDT
Kenneth Whistler wrote:
> And short identifiers don't follow the name syntax restrictions,
> because they allow one character, "+", that is not allowed in character
> names.
Regarding my comment about missing names, I was not claiming that these
supplementary names should be defined the same way as other assigned names.
But referring to characters by name is better than referring to them by code
point in many documents, as it makes the reference clearer.
Even Unicode needs to assign names to controls locally in many places to
make things clearer (look at the documents and standard annexes about the
BiDi algorithm and line/word breaking).
When I spoke about ISO 8859-1 and ISO 646, I was speaking about their
references to the C0 and C1 subsets, but also about their definition as IANA
charsets, which DO include the C0 and C1 subsets, not just the G0 and G1
characters. (There's a difference between "ISO-8859-1", the IANA charset made
of ISO 8859-1 for G0 and G1 plus the C0 and C1 controls, and "ISO 8859-1";
notice the addition of the hyphen; the same is true between "ISO 646" and
"ISO-646".)
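
To illustrate the point about the charset covering the control ranges, here
is a minimal sketch using Python's built-in "iso-8859-1" codec. It only shows
how that one implementation behaves (every byte value, including the C0 and
C1 ranges, maps to the code point with the same number); the variable names
are mine, and this is not a statement about the text of the IANA registration
itself.

```python
import unicodedata

# Decode every possible byte value with Python's "iso-8859-1" codec.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# Each byte maps to the code point with the same numeric value.
code_points = [ord(ch) for ch in decoded]

# Slice out the two control ranges and check their general category.
c0 = decoded[0x00:0x20]  # bytes 0x00..0x1F
c1 = decoded[0x80:0xA0]  # bytes 0x80..0x9F
c0_is_control = all(unicodedata.category(ch) == "Cc" for ch in c0)
c1_is_control = all(unicodedata.category(ch) == "Cc" for ch in c1)
```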
Even if the names assigned to C0 and C1 controls are not agreed upon across
the various references, at least one name should be specified consistently
for use in Unicode/ISO 10646 contexts.
When I spoke about possible conflicts, it's because applications frequently
need to display names for controls. These names will preferably be those
assigned by Unicode and ISO 10646 when they exist, but if they are missing,
the names will be inferred in some way, using the historic "na1" property
if available, or some other legacy convention, causing possible confusion if
there's no agreed convention.
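
The kind of fallback an application ends up writing can be sketched as
follows. This is only an illustration under stated assumptions:
LEGACY_CONTROL_NAMES and display_name are hypothetical names of mine, and the
table holds a few sample entries rather than authoritative data. Python's
unicodedata.name() returns no name for controls, so the application has to
pick its own convention, and two applications may well pick different ones.

```python
import unicodedata

# Hypothetical legacy table: a few sample entries only, not authoritative
# data. A real application might fill it from "na1" or from another source.
LEGACY_CONTROL_NAMES = {
    0x0000: "NULL",
    0x001B: "ESCAPE",
    0x007F: "DELETE",
}

def display_name(ch: str) -> str:
    """Return a display name for one character, with legacy fallbacks."""
    # 1. Prefer the assigned character name when there is one.
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    # 2. Otherwise fall back to the application's own legacy table.
    legacy = LEGACY_CONTROL_NAMES.get(ord(ch))
    if legacy is not None:
        return legacy
    # 3. Last resort: show the code point itself.
    return f"U+{ord(ch):04X}"
```

With this sketch, "A" gets its assigned name, U+001B is shown as "ESCAPE"
from the local table, and a control missing from the table, such as U+0085,
is shown only by code point.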
Note that I know that not all C1 controls have names, but names do appear
in IBM references on EBCDIC, from which these controls were inherited and
remapped into C1 controls. The names are used in transcoding tables (which
have existed since long before Unicode/ISO 10646).
I don't see why assigning a name (possibly through a separate property) to
these controls would be a problem for Unicode and ISO 10646 stability.
It's clear that these names do exist in many other references, notably
within many RFCs and protocol specifications. You just need to choose a name
that matches the most common usage (even if there are other inconsistent
assignments in other references, which may be deprecated or were never meant
to be normative).