From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Sep 13 2007 - 15:34:52 CDT
Kenneth Whistler [mailto:kenw@sybase.com] wrote:
> > Regarding my comment about missing names, I was not pretending that these
> > complemented names should be defined the same way as other assigned names.
>
> I didn't assume that you were *pretending* that to be the case;
> I observed that you were *asserting* it to be the case.
>
> > But references to characters by name is better than reference by codepoint
> > in many documents as it makes the reference clearer.
>
> Ah, now you change your tune. I have no quarrel with that claim. Certainly
> being able to refer to common use control codes by names such
> as "tab" and "carriage return" instead of hexadecimal U+0009 and
> U+000D makes the intent clearer to everyone -- even those of us
> who spend much of our day thinking in hexadecimal.
>
> But in your prior contribution, you were talking about alleged
> problems of stability of applications because of characters which
> currently have no normatively defined character name attribute.
I have not changed my tune, nor even my underlying intuition; perhaps what I
said was not clear and could be interpreted differently.
The need for stable names for C0 and C1 controls remains. When I speak about
stability, I do not mean stability within the Unicode standard itself (such
names are still not present there), but within applications or documents
that need names to reference these controls more clearly than just U+00xx.
That notation is not ambiguous, but it is not clear enough for readers, given
that even Unicode needs to define "aliases" to reference these controls in
many places in its annexes.
So your objection that the proposed names using "<>" or "#" within names
are non-conforming is not relevant. What applications need are stable names,
even if those names come from another character property that does not
follow the current rules for existing standard character names. After all,
Unicode already references the "na1" property (see the proposed XML format
for the UCD), and could just as well define another property if it does not
want to change the value of existing properties. And we already have lots of
other properties for CJK ideographs.
The most commonly used names are those based on two- or three-character
abbreviations, so these "aliases" are still the best: "NUL, ..., TAB, LF, VT,
FF, CR, ... DC1, ..., CSI, ...".
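To illustrate the kind of local definition list a document or application
currently has to carry, here is a minimal sketch in Python. The table and the
describe() helper are hypothetical, written only for this example; the table
covers just the abbreviations quoted above and is not taken from any Unicode
data file. It relies on the fact that unicodedata.name() has no name to
return for control characters, so the lookup falls back to the local table.

```python
import unicodedata

# Hypothetical local alias table: only the abbreviations quoted in this
# message, mapped to their C0/C1 code points. Not a normative list.
CONTROL_ALIASES = {
    "NUL": 0x00,
    "TAB": 0x09,
    "LF": 0x0A,
    "VT": 0x0B,
    "FF": 0x0C,
    "CR": 0x0D,
    "DC1": 0x11,
    "CSI": 0x9B,
}

# Reverse mapping, from code point to abbreviation.
CODEPOINT_TO_ALIAS = {cp: alias for alias, cp in CONTROL_ALIASES.items()}


def describe(ch):
    """Return 'U+XXXX NAME', using the local alias table when the
    standard character name is missing (as it is for controls)."""
    cp = ord(ch)
    # unicodedata.name() returns the default when no name is defined.
    name = unicodedata.name(ch, "")
    if not name:
        name = CODEPOINT_TO_ALIAS.get(cp, "(unnamed)")
    return f"U+{cp:04X} {name}"


if __name__ == "__main__":
    for ch in ("\t", "\r", "\x9b", "A"):
        print(describe(ch))
```

Every document or program that wants to write "TAB" instead of U+0009 has to
repeat a table like this one, which is exactly the redundancy a stable,
shared reference would remove.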
I won't adopt Keld's two-character mnemonics, as they are broken even if
they remain in old charset definition RFCs: these have long been superseded
by charset tables that use Unicode/ISO 10646 code points as the central
encoding, and by the mappings published by Unicode (even if those are
informative, they have equivalent content, and these tables are now used in
many systems, possibly compiled into some proprietary binary format).
But at least these names would simplify the writing of new specifications,
or could help disambiguate some old RFCs by making them more precise, if a
normative reference were available to specify them without long lists of
local definitions in each document that needs them (including the Unicode
standard annexes, where these names are needed and redefined locally).