From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Sep 12 2007 - 06:29:37 CDT
Kenneth Whistler wrote:
> Note, however, as regards names in particular, that some
> Unicode characters (e.g., noncharacters, private-use characters) don't
> have character names, ...)
I won't discuss the case of CJK and Hangul ranges, because they do have
complete properties including standard names.
But I still don't understand why the assigned controls and PUAs don't have
at least one default character name, at least computed algorithmically (like
Hangul and CJK ideographs).
For the stability of applications using these characters, it seems that
these controls and PUAs should still have a standard name (maybe this name
is "U+xxx"...) to avoid any possible future conflicts with other characters
that will get their own standard names, if the application needs to define a
name property for these characters instead of returning a non-unique empty
name or raising an exception (as if the characters were unassigned).
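
To make the problem concrete, here is a minimal sketch using Python's standard
unicodedata module, taken only as one example of an application that exposes a
name property. The helper lookup_name is a hypothetical name introduced for
this illustration. Hangul and CJK code points get their algorithmic names,
while a control, a PUA code point and a noncharacter all fail in the same way.

```python
import unicodedata

def lookup_name(ch):
    """Return the character name, or None when the lookup raises ValueError."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        # unicodedata.name() raises ValueError when no name is defined.
        return None

# Hangul syllable, CJK ideograph, control (ESC), PUA, noncharacter
for ch in ("\uAC00", "\u4E00", "\u001B", "\uE000", "\uFFFF"):
    print(f"U+{ord(ch):04X}", lookup_name(ch))
```

The last three lookups cannot be told apart by name alone, which is the
non-unique result described above.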
The most obvious missing names, frequently encountered in texts encoded with
a valid UTF, are those of the controls.
Why does Unicode still not endorse the existing ISO 646 and ISO 8859 names
for these C0 and C1 controls? Why would it be a problem to assign such names
(a name is just a name, not a description of its semantics or intended use in
applications)?
So:
* instead of having just "<control>" for U+001B, why not have "<control>
ESC" for the ASCII escape character (even if we know that some encodings
will not treat it as a distinct separate character but will use it as part
of the encoding scheme, which is NOT a standard UTF anyway)?
* instead of having just "<private use>" for U+E000, why not have
"<private use> E000", computed algorithmically, as the standard name?
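
The two cases above could be computed roughly as follows. This is only a
sketch, written in Python, and not anything defined by the standard: the
function default_name and the table CONTROL_ABBREVIATIONS are hypothetical,
the table lists only a handful of ISO 646 abbreviations, and falling back to
the hexadecimal code point for controls missing from the table is an
assumption of this sketch.

```python
import unicodedata

# Hypothetical, deliberately incomplete table of control abbreviations.
CONTROL_ABBREVIATIONS = {
    0x00: "NUL", 0x09: "HT", 0x0A: "LF",
    0x0D: "CR", 0x1B: "ESC", 0x7F: "DEL",
}

def default_name(ch):
    """Return the existing name, or a computed default for controls and PUAs."""
    cp = ord(ch)
    try:
        return unicodedata.name(ch)
    except ValueError:
        pass  # no name in the database: try a computed default
    category = unicodedata.category(ch)
    if category == "Cc":  # C0 and C1 controls
        # Assumption: use the hex code point when no abbreviation is listed.
        label = CONTROL_ABBREVIATIONS.get(cp, f"{cp:04X}")
        return f"<control> {label}"
    if category == "Co":  # private-use code points
        return f"<private use> {cp:04X}"
    return None  # anything else stays unnamed

print(default_name("\u001B"))
print(default_name("\uE000"))
```

Characters that already have a name are returned unchanged, so the computed
names only fill the gaps.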
As an alternative, you could say that applications may generate the comment
field algorithmically, so that strict compatibility is preserved for the
existing name field. This would give the extended names (respectively for the
examples above):
* "<control> #ESC"
* "<private use> #E000"
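
A minimal Python sketch of this alternative, under the assumption that the
name and the generated comment are kept as two separate fields. NameRecord,
control_record and private_use_record are hypothetical names used only for
illustration.

```python
from typing import NamedTuple

class NameRecord(NamedTuple):
    name: str     # existing name field, left untouched
    comment: str  # generated comment field

    def extended(self):
        """Join both fields, marking the comment with '#'."""
        if self.comment:
            return f"{self.name} #{self.comment}"
        return self.name

def control_record(abbreviation):
    # The abbreviation (e.g. "ESC") is supplied by the caller.
    return NameRecord("<control>", abbreviation)

def private_use_record(code_point):
    # The comment is the code point in uppercase hexadecimal.
    return NameRecord("<private use>", f"{code_point:04X}")

print(control_record("ESC").extended())
print(private_use_record(0xE000).extended())
```

An application that reads only the name field still sees "<control>" and
"<private use>", exactly as before.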
I don't see which other standard it will break.