U+hhhh[h[h]] NAME syntax

Philippe Verdy verdy_p at wanadoo.fr
Sat Aug 13 02:29:05 CDT 2016

These are just conventions to use when there is no other context
explaining why some other notation would be more useful or more readable.
Plain text is not meant to be parsed by an automated tool but to be read.
Even the standard itself uses shorter notations in places, and explains
each of them where it is used, because this makes the overall text easier
to read or allows the tables to be compressed.

Notably, character names are frequently abbreviated (e.g. CR, LF,
CR+LF), or omitted completely if the code point is specified.

The UCD itself contains lots of data that reference code points while
also omitting the "U+" prefix. In all these cases a local convention
applies instead of the generic convention, which is only suggested for
use out of context. In some programmatic contexts, the notation used is
language-specific and used appropriately (such as \uNNNN in
JavaScript/JSON/Java) without needing any prior explanation (these
notations are already explained for those languages in their own standards).
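
To make the difference between these conventions concrete, here is a
minimal Python sketch that renders one character in the generic U+
notation, in the bare hexadecimal form found in UCD data files, and as
\uNNNN escapes. The function name "notations" and the dictionary keys are
only illustrative assumptions, and the escape form assumes the
UTF-16-based style (surrogate pairs above U+FFFF) used by
JavaScript, JSON and Java.

```python
def notations(ch: str) -> dict:
    """Render one character in three common code point notations."""
    cp = ord(ch)

    # Generic convention: "U+" followed by 4 to 6 uppercase hex digits.
    generic = f"U+{cp:04X}"

    # Local convention of UCD data files: same digits, no "U+" prefix.
    ucd = f"{cp:04X}"

    # \uNNNN escapes take exactly 4 hex digits, so supplementary
    # characters are written as a UTF-16 surrogate pair.
    if cp <= 0xFFFF:
        escaped = f"\\u{cp:04X}"
    else:
        offset = cp - 0x10000
        high = 0xD800 + (offset >> 10)
        low = 0xDC00 + (offset & 0x3FF)
        escaped = f"\\u{high:04X}\\u{low:04X}"

    return {"generic": generic, "ucd": ucd, "escaped": escaped}


if __name__ == "__main__":
    for ch in ("A", "\u20AC", "\U0001F600"):
        print(notations(ch))
```

Each form carries the same information; only the surrounding context
tells the reader (or the parser) which one to expect.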

In emails, use any convention you want: an email normally explains its own
context when needed (it may be necessary to read other messages in the
discussion thread to understand these personal conventions), and people
write them the way they want as long as it is clear to readers. Emails
will also frequently refer to other conventions used in the standard or in
programming languages. These notations may however be useful when
performing full-text searches in collections of emails or forum messages,
to see where a particular character was cited and discussed. But many
discussions also cover other related characters, and not all of them are
cited, because the discussion is about some of their common properties:
you'll need to search for other terms (not always part of the standard or
its technical annexes, as they may be about non-standardized but common
usages, or about proposals or changes to existing properties, notably the
informative properties).
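
As a rough illustration of such a full-text search, the following Python
sketch extracts the code points cited in the U+hhhh[h[h]] form from a
message body. The pattern and the function name "cited_code_points" are
my own assumptions, not something defined by the standard, and the
search only finds explicit U+ citations: characters mentioned verbatim,
by name, or in another notation are missed.

```python
import re

# "U+" followed by 4 to 6 hex digits, not embedded in a longer word.
CODE_POINT = re.compile(r"\bU\+([0-9A-Fa-f]{4,6})\b")


def cited_code_points(text: str) -> list:
    """Return the sorted, de-duplicated code points cited as U+hhhh[h[h]]."""
    found = set()
    for match in CODE_POINT.finditer(text):
        cp = int(match.group(1), 16)
        # Ignore values beyond the Unicode code space.
        if cp <= 0x10FFFF:
            found.add(cp)
    return sorted(found)


if __name__ == "__main__":
    message = "CR (U+000D) and LF (U+000A) are discussed, U+1F600 is not."
    print([f"U+{cp:04X}" for cp in cited_code_points(message)])
```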

I see little interest in forcing anyone to use the U+NNNN NAME convention
everywhere, as it is overlong and may instead obscure the discussion. Even
when it is used, the NAME will frequently be abbreviated (such as by
dropping the script name prefix or common words such as LETTER or DIGIT).
And given that character names are not case-significant, they will
frequently be written in lowercase or mixed case, or replaced by the
verbatim character itself.
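
For what it is worth, here is a small Python sketch of the full
U+NNNN NAME form and of case-insensitive name comparison, using the
standard "unicodedata" module. The helper names "describe" and
"same_character" are hypothetical. Note that this lookup ignores case
only: abbreviated names (for example with LETTER dropped) are not
resolved, so those abbreviations remain a convention for human readers.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character in the long 'U+NNNN NAME' convention."""
    # Some code points (e.g. control characters) have no formal name.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name}"


def same_character(name_a: str, name_b: str) -> bool:
    """Compare two character names without regard to letter case."""
    # unicodedata.lookup raises KeyError for names it does not know.
    return unicodedata.lookup(name_a) == unicodedata.lookup(name_b)


if __name__ == "__main__":
    print(describe("A"))
    print(describe("\r"))
    print(same_character("digit one", "DIGIT ONE"))
```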

2016-08-13 8:53 GMT+02:00 Garth Wallace <gwalla at gmail.com>:

> Appendix A: Notational Conventions
> On Friday, August 12, 2016, Sean Leonard <lists+unicode at seantek.com>
> wrote:
>> It appears that U+hhhh[h[h]] NAME syntax is a very common--one might say
>> "standard"--way of representing a particular Unicode character or code
>> point in text.
>> It is the way that the Unicode Standard 9.0.0 refers to particular
>> characters, and I have seen it around quite a bit. The Unicode Standard
>> appears to put the NAME in small-caps format (but a plain text PDF search
>> using Adobe Acrobat DC suggests that the underlying characters are
>> lowercase), while in plain text, the name is generally all-capitalized (as
>> it appears in the UCD).
>> Is there a section of the Unicode Standard, or some TR, that discusses
>> this format or gives it a formal name? (I hunted but did not find
>> discussion in the Unicode Standard.) Is it given any kind of preference or
>> recommendations over other forms of identifying Unicode code points or
>> characters?
>> Thanks,
>> Sean