U+hhhh[h[h]] NAME syntax
lists+unicode at seantek.com
Sat Aug 13 10:12:06 CDT 2016
> On Aug 13, 2016, at 2:33 AM, Marcel Schneider <charupdate at orange.fr> wrote:
> On Sat, 13 Aug 2016 09:29:05 +0200, Philippe Verdy wrote:
>> I see little interest to force anyone to use the U+NNNN NAME convention
>> everywhere, as it is overlong and may instead obscure the discussions. Even
>> when it is used, the NAME will be frequently abbreviated (such as dropping the
>> script name prefix or common words such as LETTER or DIGIT). And given that
>> character names are not case-significant, they will be frequently written
>> using lowercase, or mixed case, or just by presenting the verbatim character
> One advantage I see in using capitalized character names is in making them
> unambiguously recognizable as identifiers, in order to prevent readers from
> mistaking them as descriptors.
> However I admit that I often unify casing pairs by dropping the CAPITAL and
> SMALL attributes, as in LATIN LETTER AE, but it would be more accurate to write
> LATIN CAPITAL/SMALL LETTER AE. By contrast I wouldnʼt do that when referring to
> the LATIN CAPITAL and SMALL LIGATURE OE, because the term “ligature” is an abusive
> relict enforced by the ISO redactor at the time, and set back to “letter” in the
> case of the Æ (as discussed past year). Here the advantage of using a translation
> is to be able to correct without risking confusions.
> Another advantage is in highlighting the names against the surrounding text.
> Avoiding uppercase—e.g. from people hating their Caps Lock toggle key, who
> Iʼve read they do exist but are very uncommon in the country where we live—
> would need workarounds like using quotation marks, which in this context are
> almost always misleading.
> As of the U+ notational prefix for current text, I see it as extremely useful
> and I always apply it except, as Philippe states, in some tabular data,
> which is but following the pattern used in the NamesList (which Iʼm keeping
> constantly opened in my text editor).
> Using the U+ prefix throughout has the additional advantage of promoting
> Unicode in the mind of people—an urgent challenge, […]
I have been reviewing draft-iab-rfc-nonascii-02 <https://tools.ietf.org/html/draft-iab-rfc-nonascii-02>, which formally opens the RFC series to UTF-8 encoded characters. (Look at the PDF version, which shows characters beyond the ASCII range.)
I was surprised that Section 3.4 provides no fewer than *six* notational alternatives, none of which conforms to Appendix A of TUS. There might be valid grammatical reasons to notate differently than Appendix A, but I would think that Appendix A style U+2206 INCREMENT would be the best choice, as in:
1. Temperature changes in the Temperature Control Protocol are
indicated by "Δ" U+2206 INCREMENT.
where U+ NAME replaces the part-of-speech “the XYZ character”, the character itself is quoted directly in front of the U+, and parentheses are not needed.
(I am actually in favor of curly quotes “Δ” in such a case, but that discussion should probably be had in the IETF.)
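For illustration, the notation suggested above can be generated mechanically. The following is a minimal Python sketch, not something taken from the draft or from TUS: the function name `describe` and its `curly` parameter are hypothetical, and the character name comes from whatever Unicode version the local `unicodedata` module ships with.

```python
import unicodedata


def describe(ch: str, curly: bool = False) -> str:
    """Format one character as: "X" U+hhhh[h[h]] NAME (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # At least four uppercase hex digits, growing to five or six as needed.
    code = f"U+{ord(ch):04X}"
    # Fall back to a placeholder for code points without a character name.
    name = unicodedata.name(ch, "<unnamed>")
    # Straight quotes by default; curly quotes on request.
    open_q, close_q = ("\u201C", "\u201D") if curly else ('"', '"')
    return f"{open_q}{ch}{close_q} {code} {name}"


print(describe("\u2206"))              # straight quotes
print(describe("\u00DF", curly=True))  # curly quotes
```

The quoting style is left as a switch because the choice between straight and curly quotes is exactly the open question mentioned above.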
Interestingly, TUS 9.0.0 is not internally consistent, but there is a trend that when the character is quoted, it is put in curly quotes and is placed between the U+ syntax and the NAME, as in:
Uppercasing of U+00DF “ß” latin small letter sharp s to …
U+2061 Ê function application has no effect on the text display…
(Note: the Ê character appears in TUS as f() in a box…I am copying and pasting the text directly on my Mac from Acrobat to Mail.app. And, obviously, it’s copying and pasting the small-caps in lowercase.)
In plain text, ALL-CAPS names are superior to mixed case or lowercase names. However, in stylized text, small-caps not only looks better but offers a more convenient visual and semantic way to differentiate the part-of-speech.
I may have to suggest that small-caps be added as a stylistic element to the new xml2rfc format, or that a new element be provided specifically to identify Unicode code points, which would automatically be styled appropriately for the output format (ALL-CAPS for plain text, stylized small-caps for marked-up text).
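To make the idea of format-dependent styling concrete, here is a rough Python sketch of what a renderer for such an element might do. Everything in it is an assumption for illustration: the function name `render_code_point`, the `output` values "text" and "html", and the use of an inline CSS `font-variant: small-caps` span are not part of xml2rfc or of any proposal.

```python
import unicodedata


def render_code_point(ch: str, output: str = "text") -> str:
    """Render 'U+hhhh NAME' for one character (hypothetical sketch)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    code = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)  # raises ValueError for unnamed code points
    if output == "text":
        # Plain text: ALL-CAPS name, as in Appendix A of TUS.
        return f"{code} {name}"
    if output == "html":
        # Marked-up text: lowercase name displayed in small-caps via CSS.
        return (
            f'{code} <span style="font-variant: small-caps">'
            f"{name.lower()}</span>"
        )
    raise ValueError(f"unknown output format: {output!r}")


print(render_code_point("\u00DF", "text"))
print(render_code_point("\u00DF", "html"))
```

Keeping the code point and name as data, and deciding the casing only at render time, is what would let a single source element serve both the plain-text and the styled outputs.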