Concise term for non-ASCII Unicode characters
daniel.buenzli at erratique.ch
Tue Sep 29 13:02:54 CDT 2015
On Tuesday, 29 September 2015 at 18:30, Sean Leonard wrote:
> Uh...I think you mean U+007F? :)
Yes… see how easy it was to point out that the definition was wrong. It would have been just as easy if this were code and we were talking about a protocol whose specification used this notation rather than a new Unicode concept.
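To illustrate how a range-based definition reads when written as code, here is a minimal sketch in Python. It assumes "non-ASCII" means any code point from U+0080 up to U+10FFFF; the exact definition debated earlier in the thread is not quoted here, and the names ASCII_MAX, UNICODE_MAX and is_non_ascii are hypothetical.

```python
# Assumption: "non-ASCII" means a Unicode code point above U+007F.
ASCII_MAX = 0x007F      # last ASCII code point
UNICODE_MAX = 0x10FFFF  # last Unicode code point


def is_non_ascii(cp: int) -> bool:
    """Return True if cp is a Unicode code point outside the ASCII range."""
    return ASCII_MAX < cp <= UNICODE_MAX
```

An off-by-one in either bound is visible at a glance, which is the kind of error the U+007F correction caught.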
> Perhaps it's because I'm writing to the Unicode crowd, but honestly
> there are a lot of very intelligent software engineers/standards folks
> who do not have the "basic knowledge of the Unicode standard" that is
> being presumed. They want to focus on other parts of their systems or
> protocols, and when it comes to the "text part", they just hand-wave and
> say "Unicode!" and call it a day.
Introducing more terminology and jargon is not going to help in this case. Make the definitions as obvious as possible and strive for minimality in the exposed concepts.
> The fact that (modern implementations of) UTF-8 encoders and decoders are not supposed to process the surrogate code points (arbitrarily), for example, is a
> rather advanced topic
I wouldn't say this is advanced knowledge; it is basic knowledge any programmer dealing with Unicode text should have. FWIW this is the absolute minimal knowledge I think programmers should have about Unicode (the last section can be skipped; it's specific to a programming language). It corresponds to maybe 3 to 4 A4 pages. If your programmers are not able to grok this small amount of knowledge, hire better ones.
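The surrogate point can be checked in a few lines. The sketch below uses Python 3, whose built-in strict UTF-8 codec refuses to encode the surrogate code points U+D800..U+DFFF; the helper names is_scalar_value and utf8_encodable are hypothetical and only serve the illustration.

```python
SURROGATE_MIN = 0xD800
SURROGATE_MAX = 0xDFFF
UNICODE_MAX = 0x10FFFF


def is_scalar_value(cp: int) -> bool:
    """A Unicode scalar value is any code point that is not a surrogate."""
    in_range = 0 <= cp <= UNICODE_MAX
    is_surrogate = SURROGATE_MIN <= cp <= SURROGATE_MAX
    return in_range and not is_surrogate


def utf8_encodable(cp: int) -> bool:
    """Ask Python's strict UTF-8 codec whether it will encode cp."""
    try:
        chr(cp).encode("utf-8")
        return True
    except (UnicodeEncodeError, ValueError):
        # UnicodeEncodeError: surrogate; ValueError: outside the code point range
        return False
```

Under these assumptions the two predicates agree: only scalar values are accepted by the encoder.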