Concise term for non-ASCII Unicode characters

Daniel Bünzli daniel.buenzli at
Tue Sep 29 13:02:54 CDT 2015

On Tuesday, 29 September 2015 at 18:30, Sean Leonard wrote:
> Uh...I think you mean U+007F? :)

Yes… see how easy it was to point out that the definition was wrong. It would have been just as easy if this were code and we were talking about a protocol whose specification used this notation rather than a new Unicode concept.
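As a rough sketch of what such a definition looks like as code (Python; the names `is_scalar_value` and `is_non_ascii_scalar` are hypothetical, and this assumes "non-ASCII" means any Unicode scalar value above U+007F), an off-by-one in the bound is visible at a glance:

```python
def is_scalar_value(cp: int) -> bool:
    """True for Unicode scalar values: U+0000..U+D7FF and U+E000..U+10FFFF."""
    return 0x0000 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def is_non_ascii_scalar(cp: int) -> bool:
    """True for scalar values outside ASCII, i.e. above U+007F.

    Assumption: "non-ASCII" means every scalar value that is not in
    U+0000..U+007F, so the range starts at U+0080.
    """
    return cp > 0x007F and is_scalar_value(cp)
```

For example, `is_non_ascii_scalar(0x7F)` is false and `is_non_ascii_scalar(0x80)` is true, while surrogates such as U+D800 are excluded because they are not scalar values.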

> Perhaps it's because I'm writing to the Unicode crowd, but honestly
> there are a lot of very intelligent software engineers/standards folks  
> who do not have the "basic knowledge of the Unicode standard" that is  
> being presumed. They want to focus on other parts of their systems or  
> protocols, and when it comes to the "text part", they just hand-wave and  
> say "Unicode!" and call it a day.

Introducing more terminology and jargon is not going to help in this case. Make the definitions as obvious as possible and strive for minimality in the exposed concepts.

> The fact that (modern implementations of) UTF-8 encoders and decoders
> are not supposed to process the surrogate code points (arbitrarily),
> for example, is a rather advanced topic

I wouldn't call this advanced knowledge; it is basic knowledge that any programmer dealing with Unicode text should have. FWIW, this [1] is the absolute minimum I think programmers should know about Unicode (the last section can be skipped, as it is specific to a programming language). It amounts to maybe 3 to 4 A4 pages. If your programmers are not able to grok this small amount of knowledge, hire better ones.
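To make the surrogate point concrete, here is a minimal sketch (Python; not taken from [1], and `utf8_encode_scalar` is a hypothetical name) of a UTF-8 encoder that only accepts scalar values and therefore refuses the surrogate code points U+D800..U+DFFF:

```python
def utf8_encode_scalar(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8.

    Raises ValueError for integers outside U+0000..U+10FFFF and for
    surrogate code points, which have no UTF-8 encoding.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code point U+{cp:04X} is not a scalar value")
    if cp <= 0x7F:  # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

Python's built-in codec behaves the same way by default: `chr(0xD800).encode("utf-8")` raises `UnicodeEncodeError`.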



