Many explanations of UTF-8 describe the encoding of code points on Code
Planes 1-16 by way of surrogates, as in UTF-16. I believe that this is
both unnecessary and misleading: UTF-8 is fundamentally a direct 21-bit
encoding scheme, as may be seen in the attached document. The concept of
surrogates is simply not relevant to UTF-8 encoding of code points on
Code Planes above the BMP.
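
To illustrate the point, here is a minimal sketch in Python 3 (not taken
from the attached document; the function name utf8_encode is made up for
this example). It builds the UTF-8 bytes straight from the bits of the
scalar value, and surrogates appear only as values that are rejected:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 directly from its bits.

    Hypothetical helper for illustration only; no surrogate step is used.
    """
    # Surrogate code points (D800..DFFF) are not scalar values and must
    # never be encoded in UTF-8; anything above 10FFFF is out of range.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # up to 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# U+10400 lies on Plane 1; its four bytes come from the scalar value alone.
print(utf8_encode(0x10400).hex(" "))
# Cross-check against Python's built-in encoder.
print(utf8_encode(0x10400) == chr(0x10400).encode("utf-8"))
```

The four-byte branch carries 3 + 6 + 6 + 6 = 21 payload bits, which is
where the "21-bit" description comes from.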

This is a slightly different explanation of how UTF-8 works, written by me
for the Ultracode(r) bar code spec (Ultracode encodes all of Unicode 3
directly). If any Unicodotti find any errors in it... please let me know!


I need to know exactly how UTF-8, UTF-16 and UTF-32 are encoded. I heard
that UTF-32 can contain surrogates, so I can't just assume that its
values are scalar values.

A nice, detailed and clear explanation would help, with plenty of
examples showing the effects of each encoding and anything else that
makes it easier to understand.

Or perhaps I'm just reacting to the confusion of the Unicode
website, and it's not that hard to understand and a simple definition
would do? But the first idea certainly wouldn't hurt.

