I agree with everything in your post, but I have some additional comments:

> The promoters of CESU-8 say that data in this format already exists in the
> real world, and the purpose in describing it in a UTR is to codify an
> existing de-facto standard.

Note that it is *and has always been* non-compliant to generate the 6-byte
form for supplementary characters. So if any process generated such data,
that was a bug.
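To make the "6-byte form" concrete, here is a small Python sketch. It compares the conformant 4-byte UTF-8 encoding of U+10000 with the form that encodes each UTF-16 surrogate separately, which is the form CESU-8 would codify. `cesu8_encode` is a hypothetical helper written only for illustration; it is not a library function.

```python
def cesu8_encode(s: str) -> bytes:
    """Hypothetical helper: encode each UTF-16 code unit of s as its own UTF-8 sequence."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the supplementary character into a UTF-16 surrogate pair...
            cp -= 0x10000
            high, low = 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)
            # ...and emit a 3-byte sequence for each surrogate (6 bytes in total).
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


utf8 = "\U00010000".encode("utf-8")    # conformant 4-byte form
cesu = cesu8_encode("\U00010000")      # non-conformant 6-byte form
print(utf8.hex(" "), len(utf8))        # f0 90 80 80 4
print(cesu.hex(" "), len(cesu))        # ed a0 80 ed b0 80 6

try:
    cesu.decode("utf-8")               # a strict UTF-8 decoder refuses encoded surrogates
except UnicodeDecodeError:
    print("rejected by the UTF-8 decoder")
```

Python's built-in UTF-8 codec rejects the 6-byte form, consistent with the point that generating it was never conformant.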

> So my question is: What supplementary characters are currently, TODAY,
> stored in Oracle or PeopleSoft databases that require the creation of a new
> encoding scheme to ensure they can continue to be sorted consistently?

I suspect effectively none. However, even though we don't *expect* such data
to exist, we would still like to guarantee that an existing database stored
using 16-bit code units will not end up with an inconsistent sort order if it
does contain any surrogate codes.
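The reason the sort order is at stake: a store that compares 16-bit code units places surrogates (U+D800..U+DFFF) before U+E000..U+FFFF, whereas code point order places supplementary characters after the whole BMP. The Python sketch below shows the two orders disagreeing on just two strings. It relies on Python comparing `str` values by code point, and on big-endian UTF-16 bytes comparing in the same order as 16-bit code units; `utf16_unit_key` is a hypothetical name.

```python
def utf16_unit_key(s: str) -> bytes:
    # Big-endian UTF-16 bytes sort exactly like the sequence of 16-bit code units.
    return s.encode("utf-16-be", "surrogatepass")


words = ["\uFF21", "\U00010000"]   # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A, then U+10000
by_code_point = sorted(words)                      # Python str comparison is by code point
by_code_unit = sorted(words, key=utf16_unit_key)   # what a 16-bit store would produce

print([f"U+{ord(w):04X}" for w in by_code_point])  # ['U+FF21', 'U+10000']
print([f"U+{ord(w):04X}" for w in by_code_unit])   # ['U+10000', 'U+FF21']
```

An index built in the second order becomes wrong as soon as comparisons switch to the first, which is the inconsistency the options below are meant to avoid.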

There are at least three ways to guarantee this:

Option A:
 - Prevent strings with surrogate codes from being added to the database.
 - On the request of an administrator, do the following:
   1. Verify that the database does not contain any surrogate codes.
   2. Obtain a global lock.
   3. Set a flag that switches to using code point order, and enables
      adding strings with supplementary characters.
   4. Release the global lock.
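A minimal, single-process sketch of Option A in Python, assuming a hypothetical in-memory `ToyStore` in place of a real database and a `threading.Lock` standing in for the global lock. For simplicity it performs the check of step 1 while already holding the lock from step 2, which is a slightly stricter variant of the steps above.

```python
import threading


def has_surrogate_codes(s: str) -> bool:
    """True if s would need surrogate code units when stored as UTF-16."""
    return any(ord(ch) >= 0x10000 or 0xD800 <= ord(ch) <= 0xDFFF for ch in s)


class ToyStore:
    """Hypothetical in-memory stand-in for a database of 16-bit strings."""

    def __init__(self):
        self._lock = threading.Lock()   # plays the role of the global lock
        self._rows = []                 # stored strings
        self.code_point_order = False   # the flag from step 3

    def insert(self, s: str) -> None:
        with self._lock:
            # Until the flag is set, strings with surrogate codes are refused.
            if not self.code_point_order and has_surrogate_codes(s):
                raise ValueError("supplementary characters are not enabled yet")
            self._rows.append(s)

    def enable_supplementary(self) -> bool:
        """Administrator request (steps 1-4); returns False if the check fails."""
        with self._lock:                                          # step 2
            if any(has_surrogate_codes(r) for r in self._rows):   # step 1
                return False
            self.code_point_order = True                          # step 3
        return True                                               # lock released (step 4)

    def sorted_rows(self) -> list:
        with self._lock:
            if self.code_point_order:
                return sorted(self._rows)  # code point order
            # 16-bit code unit order (identical here, since no surrogates are stored)
            return sorted(self._rows, key=lambda r: r.encode("utf-16-be"))


store = ToyStore()
store.insert("abc")
try:
    store.insert("\U00010000")
except ValueError as err:
    print("refused:", err)
print(store.enable_supplementary())  # True
store.insert("\U00010000")
```

A real database would of course need the lock to cover every writer, not just this one object, but the sequence of checks is the same.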

Option B:
 - Treat strings as having a flag specifying 'new' or 'old', where all
   existing strings are old.
 - Tag new strings added to the database as 'new'.
 - Sort characters in the following order:
     U+0000..U+D7FF
     U+D800..U+DFFF in 'old' strings
     U+E000..U+FFFF
     U+10000..U+10FFFF in 'new' strings.
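Option B can be expressed as a single sort key, sketched here in Python under the assumption that each row is a hypothetical `(text, is_new)` pair. 'Old' rows are weighted by their 16-bit code units, so their surrogates land at U+D800..U+DFFF; 'new' rows are weighted by their code points, so supplementary characters land after U+FFFF. `option_b_key` is an illustrative name, not an existing API.

```python
def option_b_key(text: str, is_new: bool) -> tuple:
    """Hypothetical sort key implementing the Option B order."""
    if is_new:
        # 'new' strings: code point order, supplementary characters sort after U+FFFF.
        return tuple(ord(ch) for ch in text)
    # 'old' strings: 16-bit code unit order, exactly as the data was sorted before.
    data = text.encode("utf-16-be", "surrogatepass")
    return tuple(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))


rows = [
    ("\U00010000", True),   # supplementary character in a 'new' string
    ("\U00010000", False),  # the same character, stored earlier as a surrogate pair
    ("\uFF21", True),
    ("A", False),
]
ordered = sorted(rows, key=lambda row: option_b_key(*row))
for text, is_new in ordered:
    print("new" if is_new else "old", [f"U+{ord(ch):04X}" for ch in text])
```

Because 'old' rows keep their original weights, any index built before the change is still correctly ordered.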

Option C:
 - Represent supplementary characters internally as three code units:
   <0xFFFF, high_surrogate, low_surrogate>. This will sort in the
   same way as Option B (using the database's existing sorting algorithm)
   provided that there are no instances of U+FFFF followed by a
   non-surrogate, which there should not be.
 - Make sure that U+FFFF never appears outside the database implementation,
   i.e. delete it on export or when passing a string to a stored procedure,
   and add it where necessary when storing strings.
 - Despite slightly changing the representation of supplementary characters,
   this is conformant to Unicode 3.1 because the U+FFFF noncharacter only
   occurs internally.
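The Option C representation can be sketched in Python as follows. It assumes, since the post does not specify it, that the database's existing sort is a plain comparison of 16-bit code units, modelled here as Python lists of ints. `to_internal`, `from_internal` and `old_units` are hypothetical helpers: the first adds the U+FFFF marker when storing a new string, the second drops it again on export, and the third gives the plain UTF-16 units of an existing ('old') string.

```python
def old_units(text: str) -> list:
    """Plain UTF-16 code units, as existing strings are already stored."""
    data = text.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def to_internal(text: str) -> list:
    """Store a new string: each supplementary character becomes <0xFFFF, high, low>."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            units += [0xFFFF, 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
    return units


def from_internal(units: list) -> str:
    """Export: drop the 0xFFFF markers and rejoin surrogate pairs into characters."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if u == 0xFFFF:
            i += 1  # internal marker; assumed always to precede a surrogate pair
            continue
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)))
            i += 2
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)


stored = [to_internal("\U00010000"), old_units("\U00010000"), to_internal("\uFF21"), old_units("A")]
ordered = sorted(stored)  # the unchanged 16-bit comparison (lexicographic on code units)
print([from_internal(u) for u in ordered] == ["A", "\U00010000", "\uFF21", "\U00010000"])  # True
```

Sorting the stored code units with the unchanged comparison puts 'old' surrogate pairs before U+E000 and marked supplementary characters after everything else, which matches the Option B order, and U+FFFF never appears in exported text.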

Each of these options has advantages and disadvantages, but all of them
would work, all of them allow interoperability even between vendors who
choose different options, and none of them inflicts broken UTFs or sort
orders on anything that does not work directly on the database file.

