RE: UTF-8 Syntax

From: toby_phipps@peoplesoft.com
Date: Tue Jun 12 2001 - 05:09:16 EDT


Marco Cimarosti <marco.cimarosti@essetre.it> wrote:
>My assumption was that, in the first case (no sort order requested by the
>client), a server could in theory provide a result set randomly shuffled.
Of
>course, I know that this won't normally happen but, however, the server is
>allowed to provide whatever optimized binary order that it uses in its
keys.

This assumption is essentially correct. Without an explicit ORDER BY
clause, the server is in its rights to present the result set in any order,
not necessarily the binary order of its encoding, but any order. The most
recent SQL Standards I have easy access to is the CD of ISO/IEC
9075-2:199x, which states under 4.34 (Cursors):

"If the <declare cursor> does not include an <order by clause>, or includes
an <order by clause> that does not specify the order of the rows
completely, then the rows of the table have an order that is defined only
to the extent that the <order by clause> specifies and order and is
otherwise implementation dependent."

>In the second case (a specific order requested by the client), I assumed
>that no kind of binary order is acceptable. A "proper" sort has been
>requested and must be done: numbers shall be sorted numerically, dates
shall
>be sorted chronologically, and text shall be sorted lexically. Of course,
>this may be computationally very heavy but, alas, if the client commanded
>you to do so, it means that it needs it and it is prepared to wait for it.

This is not correct. The SQL standards provide a framework for a
statement-level ordering by way of the ORDER BY clause, and also provides
the concept of a COLLATION that may be specified at different levels,
however I have never seen this implemented. Unless overridden by a
vendor-specific session-level setting, ORDER BY returns rows in whatever is
the default sort order of the database server. Some vendors allow the
override of this ORDER BY to fordce a specific collation, however this
syntax is incredibly database platform dependent, and does not exist on
many platforms.

For example, on Microsoft SQL Server, if I have a server using binary sort,
and want a different collation for a specific statement, you can use the
syntax ... ORDER BY <complete column specification> COLLATE <collation
name>.

On Oracle, the same statement would read: ... ORDER BY NLS_SORT(<column
name>, "NLS_SORT=<collation name>")
On Sybase, the same statement would read: ... ORDER BY SORTKEY(<column
name>,"collation")
On DB2/UDB an DB2 for 390, you can't override the default server collation
at the statement or session level

As you can see, the results of a simple ORDER BY are very dependent on the
platform you're working on, the session-level setting (if any), and the
default collation of the server. In order to provide a cross-platform
client against all these databases, some assumptions (and customer
restrictions) need to be placed on the physical encoding (which typically
implies the default sort) of the server in order to be able to rely on any
consistent collation.

>BTW, as a client programmer, I would always try to avoid writing code that
>relies on binary order.

Absolutely. However, some languages (such as COBOL) tend not to make this
possible, at least without very messy compiler-dependent code. It's also a
very common depdenence seen in large-scale database client implementations
- not great programming style, but there you go.

>Also some of your arguments at point 4 loose significance:

>If you are talking about binary order, and my point 7 is true, all this is
>meaningless: you are certainly allowed to just return the first couple
>hundred records that are in your indexes, whatever they are.

>If you are talking about fulfilling an "order by" request, then it is
>useless to comply: you gotta do it, and no UTF whatsoever can save you
from
>allocating the necessary resources.

Unfortunately, this falls apart here. The ORDER BY clause is the only way
of guaranteeing that the result set will be ordered in any logical fashion
at all. There's nothing saying that a statement with an ORDER BY clause
can't return a binary order, in fact, that's the limit of its functionality
on some platforms (DB2 for OS/390), and much like the UTF-8S proposal, they
use different binary encodings to ensure appropriate collation.

Toby.



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:18 EDT