RE: UTF-8 Syntax

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Jun 11 2001 - 07:20:58 EDT


Toby Phipps wrote:
> [...] There's been a lot of talk about the UTF-8S
> proposal on both the unicode and unicore list, so please
> forgive me (and notify me if you feel the need) if I have
> missed any of the salient points that require a response.

You made a very clever summary of such a long thread. However, I made an
assumption that you don't mention, and I would like you to verify it:

7. A client has no reason to expect data in any particular order, unless it
explicitly requested an "order by" clause.

My assumption was that, in the first case (no sort order requested by the
client), a server could in theory provide a randomly shuffled result set. Of
course, I know that this won't normally happen; the point is that the server
is allowed to return rows in whatever optimized binary order it uses in its
keys.

In the second case (a specific order requested by the client), I assumed
that no kind of binary order is acceptable. A "proper" sort has been
requested and must be done: numbers shall be sorted numerically, dates shall
be sorted chronologically, and text shall be sorted lexically. Of course,
this may be computationally very heavy but, alas, if the client commanded
you to do so, it means that it needs it and it is prepared to wait for it.

Is this assumption correct, according to the current definition of SQL?

If yes, I feel that your answer to point 1 is wrong. At most, the problem of
defining the *default* *binary* text order for a result set could be the
object of a private (or public, if you prefer) agreement between database
vendors. But text encoding as used in other fields does not need to be
affected or even to know about this.

BTW, as a client programmer, I would always try to avoid writing code that
relies on binary order.
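
To illustrate why relying on binary order is fragile, here is a minimal
sketch in Python (my choice of language; the sample strings and helper names
are made up for illustration). It sorts the same strings by their UTF-8 bytes
and by their UTF-16 bytes. The two binary orders disagree as soon as a
supplementary character meets a BMP character above the surrogate range, and
neither of them is a lexical order.

```python
# Hypothetical sample data: two ASCII letters, one BMP character above the
# surrogate range (U+FF5E), and one supplementary character (U+10000).
words = ["\uff5e", "\U00010000", "a", "Z"]

def utf8_key(s):
    # Binary order of the UTF-8 encoded form.
    return s.encode("utf-8")

def utf16_key(s):
    # Binary order of the UTF-16 (big-endian) encoded form.
    return s.encode("utf-16-be")

by_utf8 = sorted(words, key=utf8_key)
by_utf16 = sorted(words, key=utf16_key)

# UTF-8 byte order matches code point order: U+FF5E before U+10000.
# UTF-16 byte order puts the surrogate pair (D800 DC00) before FF5E.
# Both put "Z" before "a", which a lexical sort normally would not.
print([hex(ord(w)) for w in by_utf8])
print([hex(ord(w)) for w in by_utf16])
```

A client that needs a meaningful order should ask for it explicitly rather
than depend on either of these.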

Also, some of your arguments at point 4 lose significance:

> This is incredibly inefficient, not only because significant
> amounts of temporary space needs to be allocated and
> freed, but also because the entire result set of the query has to be
> processed and sorted before the first row is returned. With result sets
> involving several million rows, this is a very significant overhead,
> especially if the typical user only looks at the first couple of hundred.

If you are talking about binary order, and my point 7 is true, all this is
meaningless: you are certainly allowed to just return the first couple
hundred records as they come out of your indexes, whatever order they are in.

If you are talking about fulfilling an "order by" request, then it is
useless to complain: you gotta do it, and no UTF whatsoever can save you from
allocating the necessary resources.
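
The two cases can be sketched as follows. This is only a toy model in Python,
not how any real database works: scan_index is a hypothetical stand-in for a
server walking its index in stored order, and the counters simply record how
many rows had to be read before the first 200 could be handed to the client.

```python
from itertools import islice

def scan_index(n, counter):
    """Hypothetical index scan: yields n keys in an arbitrary stored order."""
    for i in range(n):
        counter["rows_read"] += 1
        # 7919 is prime and shares no factor with n = 100000, so this
        # visits every key in 0..n-1 exactly once, in scrambled order.
        yield (i * 7919) % n

TOTAL_ROWS = 100_000
WANTED = 200

# Case 1: no "order by". Stream the first rows straight from the index.
unsorted_counter = {"rows_read": 0}
first_unsorted = list(islice(scan_index(TOTAL_ROWS, unsorted_counter), WANTED))

# Case 2: "order by" requested. Every row must be read and sorted
# before the first one can be returned.
sorted_counter = {"rows_read": 0}
first_sorted = sorted(scan_index(TOTAL_ROWS, sorted_counter))[:WANTED]

print(unsorted_counter["rows_read"])  # rows read without a sort
print(sorted_counter["rows_read"])    # rows read with a sort
```

In the first case only the rows actually delivered are read; in the second
the whole set is consumed, whatever encoding the keys happen to use.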

_ Marco


