LemmingToo's top X Changes to SQL Server

Wed Oct 22, 2003 by Chris Miller

LemmingToo and I just had lunch, and discussed some changes we'd like to see in SQL Server.

Correlated Aggregation. If I have a table with a primary key called PK, a Zip Code called Zip, and the number of days associated with that zip code called ServiceDays, I want to be able to pull back an aggregated list of zip codes and the maximum service days for each, and the PK's that include the Zip/Serviceday combination I'm looking for. This requires 2 queries (ok, you can do it in one with a derived table, but as far as the execution time is concerned that's two queries). I also want to have a "I don't care, pick one" option to resolve ties (i.e. if two PK's have the same ServiceDays for a given Zip, pick one, I don't care if it's the max PK, the min PK, just pick one).
The way GROUP BY works is stupid. If you've got this query:

select foo, count(*) from mytable

Guess what the GROUP BY clause is going to have in it? We couldn't think of any time when the GROUP BY wasn't deterministic based on that which is not aggregated in the SELECT clause. This could be implemented as an auto-complete in Query Analyzer.

HAVING and ORDER BY should allow the use of column aliases so you don't have to write stupid crap like ORDER BY CASE blah blah blah. Inevitably, the CASE changes, you get the error, and you go copy/paste the complex statement into the ORDER BY. Yeah, you can do that with positional arguments in your order by, but see number 4.
Positional arguments in ORDER BY are stupid. Add something to the SELECT clause and accidentally forget to modify your ORDER BY and you're screwed. Allow aliases instead (see 3).
UPSERT. I want to be able to issue a command that will either do an update or an insert, depending on the values. If the primary key of the row being UPSERT'ed already exists, then update the row with the values provided, otherwise insert a new row. Presently, I have to check for existance, then decide on either doing the insert or the update. This would not only reduce the amount of code I have to write, but it would also be more efficient, since SQL Server would only check existance of the PK one time instead of possibly doing it twice (once on the existance check and again if the insert happens). (Alternatively, I suppose I could do the insert regardless, trap the error, and then update everything that errored out, but that's just weird and wrong.)
This one's a bit hard to explain.

I want multi-targeted UPSERT statements.

What that means is that I want to be able to write a SELECT statement, and take the output and redirect it into multiple tables, applying inserts and updates as needed, and doing it all in the right order to handle the foreign key relationships. Similar in function to an INSERT/SELECT, but with multiple tables being inserted/updated.

The syntax for this would be horrifically complicated. Given the number of times I've had to solve this problem, I'd rather have to learn the horrifically complicated syntax instead of creating a temp table to keep track of everything.

I think I got everything we discussed. Comments are, of course, welcome.

rs.

Update
L2 sent me thoughts on some syntax for a multi-target INSERT:

Here's the syntax for it. The keyword USING is totally made up and can be changed. Notice that b.second isn't actually in the select list, but I'm claiming to be allowed to use it anyway. I'd be willing to entertain the idea of putting a "glue" keyword between the two insert statements similar to "UNION" if the parser needs it. Naturally the Group by ought to be optional :)

insert into table1
(col1, col2)
using (a.first, b.second)
insert into table2
(col1, col2)
using (a.first, avgSecond)
select a.first, AVG(b.second) as avgSecond
from tableA as a
inner join tableB as b
on a.first = b.first
Group by a.first

rs.

Legacy Comments


Jeff 2003-10-22	re: LemmingToo's top X Changes to SQL Server Regarding #2: Here's a situation in which GROUP BY is not automatic: Select FirstName + ' ' + LastName, count(*) FROM Table GROUP BY firstname, lastname That should never be written as GROUP BY Firstname + ' ' + LastName. While it techinically can (and sadly all too often is) it will be much more efficient and able to make use of indexes if you properly GROUP queries like this. It's also "better SQL" to do it right.

Chris 2003-10-25	re: LemmingToo's top X Changes to SQL Server I agree with all of your issues! I believe Mysql implements a few of these items. For instance, Upsert exists and is called 'Replace'. I think it even supports aliases in the group by clause. I'd love to see these features in sql server.

Sam 2005-06-24	re: LemmingToo's top X Changes to SQL Server Actually, doing the insert, trapping the error and then doing an update is the correct way to do this. If you do a select first you run the risk that another thread will beat you to the insert and your insert will fail anyway. Why? Because if you do a select for the record and it doesn't exist there isn't a lock to prevent the other thread from beating you out. Of course, you can optimize in cases where you (a) know you will be isolated and/or (b) can acquire an exclusive lock on the table ahead of time. You just have to remember that your code is not thread-safe or well-behaved in-terms of managing contention.

Baskar 2005-08-23	re: LemmingToo's top X Changes to SQL Server TEST