LemmingToo and I just had lunch, and discussed some changes we'd like to see in SQL Server.
1. Correlated Aggregation. If I have a table with a primary key called PK, a Zip Code called Zip, and the number of days associated with that zip code called ServiceDays, I want to be able to pull back an aggregated list of zip codes and the maximum service days for each, and the PK's that include the Zip/Serviceday combination I'm looking for. This requires 2 queries (ok, you can do it in one with a derived table, but as far as the execution time is concerned that's two queries). I also want to have a "I don't care, pick one" option to resolve ties (i.e. if two PK's have the same ServiceDays for a given Zip, pick one, I don't care if it's the max PK, the min PK, just pick one).
2. The way GROUP BY works is stupid. If you've got this query:
select foo, count(*) from mytable
Guess what the GROUP BY clause is going to have in it? We couldn't think of any time when the GROUP BY wasn't deterministic based on that which is not aggregated in the SELECT clause. This could be implemented as an auto-complete in Query Analyzer.
3. HAVING and ORDER BY should allow the use of column aliases so you don't have to write stupid crap like ORDER BY CASE blah blah blah. Inevitably, the CASE changes, you get the error, and you go copy/paste the complex statement into the ORDER BY. Yeah, you can do that with positional arguments in your order by, but see number 4.
4. Positional arguments in ORDER BY are stupid. Add something to the SELECT clause and accidentally forget to modify your ORDER BY and you're screwed. Allow aliases instead (see 3).
5. UPSERT. I want to be able to issue a command that will either do an update or an insert, depending on the values. If the primary key of the row being UPSERT'ed already exists, then update the row with the values provided, otherwise insert a new row. Presently, I have to check for existance, then decide on either doing the insert or the update. This would not only reduce the amount of code I have to write, but it would also be more efficient, since SQL Server would only check existance of the PK one time instead of possibly doing it twice (once on the existance check and again if the insert happens). (Alternatively, I suppose I could do the insert regardless, trap the error, and then update everything that errored out, but that's just weird and wrong.)
6. This one's a bit hard to explain.
I want multi-targeted UPSERT statements.
What that means is that I want to be able to write a SELECT statement, and take the output and redirect it into multiple tables, applying inserts and updates as needed, and doing it all in the right order to handle the foreign key relationships. Similar in function to an INSERT/SELECT, but with multiple tables being inserted/updated.
The syntax for this would be horrifically complicated. Given the number of times I've had to solve this problem, I'd rather have to learn the horrifically complicated syntax instead of creating a temp table to keep track of everything.
I think I got everything we discussed. Comments are, of course, welcome.
L2 sent me thoughts on some syntax for a multi-target INSERT:
Here's the syntax for it. The keyword USING is totally made up and can be
changed. Notice that b.second isn't actually in the select list, but I'm
claiming to be allowed to use it anyway. I'd be willing to entertain the
idea of putting a "glue" keyword between the two insert statements similar
to "UNION" if the parser needs it. Naturally the Group by ought to be
insert into table1
using (a.first, b.second)
insert into table2
using (a.first, avgSecond)
select a.first, AVG(b.second) as avgSecond
from tableA as a
inner join tableB as b
on a.first = b.first
Group by a.first