Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

SQL Server 2005: Specifying Partitions for Aggregate Functions

Did you know that a new feature in SQL Server 2005 allows you to specify an OVER partition for aggregate functions in your SELECT statement? 

For example:

select
  customerID, productID, orderDate, orderAmount,

  sum(orderAmount) OVER (Partition by CustomerID) as CustTotal
from
  Orders


customerID  productID   orderDate                orderAmount   CustTotal
----------- ----------- -----------------------  ------------- ---------
1           1           2007-01-01 00:00:00.000         20.00     91.00
1           2           2007-01-02 00:00:00.000         30.00     91.00
1           2           2007-01-05 00:00:00.000         23.00     91.00
1           3           2007-01-04 00:00:00.000         18.00     91.00
2           1           2007-01-03 00:00:00.000         74.00    118.00
2           1           2007-01-06 00:00:00.000         34.00    118.00
2           2           2007-01-08 00:00:00.000         10.00    118.00
(7 row(s) affected)

Notice that the Total column contains the total order amount for the customer.

For more on this and some ideas on how to use this feature, check out my latest SQLTeam article.

One use for this that I didn't mention in my article is that this can be useful for sorting.   For example, suppose we need a report that shows all customers and their order details, but sorted so that customers with the highest order amounts are shown first.  By using a partitioned aggregate function (along with a common table expression or a derived table), this becomes very easy:

with CustOrders as
(
     select
    customerID, productID, orderDate, orderAmount,
    sum(orderAmount) OVER (Partition by CustomerID) as CustTotal
     from
    Orders
)
select * from CustOrders order by CustTotal DESC


Without using this new feature, a join back to the Orders table or a correlated subquery to it would be required.  I don't think you get any performance gains from using this technique, but it is easier and shorter to write.

Article link.

see also:

Legacy Comments


Ankith
2007-05-22
re: SQL Server 2005: Specifying Partitions for Aggregate Functions
Hi Smith

This is a Very good article. Great techinique to rewrite without using a dervied table. I have one question. can this be extended to find the maximum value in a range. for example if i have three dates 2007-01-01,2007-02-01,2007-05-01 can i use the over clause to get the maximum date for all the date ranges just like the sum for the custorders. I tried the techinique but could not get it to work with the max dates. Is there a limitation or am i doing something wrong?

Thanks

Ankith

Jeff
2007-05-22
re: SQL Server 2005: Specifying Partitions for Aggregate Functions
Ankith -- post some more details and I'll try to help. It's hard to guess what the problem might be without any information. Also, be sure to read the article at sqlteam if you haven't already. using MIN() or MAX() is exactly the same as using SUM() with partitions.

Ankith
2007-05-23
re: SQL Server 2005: Specifying Partitions for Aggregate Functions
Hi Jeff,

Please find below the DDL and the expected result set.

CREATE TABLE [MyDates]
( MyDatesID INT IDENTITY( 1, 1 )
,MyDate1 SMALLDATETIME NULL
,MyDate2 SMALLDATETIME NULL
,MyDate3 SMALLDATETIME NULL )
GO

INSERT INTO [MyDates] (MyDate1, MyDate2, MyDate3 )
SELECT '2007-01-01', '2007-03-31', '2007-05-01'
UNION ALL
SELECT '2007-03-31', '2007-05-01', '2007-01-01'
UNION ALL
SELECT '2007-05-01', '2007-01-01', '2007-03-31'
GO

I am trying to get the resultset as

-- MyDatesID MyDate1 MyDate2 MyDate3 MaxDate
-- ----------- ----------------------- ----------------------- ----------------------- -----------------------
-- 1 2007-01-01 00:00:00 2007-03-31 00:00:00 2007-05-01 00:00:00 2007-05-01 00:00:00
-- 2 2007-03-31 00:00:00 2007-05-01 00:00:00 2007-01-01 00:00:00 2007-05-01 00:00:00
-- 3 2007-05-01 00:00:00 2007-01-01 00:00:00 2007-03-31 00:00:00 2007-05-01 00:00:000

I tried to use MAX in place of sum but ended up totally with a different resultset. Any insight is appreciated.

Thanks

Ankith


Jeff
2007-05-23
re: SQL Server 2005: Specifying Partitions for Aggregate Functions
I am not sure I follow what you are trying to do. First off, what is MaxDate? The max dateof any of those columns? I strongly recommend normalizing your database if it's not too late, that looks like it is probably a bad design that, like usual, makes your SQL overcomplicated.

Also, I don't see any partition -- do you want the MAX() of any of those 3 columns for the entire resultset? Or is there a partition that you are not showing in your sample data? or is it simply the "max" of those 3 date columns for each row in your data? It is very, very unclear.

If it is just the max of those 3 columns for each row, there's no partitions or aggregate functions to use at all. Because of the unnormalized design, you'd need to use case:

select id, date1,date2,date3, case when date1 > date2 and date1 > date3 then date1 when date2>date3 then date2 else date3 end as maxdate
from yourtable

A normalized design, with 1 date per row per ID in a related table, let's you simply use max:

select id, max(date)
from yourtable

You might be better off visiting the SQLTeam forums and posting your question there, but I will warn you that will you definitely have to be much more clear in your requirements.

Ankith
2007-05-23
re: SQL Server 2005: Specifying Partitions for Aggregate Functions
Sorry for not being clear. yes it is the max of the three columns and I got the answer that I was looking from what you said. Also its not a part of my design and if it was I agree with you, that it would be highly unnormalized design . I just wanted to see it if it can be achieved using partitions and I think I got the answer. I just created a hypothetical table for you to make it a little clear and I think I have confused a little bit further.:)

Thanks again

Rahul
2007-10-31
re: SQL Server 2005: Specifying Partitions for Aggregate Functions
you said this simple query will work

select id, max(date)
from yourtable

i can only retrieing max value but not with its id. i wanted output to have max value and for which id it contains.

the error msg coming is

Msg 8120, Level 16, State 1, Line 1
Column 'employee.empId' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

pls help
thnx

Jeff
2007-10-31
re: SQL Server 2005: Specifying Partitions for Aggregate Functions
Rahul -- sorry, forgot the group by at the end:

select id, max(date)
from yourtable
group by id