SQL Server 2005: Specifying Partitions for Aggregate Functions
Did you know that a new feature in SQL Server 2005 allows you to specify an OVER partition for aggregate functions in your SELECT statement?
For example:
select
customerID, productID, orderDate, orderAmount,
sum(orderAmount) OVER (Partition by CustomerID) as CustTotal
from
Orders
customerID productID orderDate orderAmount CustTotal
----------- ----------- ----------------------- ------------- ---------
1 1 2007-01-01 00:00:00.000 20.00 91.00
1 2 2007-01-02 00:00:00.000 30.00 91.00
1 2 2007-01-05 00:00:00.000 23.00 91.00
1 3 2007-01-04 00:00:00.000 18.00 91.00
2 1 2007-01-03 00:00:00.000 74.00 118.00
2 1 2007-01-06 00:00:00.000 34.00 118.00
2 2 2007-01-08 00:00:00.000 10.00 118.00
customerID, productID, orderDate, orderAmount,
sum(orderAmount) OVER (Partition by CustomerID) as CustTotal
from
Orders
customerID productID orderDate orderAmount CustTotal
----------- ----------- ----------------------- ------------- ---------
1 1 2007-01-01 00:00:00.000 20.00 91.00
1 2 2007-01-02 00:00:00.000 30.00 91.00
1 2 2007-01-05 00:00:00.000 23.00 91.00
1 3 2007-01-04 00:00:00.000 18.00 91.00
2 1 2007-01-03 00:00:00.000 74.00 118.00
2 1 2007-01-06 00:00:00.000 34.00 118.00
2 2 2007-01-08 00:00:00.000 10.00 118.00
(7 row(s) affected)
Notice that the Total column contains the total order amount for the customer.
For more on this and some ideas on how to use this feature, check out my latest SQLTeam article.
One use for this that I didn't mention in my article is that this can be useful for sorting. For example, suppose we need a report that shows all customers and their order details, but sorted so that customers with the highest order amounts are shown first. By using a partitioned aggregate function (along with a common table expression or a derived table), this becomes very easy:
with CustOrders as
(
select
customerID, productID, orderDate, orderAmount,
sum(orderAmount) OVER (Partition by CustomerID) as CustTotal
from
Orders
)
select * from CustOrders order by CustTotal DESC
(
select
customerID, productID, orderDate, orderAmount,
sum(orderAmount) OVER (Partition by CustomerID) as CustTotal
from
Orders
)
select * from CustOrders order by CustTotal DESC
Without using this new feature, a join back to the Orders table or a correlated subquery to it would be required. I don't think you get any performance gains from using this technique, but it is easier and shorter to write.
Article link.
see also:
- SQL Server 2005: CROSS APPLY
- Taking a look at CROSS APPLY
- SQL Server 2005: Specifying Partitions for Aggregate Functions
- SQL Server 2005: Using EXCEPT and INTERSECT to compare tables
- Regular Expression Replace in SQL 2005 (via the CLR)
- More with SQL Server 2005 : Top n Per Group, Paging, and Common Table Expressions
- SQL Server 2005: Using PARTITION and RANK in your criteria
- Lots of great SQL Server 2005 Express Downloads
Legacy Comments
Ankith
2007-05-22 |
re: SQL Server 2005: Specifying Partitions for Aggregate Functions Hi Smith This is a Very good article. Great techinique to rewrite without using a dervied table. I have one question. can this be extended to find the maximum value in a range. for example if i have three dates 2007-01-01,2007-02-01,2007-05-01 can i use the over clause to get the maximum date for all the date ranges just like the sum for the custorders. I tried the techinique but could not get it to work with the max dates. Is there a limitation or am i doing something wrong? Thanks Ankith |
Jeff
2007-05-22 |
re: SQL Server 2005: Specifying Partitions for Aggregate Functions Ankith -- post some more details and I'll try to help. It's hard to guess what the problem might be without any information. Also, be sure to read the article at sqlteam if you haven't already. using MIN() or MAX() is exactly the same as using SUM() with partitions. |
Ankith
2007-05-23 |
re: SQL Server 2005: Specifying Partitions for Aggregate Functions Hi Jeff, Please find below the DDL and the expected result set. CREATE TABLE [MyDates] ( MyDatesID INT IDENTITY( 1, 1 ) ,MyDate1 SMALLDATETIME NULL ,MyDate2 SMALLDATETIME NULL ,MyDate3 SMALLDATETIME NULL ) GO INSERT INTO [MyDates] (MyDate1, MyDate2, MyDate3 ) SELECT '2007-01-01', '2007-03-31', '2007-05-01' UNION ALL SELECT '2007-03-31', '2007-05-01', '2007-01-01' UNION ALL SELECT '2007-05-01', '2007-01-01', '2007-03-31' GO I am trying to get the resultset as -- MyDatesID MyDate1 MyDate2 MyDate3 MaxDate -- ----------- ----------------------- ----------------------- ----------------------- ----------------------- -- 1 2007-01-01 00:00:00 2007-03-31 00:00:00 2007-05-01 00:00:00 2007-05-01 00:00:00 -- 2 2007-03-31 00:00:00 2007-05-01 00:00:00 2007-01-01 00:00:00 2007-05-01 00:00:00 -- 3 2007-05-01 00:00:00 2007-01-01 00:00:00 2007-03-31 00:00:00 2007-05-01 00:00:000 I tried to use MAX in place of sum but ended up totally with a different resultset. Any insight is appreciated. Thanks Ankith |
Jeff
2007-05-23 |
re: SQL Server 2005: Specifying Partitions for Aggregate Functions I am not sure I follow what you are trying to do. First off, what is MaxDate? The max dateof any of those columns? I strongly recommend normalizing your database if it's not too late, that looks like it is probably a bad design that, like usual, makes your SQL overcomplicated. Also, I don't see any partition -- do you want the MAX() of any of those 3 columns for the entire resultset? Or is there a partition that you are not showing in your sample data? or is it simply the "max" of those 3 date columns for each row in your data? It is very, very unclear. If it is just the max of those 3 columns for each row, there's no partitions or aggregate functions to use at all. Because of the unnormalized design, you'd need to use case: select id, date1,date2,date3, case when date1 > date2 and date1 > date3 then date1 when date2>date3 then date2 else date3 end as maxdate from yourtable A normalized design, with 1 date per row per ID in a related table, let's you simply use max: select id, max(date) from yourtable You might be better off visiting the SQLTeam forums and posting your question there, but I will warn you that will you definitely have to be much more clear in your requirements. |
Ankith
2007-05-23 |
re: SQL Server 2005: Specifying Partitions for Aggregate Functions Sorry for not being clear. yes it is the max of the three columns and I got the answer that I was looking from what you said. Also its not a part of my design and if it was I agree with you, that it would be highly unnormalized design . I just wanted to see it if it can be achieved using partitions and I think I got the answer. I just created a hypothetical table for you to make it a little clear and I think I have confused a little bit further.:) Thanks again |
Rahul
2007-10-31 |
re: SQL Server 2005: Specifying Partitions for Aggregate Functions you said this simple query will work select id, max(date) from yourtable i can only retrieing max value but not with its id. i wanted output to have max value and for which id it contains. the error msg coming is Msg 8120, Level 16, State 1, Line 1 Column 'employee.empId' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. pls help thnx |
Jeff
2007-10-31 |
re: SQL Server 2005: Specifying Partitions for Aggregate Functions Rahul -- sorry, forgot the group by at the end: select id, max(date) from yourtable group by id |