Finding group of records with a certain status
I recently was given the task to optimize some code prior to a client's upgrade to SQL Server 2005.
The objective for the old code was to get all orders where status for all orderlines where set to 'DROP'.
SELECT t.OrderID,
t.OrderDate
FROM tblOrder AS t
INNER JOIN (
SELECT a.OrderID,
CASE
WHEN SUM(b.Type) = COUNT(*) THEN 'DROP'
ELSE NULL
END AS [Type]
FROM tblOrder AS a
INNER JOIN (
SELECT OrderID,
CASE
WHEN [Status] = 'DROP' THEN 1
ELSE 0
END AS Type
FROM tblOrder
) AS b ON b.OrderID = a.OrderID
GROUP BY a.OrderID
) AS c ON c.OrderID = t.OrderID AND c.Type = 'DROP'
After some testing I come up with this alternative
SELECT w.OrderId,
w.OrderDate
FROM tblOrder AS w
INNER JOIN (
SELECT OrderID
FROM tblOrder
GROUP BY OrderID
HAVING MIN(CASE WHEN Status = 'DROP' THEN 1 ELSE 0 END) = 1
) AS d ON d.OrderID = w.OrderID
I immediately saw that this code run much faster. Then I started to think if the time spent optimizing the code was worth the time.
For testing I set up two identical SQL machines, one with SQL Server Enterprise 2005 SP2 and the other with SQL Server Enterprise 2000 SP4.
SQL Profiler was configured to measure CPU, DURATION, READS and WRITES.
These are the results I got with the old query on both boxes
SQL Server 2000 |
SQL Server 2005 |
||||||
DURATION |
CPU |
READS |
WRITES |
DURATION |
CPU |
READS |
WRITES |
1373 |
938 |
40236 |
25 |
546 |
516 |
22185 |
0 |
1310 |
922 |
40170 |
25 |
583 |
515 |
22433 |
0 |
1283 |
891 |
40170 |
25 |
548 |
485 |
22199 |
0 |
1360 |
953 |
40170 |
25 |
493 |
500 |
22210 |
0 |
1376 |
953 |
40170 |
25 |
494 |
500 |
22300 |
0 |
These are the results I got for the new query on both boxes
SQL Server 2000 |
SQL Server 2005 |
||||||
DURATION |
CPU |
READS |
WRITES |
DURATION |
CPU |
READS |
WRITES |
296 |
266 |
27322 |
25 |
185 |
156 |
18112 |
0 |
296 |
250 |
27272 |
25 |
164 |
172 |
18106 |
0 |
283 |
265 |
27272 |
25 |
167 |
172 |
18106 |
0 |
266 |
266 |
27272 |
25 |
169 |
172 |
18106 |
0 |
230 |
250 |
27272 |
25 |
168 |
156 |
18106 |
0 |
My first reaction was that the time spent optimizing the query was worth the time because the old query went from {1300, 900, 40000, 25} down to {300, 250, 27000, 25} on the SQL Server 2000 box, which is a speed improvement between 200% and 300%.
But the real surprise is that the old query without any optimization run almost as fast on the SQL Server 2005 box than the optimized query on SQL Server 2000 box.
My conclusion is that SQL Server 2005 has a better query engine and produces better execution plans than SQL Server 2000. This do not take away the fact that the optimized query is faster on both boxes.
The lesson learned here is that you can never throw in enough hardware to solve badly written code.
For completeness, I also post the execution plans for both queries
Old query
SQL Server 2000
|--Table Spool
|--Top
|--Nested Loops
|--Filter
| |--Compute Scalar
| |--Compute Scalar
| |--Hash Match
| |--Hash Match
| |--Clustered Index Scan
| |--Clustered Index Scan
|--Clustered Index Seek
SQL Server 2005
|--Compute Scalar
|--Sort
|--Top
|--Merge Join
|--Sort
| |--Filter
| |--Compute Scalar
| |--Hash Match
| |--Hash Match
| |--Clustered Index Scan
| |--Compute Scalar
| |--Clustered Index Scan
|--Clustered Index Scan
New query
SQL Server 2000
|--Table Spool
|--Top
|--Merge Join
|--Filter
| |--Stream Aggregate
| |--Clustered Index Scan
|--Clustered Index Scan
SQL Server 2005
|--Compute Scalar
|--Sort
|--Top
|--Merge Join
|--Filter
| |--Stream Aggregate
| |--Compute Scalar
| |--Clustered Index Scan
|--Clustered Index Scan
Legacy Comments
Hugo Kornelis
2007-09-20 |
re: Finding group of records with a certain status Hi Peter, Did you also try this variation? SELECT a.OrderID, a.OrderDate FROM Orders AS a WHERE NOT EXISTS (SELECT * FROM Orders AS b WHERE b.OrderID = a.OrderID AND b.Status <> 'Drop'); |
Peso
2007-09-20 |
re: Finding group of records with a certain status Thanks for the tip. I haven't tried it yet, but shouldn't it be = 'Drop' in the EXISTS case? |
Hugo Kornelis
2007-09-21 |
re: Finding group of records with a certain status "shouldn't it be = 'Drop' in the EXISTS case?" Nope. This is the old "don't use no double negations, not never!" thingie. Instead of searching for orders that have all rows at status Drop, we search for orders that have no rows at any other status other than Drop - if an order has no rows with Status <> Drop, then obviously, all statuses must be = Drop. Best, Hugo |
Peso
2007-09-21 |
re: Finding group of records with a certain status You are absolutely correct. That proves the point that you should never write code when sitting on an airplane. |
dineshrajan
2010-08-09 |
re: Finding group of records with a certain status declare @Order table ( orderid int identity(1,1) not null, ordername varchar(100) ) declare @OrderType table ( OrderTypeId int identity(1,1) not null, orderid int, Typename varchar(100) ) insert into @Order(ordername) values ('X100'),('XY120'),('ZE100') insert into @OrderType(orderid,Typename) values (1,'Pending'),(1,'Completed'),(1,'Draft') ,(2,'Completed'),(2,'Completed'),(2,'Completed') select * from @Order select * from @OrderType select O.* from @Order O join ( select distinct orderid,[StatusCount]= sum(case when Typename = 'Completed' THEN 1 ELSE 0 END),[Count]= COUNT(*) from @OrderType group by orderid ) T1 on O.orderid = T1.orderid and T1.[Count] = T1.StatusCount |