Peter Larsson Blog

Patron Saint of Lost Yaks

Finding group of records with a certain status

I recently was given the task to optimize some code prior to a client's upgrade to SQL Server 2005.
The objective for the old code was to get all orders where status for all orderlines where set to 'DROP'.

SELECT     t.OrderID,
           t.OrderDate
FROM       tblOrder AS t
INNER JOIN (
            SELECT     a.OrderID,
                          CASE
                           WHEN SUM(b.Type) = COUNT(*) THEN 'DROP'
                              ELSE NULL
                          END AS [Type]
               FROM       tblOrder AS a
               INNER JOIN (
                           SELECT     OrderID,
                                         CASE
                                          WHEN [Status] = 'DROP' THEN 1
                                             ELSE 0
                                         END AS Type
                              FROM       tblOrder
                          ) AS b ON b.OrderID = a.OrderID
               GROUP BY   a.OrderID
            ) AS c ON c.OrderID = t.OrderID AND c.Type = 'DROP'

After some testing I come up with this alternative

SELECT     w.OrderId,
           w.OrderDate
FROM       tblOrder AS w
INNER JOIN (
                SELECT    OrderID
               
FROM      tblOrder
               
GROUP BY  OrderID
               
HAVING    MIN(CASE WHEN Status = 'DROP' THEN 1 ELSE 0 END) = 1
           
) AS d ON d.OrderID = w.OrderID

I immediately saw that this code run much faster. Then I started to think if the time spent optimizing the code was worth the time.
For testing I set up two identical SQL machines, one with SQL Server Enterprise 2005 SP2 and the other with SQL Server Enterprise 2000 SP4.
SQL Profiler was configured to measure CPU, DURATION, READS and WRITES.

These are the results I got with the old query on both boxes

SQL Server 2000

SQL Server 2005

DURATION

CPU

READS

WRITES

DURATION

CPU

READS

WRITES

1373

938

40236

25

546

516

22185

0

1310

922

40170

25

583

515

22433

0

1283

891

40170

25

548

485

22199

0

1360

953

40170

25

493

500

22210

0

1376

953

40170

25

494

500

22300

0

These are the results I got for the new query on both boxes 

SQL Server 2000

SQL Server 2005

DURATION

CPU

READS

WRITES

DURATION

CPU

READS

WRITES

296

266

27322

25

185

156

18112

0

296

250

27272

25

164

172

18106

0

283

265

27272

25

167

172

18106

0

266

266

27272

25

169

172

18106

0

230

250

27272

25

168

156

18106

0

My first reaction was that the time spent optimizing the query was worth the time because the old query went from {1300, 900, 40000, 25} down to {300, 250, 27000, 25} on the SQL Server 2000 box, which is a speed improvement between 200% and 300%.

But the real surprise is that the old query without any optimization run almost as fast on the SQL Server 2005 box than the optimized query on SQL Server 2000 box.

My conclusion is that SQL Server 2005 has a better query engine and produces better execution plans than SQL Server 2000. This do not take away the fact that the optimized query is faster on both boxes.

The lesson learned here is that you can never throw in enough hardware to solve badly written code.

For completeness, I also post the execution plans for both queries

Old query

SQL Server 2000

       |--Table Spool
            |--Top
                 |--Nested Loops
                      |--Filter
                      |    |--Compute Scalar
                      |         |--Compute Scalar
                      |             |--Hash Match
                      |                   |--Hash Match
                      |                        |--Clustered Index Scan
                      |                        |--Clustered Index Scan
                      |--Clustered Index Seek

SQL Server 2005

       |--Compute Scalar
            |--Sort
                 |--Top
                      |--Merge Join
                           |--Sort
                           |    |--Filter
                           |         |--Compute Scalar
                           |              |--Hash Match
                           |                   |--Hash Match
                           |                        |--Clustered Index Scan
                           |                        |--Compute Scalar
                           |                             |--Clustered Index Scan
                           |--Clustered Index Scan

New query

SQL Server 2000

        |--Table Spool
            |--Top
                 |--Merge Join
                      |--Filter
                      |    |--Stream Aggregate
                      |         |--Clustered Index Scan
                      |--Clustered Index Scan

SQL Server 2005

       |--Compute Scalar
            |--Sort
                 |--Top
                      |--Merge Join
                           |--Filter
                           |    |--Stream Aggregate
                           |         |--Compute Scalar
                           |              |--Clustered Index Scan
                           |--Clustered Index Scan

Continued here
http://weblogs.sqlteam.com/peterl/archive/2007/09/24/Finding-group-of-records-with-a-certain-status-II.aspx

Legacy Comments


Hugo Kornelis
2007-09-20
re: Finding group of records with a certain status
Hi Peter,

Did you also try this variation?

SELECT a.OrderID, a.OrderDate
FROM Orders AS a
WHERE NOT EXISTS
(SELECT *
FROM Orders AS b
WHERE b.OrderID = a.OrderID
AND b.Status <> 'Drop');

Peso
2007-09-20
re: Finding group of records with a certain status
Thanks for the tip.

I haven't tried it yet, but shouldn't it be = 'Drop' in the EXISTS case?

Hugo Kornelis
2007-09-21
re: Finding group of records with a certain status
"shouldn't it be = 'Drop' in the EXISTS case?"

Nope. This is the old "don't use no double negations, not never!" thingie. Instead of searching for orders that have all rows at status Drop, we search for orders that have no rows at any other status other than Drop - if an order has no rows with Status <> Drop, then obviously, all statuses must be = Drop.

Best, Hugo

Peso
2007-09-21
re: Finding group of records with a certain status
You are absolutely correct.
That proves the point that you should never write code when sitting on an airplane.

dineshrajan
2010-08-09
re: Finding group of records with a certain status
declare @Order table
(
orderid int identity(1,1) not null,
ordername varchar(100)
)

declare @OrderType table
(
OrderTypeId int identity(1,1) not null,
orderid int,
Typename varchar(100)
)

insert into @Order(ordername)
values ('X100'),('XY120'),('ZE100')

insert into @OrderType(orderid,Typename)
values (1,'Pending'),(1,'Completed'),(1,'Draft')
,(2,'Completed'),(2,'Completed'),(2,'Completed')

select * from @Order

select * from @OrderType




select O.*
from @Order O join
(
select distinct orderid,[StatusCount]= sum(case when Typename = 'Completed' THEN 1 ELSE 0 END),[Count]= COUNT(*)
from @OrderType
group by orderid
) T1 on O.orderid = T1.orderid and T1.[Count] = T1.StatusCount