Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server
posts - 157, comments - 2686, trackbacks - 64

My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Subscribe





Archives

Post Categories

Programming

Sports

SQL

The Truth about "Cursor Busting" in SQL

Let's say you are called in to troubleshoot a stored procedure that is performing poorly.

You dive in to investigate and this is what you find:

create procedure ProcessProducts
as
    declare @Products cursor, @ProductID int
    set @Products = cursor for select ProductID from Products order by ProductID
    open @Products

    fetch next from @Products into @ProductID

    while (@@FETCH_STATUS=0)
        begin
        exec DoSomething @ProductID
        fetch next from @Products into @ProductID
        end

    deallocate @Products

Ah ha! A cursor!  It seems we have identified the bottleneck: Clearly, the performance problems are because the code is not doing things in a set-based manner, but rather by processing rows one at a time using a dreaded cursor.  This cursor is opening up the Products table, looping through the rows one at a time, and calling the "DoSomething" stored procedure for each ProductID.  As we all know, cursors are not the way to go when writing SQL code; this cursor should eliminated and replaced with a cleaner, more efficient (and more socially acceptable!) solution.

So, how we do optimize this?  Well, a commonly suggested approach is to eliminate the CURSOR by replacing it with a WHILE loop:

    declare @ProductID int
    set @ProductID = -99999

    while (@ProductID is not null)
        begin
        set @ProductID = (select top 1 ProductID
                          from Products
                          where ProductID > @ProductID
                          order by ProductID asc)

        exec DoSomething @ProductID

        end

Instead of declaring a CURSOR to loop through the table, we now are using "set-based" code and our problems seem to be solved.  The cursor is gone, our code looks much cleaner, we've tested it and it works properly, so off to production it goes.  Another cursor has been busted!

Right?

Actually ... no.

You see, eliminating cursors is not about syntax.  It is not about searching for the word "cursor" in your code and just replacing it with a WHILE loop that does the same thing.  Optimizing and replacing cursors involves much more.  We can never optimize any cursor code until we look deeper into what exactly is happening when we "process" each of those rows.  In this case, we need to find out what that "DoSomething" procedure is actually doing. 

Suppose the DoSomething procedure is generating a report and sending an email to the "Product Manager" for each product that contains status information, and then logging this email message into a table somewhere.

If that is the case, what have we just gained by replacing our CURSOR?  

Honestly -- not much,  if anything at all.  Because of the task at hand, we may very well need to process rows in the Product table one-by-one to send our emails and generate the report, and the bottleneck here is not the cursor code at all, but rather the report generation and maybe sending the email.   Eliminating the cursor code probably gains us nothing here.  If you need to process rows one at a time, go ahead and use a cursor -- that's what they are there for!   Replacing a perfectly fine, simple cursor with a WHILE loop might even make your code longer, or more confusing, or even less efficient depending on circumstances. 

For example, what if we need to process the Products ordered by Region, then Product Name, for whatever reason.  Our cursor code is simple:

set @Products = cursor for
    select ProductID
    from Products
    order by Region, ProductName

All that we needed  to change was our ORDER BY clause.  Now, how would we write this as a WHILE loop?  Is it possible?  Sure.  Will it be as simple and clean as using a cursor?  No, it won't. (Though ROW_COUNT() makes this much easier than it used to be)

Now, I am not here to say that cursors are "good", but if you really need to process rows one by one, go ahead and proudly use a cursor.   Replacing cursors isn't about processing rows one-by-one in a different way (i.e., using a WHILE loop instead), it is about not processing rows one-by-one at all!   

Let's consider another scenario: What if the DoSomething stored procedure is checking to see if the Product's ExpireDate is greater than today's date, and if so, it is updates the Status column for that Product to 'X'.

In that situation, what have we gained by rewriting ProcessProducts without a cursor, and using a WHILE loop instead?   The answer is, once again: nothing!  In fact, we potentially have once again made our code more confusing or even less efficient than a cursor might be!  Remember, the bottleneck isn't the cursor syntax -- it is the fact that we are processing rows one at a time.  Replacing the cursor with the WHILE loop didn't solve this problem, did it?  

So, looking now at both of the scenarios I presented for the DoSomething stored procedure, it should be clear that we did not fix anything by replacing the cursor in either case simply by writing a WHILE loop.  If that's all you are doing, don't bother replacing the cursor at all.  You haven't optimized anything.

As I said before, the art of replacing a cursor is not a find-and-replace syntax change operation -- it is a fundamental change in how you process your data.  As in the Product report generation and email example, it may be that we simply need to process rows one by one, and thus no further optimization is possible from a SQL point of view.  In situations like updating the Product table, however, we do not need to process the rows individually -- we can do everything in one single UPDATE statement.  Thus, in order to determine how to optimize the ProcessProducts stored procedure, we needed to dig deeper into entire process as a whole, which included examining the DoSomething stored procedure and determining the full scope of exactly what this "ProcessProducts" stored procedure is doing. 

So, if "DoSomething" is updating the Products table as specified, we now know that a good replacement for our cursor code doesn't result in a WHILE loop and calling a separate stored procedure over and over at all -- it results a true, set-based solution:

create procedure ProcessProducts
as
    Update Products set Status='X' where ExpireDate > getdate()
   
And THAT is how you optimize a cursor! No loops, no calling of another stored procedure for each row in a table, no "find-and-replace" cursor code removal.  We examined the entire process, and rewrote the entire process, to get it done quicker and shorter and faster without cursors or loops. 

Always remember: Replacing a cursor isn't about rewriting your syntax, it is about redesigning your algorithm.

Print | posted on Thursday, June 05, 2008 10:56 AM | Filed Under [ T-SQL Efficiency ]

Feedback

Gravatar

# re: The Truth about "Cursor Busting" in SQL

nIce. Thanks, Jeff!
6/5/2008 2:24 PM | jeff maass
Gravatar

# re: The Truth about "Cursor Busting" in SQL

I couldn't agree more. So many people don't understand why cursors should be avoided, but just have a knee jerk reaction that they are automatically bad. You've explained the truth very well here.
6/6/2008 9:25 AM | Ian Gibson
Gravatar

# re: The Truth about "Cursor Busting" in SQL

What a great post! I've seen over and over how bad cursors are but your post really clears things up as to when it's OK to use them.
6/6/2008 9:58 AM | Zack Jones
Gravatar

# re: The Truth about "Cursor Busting" in SQL

Yes, but the way this cursor is replaced is not the best way. One of the biggest problems with Cursors is that the source table needs to be queried each time it fetches the next value. In the exampole above, probably not a big deal if you're only looking at one table, but most queries/cursors are more complex than that. One big advantage of a while loop is to dump the values into either a temp table or a table variable and then use that for your looping. This point is overlooked in this article.
6/6/2008 11:09 AM | Mackin
Gravatar

# re: The Truth about "Cursor Busting" in SQL

Mackin -- if dump those values into a temp table, that's great. And guess what? You can then use a CURSOR to iterate through the temp table just fine, right? No need for a WHILE loop. Again, what does the WHILE loop give you that a cursor doesn't if you indeed process each row one by one? Nothing. The solution is not to replace cursors with while loops, it is to rewrite your code.

6/6/2008 11:27 AM | Jeff
Gravatar

# re: The Truth about "Cursor Busting" in SQL

This is a very simple example of doing a cursor. But, as Mackin said, you get very complex queries where the best option is to use a cursor. I don't understand what the problem is with cursors. I will give you an example of a cursor query in our database. The select in the cursor reads three to four different tables with joins. the number of rows ranges between 2000 and 3000. As the rows are read inserts is done into other +- 5 tables and a whole accounting procedure happens. This query runs once a month to print cheques for a client. I would like to see that a temporary table will do it better than a cursor. I agree that with simple queries like Jeff's a cursor is way overkill. I would appreciate it if someone can explain another much better way but I don't think so.
6/6/2008 2:46 PM | Manie Verster
Gravatar

# re: The Truth about "Cursor Busting" in SQL

Manie -- I think you should re-read my blog post very carefully. My entire point is that:

a) sometimes cursors are fine, if you truly need to process things row by row

and

b) Determining if you really need to process things row-by-row is to key to understanding if you can and/or should replace a cursor; it is not just for the sake of altering syntax or removing cursors because they aren't "cool."

In your example, it sounds like it can easily be re-written without a cursor. But, as I said in my article, we have no way of knowing without examining the big picture and understanding exactly what your code is doing.

I'd write more in my comment here, but I would just be further repeating what I already wrote in the article, so maybe just try reading it again instead?

You also should probably re-read Mackin's comment carefully because he also wrote the opposite of what you seemed to think he wrote.
6/6/2008 3:04 PM | Jeff
Gravatar

# re: The Truth about "Cursor Busting" in SQL

Jeff, I think what you are sayng is correct because if a process has to deal with records one at a time, using a while loop or a cursor will still result in one-by-one processing, and possibly the same cost. In that case, as you stated, just replacing a cursor with a while loop will not yield any benefit. I have realised of late that people tend to condem cursor use just arbitrarily without looking deep into the underlining processes, which is awkward to me. I went to an interview one time and the interviewer insisted it is never good to use a cursor without exception, and I said it depends. I lost the job because I disagreed with him. Clearing things the way you did in your post will clear the misconception some people have about cursor usage. Thank you.
6/7/2008 10:47 AM | Addai
Gravatar

# re: The Truth about "Cursor Busting" in SQL

I'm more of a power user than a dba, but I do work with very large data sets. Isn't one of the differences between a cursor and a while loop that the cursor will store the entire result set in memry while the list is traversed while the while loop only keeps the current member of the list in memory?
6/8/2008 9:49 AM | Jim Jaggers
Gravatar

# re: The Truth about "Cursor Busting" in SQL

Hi Jim --

It depends on the type of cursor that you use. In fact, some cursors automatically create a temp table for you to traverse the data, you don't even need to do it manually, as you would with the WHILE-SELECT rewrite.

see:

http://msdn.microsoft.com/en-us/library/ms180169.aspx

for some of the options available. Depending on what you need, you should be able to declare a very efficient cursor. In fact, in my article, I should have mentioned that if indeed row-by-row processing is required, one thing you can do to optimize is to simply check the cursor declaration to be sure that you have selected READONLY and FORWARDONLY and all that, depending on what you need.

To be honest, I am not a pro when it comes to cursors! I've yet to really use one. If I need to do row-by-row processing to do things like generate reports or send emails, I write it as a client application in .NET, not in T-SQL.

Of course, even when you write your app in .NET and use a DataReader to loop through the rows, guess what happens behind the scenes? That's right -- a CURSOR is used! :)
6/10/2008 9:23 AM | Jeff
Gravatar

# re: The Truth about "Cursor Busting" in SQL

cant i just insert this into a temp table with rownumber column and do row_number() over (partition by region order by productname) for getting the row number and then looping thro for each rownumber? also probably reset rownum variable in loop when region changes? i find that temp tables and looping thro row numbers always gives better performance than cursors.
8/31/2008 7:32 PM | anil
Gravatar

# re: The Truth about "Cursor Busting" in SQL

Hi,
I read the articles as well as comments, but none of you not talked about the locks when the cursors are running? my questions is, is there any difference in the LOCK when we use cursor OR While loop.

With Best Regards

Aravamudan
1/12/2009 4:16 AM | Aravamudan
Gravatar

# re: The Truth about "Cursor Busting" in SQL

Hi Jeff

well done. Using SQL Server 2008 and the RowNumber() function you can create an identical script to replace cursors on tables that don't have integer key columns, one exemple:

use AdventureWorks2008

declare @Rowid int , @RegionName varchar(100), @RegionCode varchar(4)

set @Rowid = -99999

WHILE(@Rowid is not null)
BEGIN
select top 1 @Rowid=A.rowId,@RegionCode=CountryRegionCode,@RegionName=A.Name from
(select ROW_NUMBER() over(order by CountryRegionCode) as rowId,CountryRegionCode,Name from Person.CountryRegion) as A
where A.rowId> @Rowid order by A.rowid asc

if @@ROWCOUNT<>0
-- Do something with the variables
select @Rowid,@RegionCode,@RegionName
else
set @Rowid =null
END



Best REgards, Luís
1/27/2010 9:39 AM | Luís Quedas
Gravatar

# re: The Truth about "Cursor Busting" in SQL

Luis -- I appreciate your comment, but you seem to be missing the point of the post. If you are going to write code like that, go ahead and just use a cursor !
1/27/2010 9:41 AM | Jeff
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET