Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server
posts - 155, comments - 2679, trackbacks - 64

My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Subscribe





Archives

Post Categories

Programming

Sports

SQL

Thursday, June 05, 2008

The Joy of Blog Feedback

Introduction

I have been writing my little blog here for some time now, and my favorite part of doing this is of course the feedback.  It's always great to hear from the readers, to have mistakes corrected, to debate various topics and techniques, and to learn a lot about SQL and the various topics I discuss here. 

At this point, I have received over 1,700 comments over the years, and while all of them are truly appreciated, I have noticed that unfortunately many of the, uh, less helpful comments do seem to consistently fall neatly into various categories. 

Let's take a look at an example of a simple, typical blog post and some of the responses that often come back.  If you write a blog of your own, or often read the feedback from other blogs, many of these may seem familiar to you. 

A Typical Blog Post


Today, I have a simple tip for beginner SQL programmers.  When writing a SELECT, you can add a WHERE clause to filter the results that are returned.

For example, to only return rows for CustomerID 345, you can write:

SELECT ...
FROM YourTable
WHERE CustomerID = 345

As you can see, it is very simple. You can use any boolean expression to filter the results as needed. Try it out!  If you have any questions, let me know.

Some Typical Responses

The subtle blog spammer (that you initially mistake for a nice complement):

Very helpful site! Good advice!  From Joe at www.some-random-site-that-has-nothing-to-do-with-sql.com

The person who doesn't seem to get it:

Ummm ... what good does this do when I want to sort? You should fix the code.

The person that really doesn't get it:

The problem with that is it will only return results for one customer.

The person that somehow takes away the exact opposite of what you wrote:

I disagree, this will not return all customers other than 345 and this is definitely not something for advanced SQL Programmers, it's probably better for beginners.

The script kiddie (who just wants to cut and paste your code, not read or learn anything):

LOL, that doesnt even run 4 me!  I get errorz that sez "YourTable" does not exist!  Plz help!!  thnx!

The very clichéd, mindless "anti-Microsoft ranter":

You only have to use WHERE clauses because Bill Gates wants more $$, you are a shill!! Micro$oft sucks, you should use an iPhone for this!  MySQL automagically filters results for you!

The "skimmer" (who just skims the post missing most of it):

Nice, but is there any way to filter for just one customer?

The "repeater" (who just repeats what you've already written):

A better solution is to write WHERE CustomerID = 345, it works better.  It is also fast because less rows are returned.  Using WHERE is a good way to filter a SELECT.

The "know-it-all complainer":

That is the stupidest advice I ever read, why would you want to ever do this? Just use a parameter, or an ORM tool-- this will not scale!  I sure hope CustomerID isn't a VARCHAR -- then you have an implicit conversion happening, your indexes are shot, your server will overheat, and your wife will leave you for your mechanic.  Also, 345 is too large if CustomerID is a tinyint.

The random, unrelated question asker:

Good advice. Thnx.  How to insert into the table?

The "misunderstander":

If I add this to all of my scripts, only data for one customer will ever be returned.  I am not sure this is a good idea. Also, this code will not work in Java and doesn't follow the HTML 4.0 specification. 

The very rare polite and helpful typo alerter:

Hey there, you have a typo in the first sentence -- should be "filter", not "fitler"! Just letting you know, thanks for a great post!

The much more common typo alerter:

You wrote FITLER not FILTER, your an idiot!! if you cannot write English how can you write SQL ???   Learn to spell!

Summary

Please, don't misunderstand, I mean this all in good fun.  I love feedback, and please, keep it coming.  It's what makes this and every other blog a fun place to visit.

In fact, I realize that I left out the most annoying feedback of all!  That's right, the Thin-Skinned, Overly-Defensive Blog Author Who Feels the Need to Respond to Everything:

Did you even read what I wrote? I did not say that.  And, yes, I did spell "monkey" wrong, so sue me!  Remind me to fire my editor.... or maybe I should refund your subscription fee?  Oh, wait, this blog is free!  So what the heck are you complaining about?  Why don't you go bother some MySQL blogger?  I hear they usually write at a 5th grade level which is probably more appropriate for your intellect. Jerk!

Yeah, comments like those are definitely the worst of all!  Thank you for putting up with my feedback, now that I think of it!

posted @ Friday, June 06, 2008 3:43 PM | Feedback (11) | Filed Under [ Miscellaneous Humor ]

The Truth about "Cursor Busting" in SQL

Let's say you are called in to troubleshoot a stored procedure that is performing poorly.

You dive in to investigate and this is what you find:

create procedure ProcessProducts
as
    declare @Products cursor, @ProductID int
    set @Products = cursor for select ProductID from Products order by ProductID
    open @Products

    fetch next from @Products into @ProductID

    while (@@FETCH_STATUS=0)
        begin
        exec DoSomething @ProductID
        fetch next from @Products into @ProductID
        end

    deallocate @Products

Ah ha! A cursor!  It seems we have identified the bottleneck: Clearly, the performance problems are because the code is not doing things in a set-based manner, but rather by processing rows one at a time using a dreaded cursor.  This cursor is opening up the Products table, looping through the rows one at a time, and calling the "DoSomething" stored procedure for each ProductID.  As we all know, cursors are not the way to go when writing SQL code; this cursor should eliminated and replaced with a cleaner, more efficient (and more socially acceptable!) solution.

So, how we do optimize this?  Well, a commonly suggested approach is to eliminate the CURSOR by replacing it with a WHILE loop:

    declare @ProductID int
    set @ProductID = -99999

    while (@ProductID is not null)
        begin
        set @ProductID = (select top 1 ProductID
                          from Products
                          where ProductID > @ProductID
                          order by ProductID asc)

        exec DoSomething @ProductID

        end

Instead of declaring a CURSOR to loop through the table, we now are using "set-based" code and our problems seem to be solved.  The cursor is gone, our code looks much cleaner, we've tested it and it works properly, so off to production it goes.  Another cursor has been busted!

Right?

Actually ... no.

You see, eliminating cursors is not about syntax.  It is not about searching for the word "cursor" in your code and just replacing it with a WHILE loop that does the same thing.  Optimizing and replacing cursors involves much more.  We can never optimize any cursor code until we look deeper into what exactly is happening when we "process" each of those rows.  In this case, we need to find out what that "DoSomething" procedure is actually doing. 

Suppose the DoSomething procedure is generating a report and sending an email to the "Product Manager" for each product that contains status information, and then logging this email message into a table somewhere.

If that is the case, what have we just gained by replacing our CURSOR?  

Honestly -- not much,  if anything at all.  Because of the task at hand, we may very well need to process rows in the Product table one-by-one to send our emails and generate the report, and the bottleneck here is not the cursor code at all, but rather the report generation and maybe sending the email.   Eliminating the cursor code probably gains us nothing here.  If you need to process rows one at a time, go ahead and use a cursor -- that's what they are there for!   Replacing a perfectly fine, simple cursor with a WHILE loop might even make your code longer, or more confusing, or even less efficient depending on circumstances. 

For example, what if we need to process the Products ordered by Region, then Product Name, for whatever reason.  Our cursor code is simple:

set @Products = cursor for
    select ProductID
    from Products
    order by Region, ProductName

All that we needed  to change was our ORDER BY clause.  Now, how would we write this as a WHILE loop?  Is it possible?  Sure.  Will it be as simple and clean as using a cursor?  No, it won't. (Though ROW_COUNT() makes this much easier than it used to be)

Now, I am not here to say that cursors are "good", but if you really need to process rows one by one, go ahead and proudly use a cursor.   Replacing cursors isn't about processing rows one-by-one in a different way (i.e., using a WHILE loop instead), it is about not processing rows one-by-one at all!   

Let's consider another scenario: What if the DoSomething stored procedure is checking to see if the Product's ExpireDate is greater than today's date, and if so, it is updates the Status column for that Product to 'X'.

In that situation, what have we gained by rewriting ProcessProducts without a cursor, and using a WHILE loop instead?   The answer is, once again: nothing!  In fact, we potentially have once again made our code more confusing or even less efficient than a cursor might be!  Remember, the bottleneck isn't the cursor syntax -- it is the fact that we are processing rows one at a time.  Replacing the cursor with the WHILE loop didn't solve this problem, did it?  

So, looking now at both of the scenarios I presented for the DoSomething stored procedure, it should be clear that we did not fix anything by replacing the cursor in either case simply by writing a WHILE loop.  If that's all you are doing, don't bother replacing the cursor at all.  You haven't optimized anything.

As I said before, the art of replacing a cursor is not a find-and-replace syntax change operation -- it is a fundamental change in how you process your data.  As in the Product report generation and email example, it may be that we simply need to process rows one by one, and thus no further optimization is possible from a SQL point of view.  In situations like updating the Product table, however, we do not need to process the rows individually -- we can do everything in one single UPDATE statement.  Thus, in order to determine how to optimize the ProcessProducts stored procedure, we needed to dig deeper into entire process as a whole, which included examining the DoSomething stored procedure and determining the full scope of exactly what this "ProcessProducts" stored procedure is doing. 

So, if "DoSomething" is updating the Products table as specified, we now know that a good replacement for our cursor code doesn't result in a WHILE loop and calling a separate stored procedure over and over at all -- it results a true, set-based solution:

create procedure ProcessProducts
as
    Update Products set Status='X' where ExpireDate > getdate()
   
And THAT is how you optimize a cursor! No loops, no calling of another stored procedure for each row in a table, no "find-and-replace" cursor code removal.  We examined the entire process, and rewrote the entire process, to get it done quicker and shorter and faster without cursors or loops. 

Always remember: Replacing a cursor isn't about rewriting your syntax, it is about redesigning your algorithm.

posted @ Thursday, June 05, 2008 10:56 AM | Feedback (14) | Filed Under [ T-SQL Efficiency ]

Powered by:
Powered By Subtext Powered By ASP.NET