Refactoring for Performance
For the past few days, I've been working on a stored procedure that was the top offender for CPU usage. I tried various index changes, played around with index hints, etc. Everything I tried either made very little improvement or no improvement.
I was ready to throw in the towel, but then I decided to give the code one last look. I don't have the business knowledge for this particular system, so I had been avoiding the complex stored procedure that has over 200 lines of code. One of the queries has 8 joins, and 2 of those are to views that also contain joins. My head about exploded when I first looked at it.
But then I took a step back and narrowed down where the performance problem was. It was a view that was using SUM, MIN, MAX and ROW_NUMBER functions. It was performing those functions on a large data set, and then the stored procedure filtered it using the input parameters.
My code change was to remove the functions from the view and instead calculate those in the stored procedure.
This is a very simple example:
Original
CREATE VIEW View1 AS SELECT SUM(Column3) OVER(PARTITION BY Column6, Column7) AS SumColumn3, ROW_NUMBER() OVER(PARTITION BY Column6, Column7) AS ViewRowNumber, MIN(Column4) OVER(PARTITION BY Column8) AS MinColumn4, MAX(Column4) OVER(PARTITION BY Column8) AS MaxColumn4 FROM Table1 WHERE Column5 = 'SomeValue' AND Column9 = 0CREATE PROC Proc1 (@var1 varchar(10), @var2 varchar(10)) AS SELECT SumColumn3, ViewRowNumber, MinColumn4, MaxColumn4 FROM View1 WHERE Column1 = @var1 AND Column2 = @var2
Refactored
CREATE VIEW View1 AS SELECT Column3, Column4, Column6, Column7, Column8 FROM Table1 WHERE Column5 = 'SomeValue' AND Column9 = 0CREATE PROC Proc1 (@var1 varchar(10), @var2 varchar(10)) AS SELECT SUM(Column3) OVER(PARTITION BY Column6, Column7) AS SumColumn3, ROW_NUMBER() OVER(PARTITION BY Column6, Column7) AS ViewRowNumber, MIN(Column4) OVER(PARTITION BY Column8) AS MinColumn4, MAX(Column4) OVER(PARTITION BY Column8) AS MaxColumn4 FROM View1 WHERE Column1 = @var1 AND Column2 = @var2
Here are my actual results:
Duration (ms) | Reads | CPU | |
Original | 1009 | 430127 | 2730 |
Refactored | 24 | 523 | 16 |
BOOM!
Legacy Comments
Stewart
2014-10-07 |
re: Refactoring for Performance Ahhh the thrill of finding the solution, great job! Completely agree on your solution, the more the abstraction, the easier it is for query analyzer to get confused. I do have a question though, why query through the view? Couldn't you just query directly against the source tables? Or is the view an indexed view and thus the same performance? |
Tara Kizer
2014-10-07 |
re: Refactoring for Performance Stewart, yes certainly could query the source tables directly, but there's business rules in the view (the WHERE clause). The view is not an indexed view. I actually did not change the view since it is used by several stored procedures and my scope was this specific one, I just created a new view with the WHERE clause logic and will add a comment to the original view that both need to be updated if there's ever a change in the business rules. |
Stewart
2014-10-07 |
re: Refactoring for Performance Cool - thanks for the explanation and from your performance gain it is very nice and illustrates the main point on putting the functions out of the view which I hadn't run into before which is what has me excited to look back at some of my systems. 24ms is so much better than 1s and performance isn't likely to get much better (8ms is around the best my queries get on my systems). If you need to get down further and highly depending on data volumes (aka how much data isn't in the view and how much is in view), you might want to try a filtered index which has helped me in some of my views where the view was a small part of overall and indexes weren't optimal. CREATE NONCLUSTERED INDEX Table1_FIX ON Table1 (Column1, Column2) WHERE Column5 = 'SomeValue' AND Column9 = 0 BTW - Great to see you posting again. I've missed seeing your posts they have really helped me over the years. |
STARLORDE
2014-11-19 |
re: Refactoring for Performance "I actually did not change the view since it is used by several stored procedures and my scope was this specific one, I just created a new view with the WHERE clause logic and will add a comment to the original view that both need to be updated if there's ever a change in the business rules." Why not keep the business logic in the new view (the one returning the simple filtered list), and then update the old view so that it queries from the new view and applies the aggregate functions in that select. That way the layers document themselves and no business logic is re-used. Is there a performance implication with nested views, or best practice that I'm not aware of? |
Tara
2014-11-20 |
re: Refactoring for Performance STARLORDE, that is a good question. I didn't think of that. I am sure it'll have equivalent performance to the refactored one I did and has the added benefit of keeping the logic in the views and not in the stored procedure. Great suggestion! |