
Thursday, May 29, 2008

Fast T-SQL Row Number Generator

You can expect:

This is a medium intensity post (you should not get a migraine, but I make no promises).

I Expect (that you have the following experience):

  • T-SQL Stored Proc experience
  • Windowed Function experience
  • Basic Join understanding

Row Number Generation

There are a number of ways to generate row numbers in T-SQL.  One way is to create a temp table with an identity column and insert rows into it, typically in a While loop with a counter.  I've used this method in the past and it works OK.  Recently I was helping a friend who was helping a co-worker with this very problem.  I started playing around with other ways to generate row numbers / IDs and came up with what I think is a rather novel way of doing it.  There are several pieces to this and I'll cover each of them.
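For comparison, the loop-based approach mentioned above might look roughly like this.  It is only a sketch: the temp table name #RowNums and the 1,000-row target are placeholders of mine, not something from the original problem.

```sql
-- Loop-based approach: one Insert per row into a temp table whose
-- identity column hands out the row numbers.
-- #RowNums and the 1000-row target are illustrative placeholders.
Create Table #RowNums (RowNum int Identity(1, 1) Primary Key)

Declare @cnt int
Set @cnt = 1

While @cnt <= 1000 Begin
    -- The table has only an identity column, so there is nothing to supply
    Insert Into #RowNums Default Values
    Set @cnt = @cnt + 1
End

Select RowNum
From #RowNums
Order By RowNum

Drop Table #RowNums
```

Every pass through the loop is a separate Insert statement, which is why this gets slow as the row count grows.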

Windowed Functions

SQL Server 2005 introduced windowed functions: ranking functions and "in-place" aggregates that are evaluated through an Over clause.  They give you results that used to require a Group By, without the Group By.  They are helpful when you need extra columns that you just can't keep in your Select when using a Group By.  One very useful one is the Row_Number() function, which numbers the rows of a result set in the order you specify.  It is tremendously useful, and I'm only using it in a very basic manner here.
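Here is a small sketch of what that looks like in practice.  The @Orders table and its rows are made up purely for illustration.

```sql
-- Hypothetical sample data, for illustration only
Declare @Orders Table (Customer varchar(10), Amount int)
Insert Into @Orders Select 'Ann', 10
Insert Into @Orders Select 'Ann', 25
Insert Into @Orders Select 'Bob', 40

-- Every detail row is kept; the aggregate and the row number ride along.
-- A Group By Customer would have collapsed the rows and lost Amount.
Select
    Customer,
    Amount,
    Sum(Amount) Over (Partition By Customer) As CustomerTotal,
    Row_Number() Over (Order By Amount) As RowNum
From @Orders
Order By RowNum
```

All three rows come back, each carrying its customer's total (35 for Ann, 40 for Bob) alongside a row number from 1 to 3.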

Fast Data Generation

I needed a way to quickly generate data.  SQL Server is meant for set-based processing, so things like looping and UDFs are not very speedy.  You should all know this and if you don't, you need to read more!  On a side rant, I've seen developers treat T-SQL like it was VB (I say VB because the people who do this are typically green developers).  T-SQL is NOT a programming language!  It is a Database language!

I decided to use SQL's set power for my purposes.  For those that need a refresher, a Cross Join produces every possible combination of rows from 2 or more tables.  My method was to create a table variable with 100 rows and 1 column.  That column contains the numbers 1 through 100.  A Cross Join of that table to itself results in a Select with 100 * 100 = 10,000 records.  The Select is lightning fast compared to looping 10k times.  Adding 2 more Cross Joins (four copies of the table in all) raises the ceiling to 100 * 100 * 100 * 100 = 100m records.  If you wanted more, you could add more Cross Joins.
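To see the multiplication at a scale you can eyeball, here is a sketch with a 3-row table instead of 100.  The @Digits name is just a placeholder.

```sql
-- A 3-row stand-in for the 100-row number table (illustrative name)
Declare @Digits Table (Num int)
Insert Into @Digits Select 1
Insert Into @Digits Select 2
Insert Into @Digits Select 3

-- Two copies: 3 * 3 = 9 combinations
Select D1.Num As A, D2.Num As B
From @Digits D1
Cross Join @Digits D2

-- Three copies: 3 * 3 * 3 = 27 combinations
Select Count(*) As Combos
From @Digits D1
Cross Join @Digits D2
Cross Join @Digits D3
```

Swap 3 for 100 and four copies of the table, and the same arithmetic gives the 100m ceiling.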

Execution Plan Optimization

If I only want 5,000 row numbers, how do I limit my Cross Joins to only produce that many?  I could try to insert just the right number of records into my table variable so that the result is 5k.  That, however, would require computing the exact number, and I still wouldn't be able to generate any count I wanted.  I would not, for instance, be able to generate 58,391 row numbers.  So I again turn to SQL Server's set powers.  I wrapped the Cross Join Select inside another Select with a Where clause to limit my results.  One might think that the SubSelect would have to finish before the outer Select could run, but that is not the case.  The Optimizer sees that the outer Select only wants rows numbered up to 58,391 and builds an execution plan that stops pulling rows from the SubSelect once that number has been reached.  Therefore, I don't generate 100m rows and then return only 58,391; I only generate 58,391 rows.
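If you want to check this on your own server rather than take my word for it, one option is Set Statistics Profile On, which returns the actual row count for each operator in the plan.  The batch below is a sketch of such a check and assumes nothing beyond a 100-row number table like the one described above; the plan you get may differ by version and settings.

```sql
-- Build the 100-row number table
Declare @NumGen Table (Num int)
Declare @cnt int
Set @cnt = 1

While @cnt <= 100 Begin
    Insert Into @NumGen Select @cnt
    Set @cnt = @cnt + 1
End

Declare @NumberOfRows int
Set @NumberOfRows = 58391

-- Adds a second result set showing actual rows per plan operator
Set Statistics Profile On

Select RowNum
From
(
    Select Row_Number() Over (Order By N1.Num) As RowNum
    From @NumGen N1, @NumGen N2, @NumGen N3, @NumGen N4
) RowNums
Where RowNum <= @NumberOfRows

Set Statistics Profile Off
```

In the profile output, look at the Rows column for the join operators.  If the early stop is happening, those counts stay close to the number requested instead of climbing toward 100m.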

The Query

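Create Procedure GenerateRowNumbers(@NumberOfRows int, @StartNumber int)
As Begin
    -- Suppress the "rows affected" message for each Insert below
    Set NoCount On

    -- Seed table: the numbers 1 through 100
    Declare @NumGen Table (Num int)
    Declare @cnt int
    Set @cnt = 1

    While @cnt <= 100 Begin
        Insert Into @NumGen
        Select @cnt
        Set @cnt = @cnt + 1
    End

    -- Four copies of @NumGen: up to 100 * 100 * 100 * 100 = 100m candidate rows.
    -- RowNum starts at 1, so subtract 1 to make the first value equal @StartNumber.
    Select @StartNumber + RowNum - 1 As RowNumber
    From
    (
        Select Row_Number() Over (Order By N1.Num) As RowNum
        From @NumGen N1, @NumGen N2, @NumGen N3, @NumGen N4
    ) RowNums
    Where RowNum <= @NumberOfRows
End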
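
Calling the procedure might look like the sketch below.  It assumes the procedure has already been created in the current database; the #Ids temp table and the parameter values are placeholders for illustration.

```sql
-- Return five consecutive numbers in the neighborhood of 1000
Exec GenerateRowNumbers @NumberOfRows = 5, @StartNumber = 1000

-- Capture the generated numbers in a temp table for later joins
Create Table #Ids (Id int Not Null)

Insert Into #Ids (Id)
Exec GenerateRowNumbers @NumberOfRows = 58391, @StartNumber = 1

-- One row per requested number
Select Count(*) As Generated
From #Ids

Drop Table #Ids
```

Insert ... Exec matches the procedure's result set to the target columns by position, so the single int column lands in Id.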


Conclusion

This method consumes much less memory than filling a temp table using a While loop, and it consumes fewer CPU cycles.  It is very fast, generating about 100k row numbers per second.  It handles the offset, so you can start at any point and generate any number of row numbers.  It does have a 100m limit, but that can be raised by adding another Cross Join.

-Madman Out-

posted @ Thursday, May 29, 2008 11:16 AM | Feedback (9) | Filed Under [ SQL Server Madness ]
