Lately I’ve been working on a data migration SSIS package for a customer and some of it is just a straight forward execute sql tasks in SSIS on some very large tables. I pull 10 million rows into a staging area, massage some data into the proper format, insert it into it’s final production table resting place, easy enough. I’m testing it on a very low powered 32 bit server and it takes a while. At one point today I started wondering if it was just locked up or was still running.
I’ve used the sys.partitions table before as an alternate method of getting row counts for a table as it already has that information handy for me. This time I was querying the table while the insert was running and I was able to see exactly what was happening. The code below will walk you through an easy example. This will work the same in 2005, or 2008.
First, just create a throw away table:
Create table Loadtest
(
IDVal int identity(1,1),
CustomerName varchar(200),
CustomerCity varchar(200)
)
Now let’s create a looping insert that gives us enough time to look at things. You can just stop it when you have had enough. It’s nothing exciting here, I’m just creating an loop 2 million times and doing an insert..like I said, stop it when you’ve had enough:
declare @counter int
set @counter=2000000
While @counter>0
BEGIN
Insert into LoadTest(CustomerName,CustomerCity)
values('Mike Femenella','Matthews')
SET @counter=@counter-1
END
While your loop is running, open up another query analyzer window to the same database and run this:
SELECT sp.index_id,Rows,si.name,si.type_desc
FROM sys.partitions sp
inner join sys.indexes si
on sp.object_id=si.object_id
AND sp.index_id=si.index_id
where object_name(sp.object_id)='LoadTest'
Since the table is not indexed you’ll see the row count and that it’s populating the heap, hit F5 a few times and you can watch the rows field increment. If you have doubts your insert statement is actually running, this is a good way to check that it is alive and you can also get an idea of it’s progress if you have a guess as to how many rows “should” be there.
--Unindexed, you see the heap being populated.
index_id Rows name type_desc
0 24252 NULL HEAP
So that was mildly interesting and sort of cool. Of course we can take it a step further (and I always love doing that). Go ahead and stop the loop when you have have a million or so rows in there, you need a large amount of data in the LoadTest table so you have enough time to run the sys.partitions query again. I was letting my loop run while writing the first part of the blog so I have 2 million rows in LoadTest.
Let’s create a mirror image of our LoadTest table and since I’m at the end of my work week and feeling very creative I’m going to call it LoadTest_Copy, go ahead and F5 and create the table.. Award winning material here isn’t it?
Create table Loadtest_Copy
(
IDVal int ,
CustomerName varchar(200),
CustomerCity varchar(200)
)
GO
Create Clustered index ixc_IDVal on LoadTest_Copy(IDVal)
Create index ix_CustomerName on LoadTest_Copy(CustomerName)
Create index ix_CustomerCity on LoadTest_Copy(CustomerCity)
GO
In another window modify your sys.partition query and change the name from LoadTest to LoadTest_Copy. Run this now and you have 3 rows, one for the clustered index and one for each of the other indexes. Somewhat more realistic and more interesting.
SELECT sp.index_id,Rows,si.name,si.type_desc
FROM sys.partitions sp
inner join sys.indexes si
on sp.object_id=si.object_id
AND sp.index_id=si.index_id
where object_name(sp.object_id)='LoadTest_Copy'
Results:
index_id Rows name type_desc
1 0 ixc_IDVal CLUSTERED
2 0 ix_CustomerName NONCLUSTERED
3 0 ix_CustomerCity NONCLUSTERED
Now, do an insert from LoadTest into LoadTest_Copy. Start it and then flip over to your sys.partition query, run it while the insert is happening and look at the results.
You can keep hitting F5 and watch the numbers change. Notice it builds the clustered index first, which of course makes sense, and then will go through each of the other 2 indexes building them out.
index_id Rows name type_desc
1 449900 ixc_IDVal CLUSTERED
2 0 ix_CustomerName NONCLUSTERED
3 0 ix_CustomerCity NONCLUSTERED
index_id Rows name type_desc
1 2000000 ixc_IDVal CLUSTERED
2 1065245 ix_CustomerName NONCLUSTERED
3 0 ix_CustomerCity NONCLUSTERED
index_id Rows name type_desc
1 2000000 ixc_IDVal CLUSTERED
2 2000000 ix_CustomerName NONCLUSTERED
3 126137 ix_CustomerCity NONCLUSTERED
And finally now that it’s done:
index_id Rows name type_desc
1 2000000 ixc_IDVal CLUSTERED
2 2000000 ix_CustomerName NONCLUSTERED
3 2000000 ix_CustomerCity NONCLUSTERED
So next time you’re wondering if your insert or update statement is actually doing anything useful, you can leverage some of the underlying DMV data to make sure things are still on track and it’s a good way to tell if you have time to run out for coffee. That’s the last SQL related thing I’m doing, time for vacation…cheers!
I'm looking forward to speaking in Atlanta on the 24th, will be fun to get back down that way to visit with some friends and present two topics that I really enjoy.
First, an introduction to performance tuning. Performance tuning is a very wide and deep topic and we're staying close to the surface. I direct this class for newbie sql users who have less than 2 years of experience. It's all the things I wish someone would have told me in my first 2 years about what to look for when the database was slow...or allegedly slow I should say. We'll cover using profiler to find slow performing queries and how to save the data off to a table as well as a tour of other features. The difference between clustered, non clustered and covering indexes. How to look at and understand an execution plan (at a high level) and finally the difference between a temp table and a table variable and what the implications are of using either one in your code. That pretty much takes up a full hour.
Second presentation, Loading Data in Real Time. It's really a presentation about partitioning but with a twist that we used at work recently to solve a need to load some data quickly and put it into production with minimal downtime. We'll cover partition functions, schemes,$partition, merge, sys.partitions and show some examples of building a set of partitioned tables and using the switch statement to move it from one table to another. Finally we'll cover the differences in partitioning between 2005 and 2008. Hope to see you there!
And if you read my blog please introduce yourself!
Very excited to announce that I’ll be holding 2 sessions at SQL Saturday in VA on April 10th. If there are any frequent readers of SQLTeam.com attending, please make sure to say hi! Topics I’m covering are partitioning & loading data real time and an introduction to performance tuning. Hope to see you there!
SQL Saturday Richmond Schedule