You can expect:
This is a medium intensity post (you should not get a migraine, but I make no promises).
I Expect (that you have the following experience):
- Excel experience
- Complex SQL queries
Performance Reporting for SSIS Packages.
Part 1 helped us set up the logging for SSIS so that we could retrieve the data we need to make the pretty charts that help us in tuning our packages. This information is VITAL for engaging in REAL performance tuning. You need PROOF and not anecdotes about how X is faster than Y. Many times in my career, I have had people try to convince me that their way is better or faster. I used to try to argue about what was faster/better, but I soon learned the foolishness of this. Hard evidence is the only way to make a determination on performance. That is why this framework is so important. It will bring you up to speed on how to get the best performance out of SSIS faster than going from site to site for tips. It will allow you to do tests on your own in a very efficient manner.
Of note about the SQL in Part 1. That SQL is meant to be used WHILE THE PACKAGE IS RUNNING. It will let you see where the package is while it is running. For long running packages, it is a way to peek at the health of the process. For instance, if someone is doing bad things or the server is choking, you can tell by running that code and comparing results to past known good runs to ensure that you are not tracking behind in performance. Part 2 is about looking at completed runs and analyzing them. Just wanted to make that distinction.
The first thing you need to do is go back to the Logging config on your package and add the OnProgress event. This will allow us to look at more than just the DataFlow items' progress. I've crafted a pretty crazy query to produce some statistics that can be used in an Excel Pivot table/chart.
Update: The above query had a bug with some DataFlow transforms. It is fixed now, so please use the latest. The bug was how I was retrieving the RecordCount field. Some transforms would have "Input 1 : " and I was only looking for "Input : ". This caused an error in how the Right() was pulling the data. Thanks to Bill Fellows for pointing it out.
'Run ' + Cast(RunNumber As varchar) As RunName, S.Source,
DataFlowSource, RecordCount, RecordsPerSec,
Min(S.StartTime) StartTime, Max(S.EndTime) EndTime,
DateDiff(ms, Min(S.StartTime), Max(S.EndTime)) Duration
From sysdtslog90 S
Inner Join ( Select Distinct Source, ExecutionID,
Row_Number() Over (Order By StartTime) As RunNumber
Where event = 'PackageStart' ) P On S.ExecutionID = P.ExecutionID
Left Outer Join (
Select SourceID, ExecutionID, DataFlowSource,
(Cast(Sum(RecordCount) as real) / DateDiff(ms, Min(StartTime), Max(EndTime)))
* 1000 As RecordsPerSec
Select SourceID, ExecutionID, StartTime, EndTime,
CharIndex(':', message, 56)
+ 2, Len(message)),
CharIndex(':', message, 56)
+ 2, Len(message))
) - 2) As DataFlowSource,
Reverse(message)) - 2) As int) As RecordCount
Where event = 'OnPipelineRowsSent'
Group By SourceID, ExecutionID, DataFlowSource
) DF On S.SourceID = DF.SourceID And S.ExecutionID = DF.ExecutionID
Where message <> 'Validating' Group By
RunNumber, S.Source, DataFlowSource, RecordCount,
Case When S.Source = P.Source Then 1 Else 0 End
Case When S.Source = P.Source Then 1 Else 0 End Desc,
This data could also be used in other reporting things, but it was specifically designed for use in an Excel environment. The Pivot functions can be used to chart record counts, record speeds and Control Flow Task speeds. If you've never worked with Excel Pivot charts/tables, this is a great opportunity to start. If you have trouble getting it to work, leave me a comment and I'll help you out.
The following chart shows the execution time of different Control Flow Tasks:
As you can see, the Data Transfer for Demo task is MUCH slower in Run 3. This is because I tweaked to to be slower. Below is a detailed view of the RPM (Records Per Minute) of that DataFlow task. Note that I am using the same query results, I am just changing the Pivot options in my pivot table/chart.
We see here that the RPM is much slower. Interestingly enough, the setting that is making this so slow is A DEFAULT!!! The developer who made that a default needs a punch in the nose as most new SSIS developers won't understand this at first. This leaves a bad impression if they aren't thorough enough to look at the dropdown. Here is the setting that made this package come to a crawl:
You want the one that says "- fast load" because it does a REAL bulk load. The one selected does individual insert statements FOR EVERY ROW PASSED IN!! This is only useful in certain situations that I'll talk about at some point. Part 3 of this series will cover what to play around with to increase performance now that you have a way to look at the results and make some sense of what needs to be changed. Sorry this one is short, but I promise Part 3 will have some very good things!