James Nix Blog

SSIS Performance Framework Part 1

You can expect:

This is a medium intensity post (you should not get a migraine, but I make no promises)

I Expect (that you have the following experience):

  • The ability to POINT and CLICK on your OWN (I won't hold your hand like a little girl. Get over it.)
  • At least SOME SSIS experience
  • SSIS Logging
  • Stored Procedure writing

Performance Measuring in SSIS

SSIS Logging lets you see what the heck is going on during run-time of your packages. You can see where failures occur, and that is what MOST people are anxious to see. I am not most people and I want to see how fast my packages are running. You can see total execution time from a SQL Agent job, but that isn't enough! I want to know how EVERYTHING is performing. Armed with this information, I know exactly what to focus on if things are not what I expect them to be. This post will show you how to set up logging to SQL Server and then USE that logging info for the forces of good.
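To make "EVERYTHING" concrete: once the logging described in the rest of this post is in place, even a tiny query like the sketch below gives you total run time per execution straight from the log table, and from there we will drill down to individual components.

-- Total elapsed seconds for each logged package execution (a sketch;
-- assumes the sysdtslog90 logging configured later in this post)
SELECT
    EXECUTIONID,
    TOTALSECS = DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME))
FROM SYSDTSLOG90
GROUP BY EXECUTIONID
ORDER BY MIN(STARTTIME)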

The first thing to do is turn on logging for the package (right click on the Control Flow surface - ok, that was a freebie):



Use the provider for SQL Server and configure your SQL Server connection manager. SSIS will automatically create a table called sysdtslog90 if it does not exist. Be sure to check both boxes as illustrated here:
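A note in passing: if you would rather script the log table yourself (handy when the account running the package cannot create tables), the table SSIS builds looks roughly like the sketch below. The column names are the ones we will query against later; the exact data types are from memory and may differ slightly from what SSIS generates, so when in doubt just let SSIS create it.

-- Approximate shape of the SSIS-generated log table (a sketch; verify against
-- the table SSIS actually creates before relying on the exact types)
CREATE TABLE DBO.SYSDTSLOG90
(
    ID          INT IDENTITY(1,1) NOT NULL,
    [EVENT]     SYSNAME           NOT NULL,
    COMPUTER    NVARCHAR(128)     NOT NULL,
    OPERATOR    NVARCHAR(128)     NOT NULL,
    [SOURCE]    NVARCHAR(1024)    NOT NULL,
    SOURCEID    UNIQUEIDENTIFIER  NOT NULL,
    EXECUTIONID UNIQUEIDENTIFIER  NOT NULL,
    STARTTIME   DATETIME          NOT NULL,
    ENDTIME     DATETIME          NOT NULL,
    DATACODE    INT               NOT NULL,
    DATABYTES   IMAGE             NULL,
    [MESSAGE]   NVARCHAR(MAX)     NULL
)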



Next is the Details tab. These settings are package-level settings and will propagate to all objects in the package. So when you see the OnError event checked here, it will already be checked for any object whose logging you go on to configure.



Ok, create a Dataflow task and pipe some data from one place to another. For purposes of this example, I am going to use an OleDB source, a Lookup, and an OleDB Destination. This next sentence is VERY IMPORTANT, so LISTEN UP. You need to name each of the Dataflow (and everything, really) steps something MEANINGFUL. "OleDB Source" is NOT MEANINGFUL. If you clods aren't already doing this in your packages, then you fail SSIS 101. Go to jail, go directly to jail, do NOT pass Go, do NOT collect $200.



Next, go back to the Control Flow surface and go back into logging. You will now see that the Package is not the only container; the DataFlow container is there too, but grayed out. This is because it is inheriting the settings from its parent. If we want (and we do) to configure this container ourselves, we have to check the box (click it twice - there I go again! slap my hand) and enable the SQL Server logger. Then go to the Details tab. All the settings from the parent have to be re-set because we chose to configure this container ourselves. Check the same 2 we had last time and also OnPipelineRowsSent. This event tells us how many rows were sent to a particular component in the DataFlow (it is a DataFlow-specific event). There are a lot of events you can look at, and it's up to you to determine which ones you want. For this demo (because it is more about logging for performance tuning) these are the only ones needed:



Now, pull up your Management Studio and run a Select on the table (Select * From sysdtslog90). You will see a goldmine of information. Times and row counts = much win. Play around with the data a little and then come back to this post. Go, do it. Go on!!
Ok, a couple of things to note. ExecutionID is a GUID that covers the complete execution of the package; every row logged during one run shares it. For the OnPipelineRowsSent rows, the message field contains the count of records sent to the component. To get the row count out of the field, use:
CAST(RIGHT([MESSAGE], LEN([MESSAGE]) - CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS INT)
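Something like this (a quick sketch) applies that same expression across the whole log, so you can eyeball row counts per event:

-- Row counts pulled straight out of the log, using the parsing expression above
SELECT
    EXECUTIONID,
    [SOURCE],      -- the object that logged the event; the component/input name is also in [MESSAGE]
    STARTTIME,
    ENDTIME,
    [MESSAGE],
    ROWSSENT = CAST(RIGHT([MESSAGE], LEN([MESSAGE]) - CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS INT)
FROM SYSDTSLOG90
WHERE [EVENT] = 'ONPIPELINEROWSSENT'
ORDER BY STARTTIME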

This Stored Procedure offers a complete look at the progress of a package execution. It can be used to check performance over many executions to evaluate slowdowns, etc.
CREATE PROCEDURE [DBO].[CHECKPROGRESS]
(
    @EXECUTIONID AS UNIQUEIDENTIFIER, -- executionid of the package run to inspect
    @MESSAGE     AS VARCHAR(100),     -- text that identifies the component (matched with LIKE)
    @GOAL        AS INT,              -- expected total row count for the component
    @PROCESS     AS VARCHAR(50)       -- friendly name to show in the output
)
AS BEGIN
    SELECT
        PROCESS            = @PROCESS,
        -- elapsed wall-clock time across all matching log rows
        TIMEINSECS         = DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME)),
        -- rows sent so far, parsed out of the OnPipelineRowsSent message text
        RECCOUNT           = SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
                                 CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS INT)),
        -- throughput per second and per hour
        RECSPERSEC         = SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
                                 CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
                             DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME)),
        RECSPERHOUR        = (SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
                                 CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
                             DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME))) * 60 * 60,
        GOAL               = @GOAL,
        -- progress against the expected total
        PERCENTCOMPLETE    = (SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
                                 CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
                             @GOAL) * 100,
        -- projected total duration (in seconds) at the current rate
        ESTIMATEDTOTALTIME = CAST(ROUND(DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME)) /
                                 (SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
                                 CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
                                 @GOAL), 0) AS INT),
        -- projected completion time on the server clock
        SERVERETA          = DATEADD(SS, (CAST(ROUND(DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME)) /
                                 (SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
                                 CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
                                 @GOAL), 0) AS INT) -
                                 DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME))), GETDATE()),
        CURRENTSERVERTIME  = GETDATE()
    FROM
        SYSDTSLOG90
    WHERE EXECUTIONID = @EXECUTIONID
        AND [EVENT] = 'ONPIPELINEROWSSENT'
        AND [MESSAGE] LIKE '%' + @MESSAGE + '%'
END
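A note on @GOAL: the driver below just passes 1, which is fine for measuring throughput after the fact, but if you pass the row count you expect the component to process, the PERCENTCOMPLETE, ESTIMATEDTOTALTIME, and SERVERETA columns become useful for watching a load that is still running. A standalone call (the GUID here is just a placeholder for a real executionid) might look like this:

-- Hypothetical standalone call; swap in a real executionid from sysdtslog90
-- and set @GOAL to the row count you expect the component to handle
EXEC DBO.CHECKPROGRESS
    @EXECUTIONID = '00000000-0000-0000-0000-000000000000',
    @MESSAGE     = 'LOOKUP THE DATA IN THE OTHER DEMO TABLE',
    @GOAL        = 1000000,
    @PROCESS     = 'LOOKUP'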

Here is a driver for that stored proc that matches my demo package.
-- Find the most recent log row written today, then grab its executionid;
-- that identifies the latest run of the package
DECLARE @MAX INT
DECLARE @EXECID UNIQUEIDENTIFIER

SELECT @MAX = MAX(ID)
FROM SYSDTSLOG90
WHERE STARTTIME BETWEEN CONVERT(VARCHAR, GETDATE(), 101)
    AND CONVERT(VARCHAR, DATEADD(DD, 1, GETDATE()), 101)

SELECT @EXECID = EXECUTIONID
FROM SYSDTSLOG90
WHERE ID = @MAX

-- Table variable shaped to match the output of CHECKPROGRESS
DECLARE @RESULTS TABLE
(
    PROCESS            VARCHAR(50),
    TIMEINSECS         INT,
    RECCOUNT           INT,
    RECSPERSEC         REAL,
    RECSPERHOUR        REAL,
    GOAL               INT,
    PERCENTCOMPLETE    REAL,
    ESTIMATEDTOTALTIME INT,
    SERVERETA          DATETIME,
    CURRENTSERVERTIME  DATETIME
)

-- One call per component of interest in the demo package
INSERT INTO @RESULTS
EXEC CHECKPROGRESS @EXECID, 'LOOKUP THE DATA IN THE OTHER DEMO TABLE', 1, 'LOOKUP'

INSERT INTO @RESULTS
EXEC CHECKPROGRESS @EXECID, 'RECORDS FOUND', 1, 'RECORDS FOUND'

INSERT INTO @RESULTS
EXEC CHECKPROGRESS @EXECID, 'RECORDS NOT FOUND', 1, 'RECORDS NOT FOUND'

SELECT * FROM @RESULTS


If you run your package, then create the stored proc, and then run the driver, you should see some valuable results. I've used this exact stuff to measure performance day over day, month over month, etc. Keep in mind that there are other factors that can affect performance (such as some dimwit running a CURSOR on your server...), so sometimes you need to average runtimes. Armed with this data, you can march up to Excel and tell it, "Give me charts, fool!". Charts make management giggly, and thus you can defend your well-written code when all of a sudden the virtual machine your DB is running on decides to do something funny! (true story)
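For the day-over-day comparisons, a roll-up like the sketch below is a starting point; it averages the elapsed seconds for one component (the demo Lookup) per day across every logged run. Group by month instead of day and you have your month-over-month numbers.

-- Average elapsed seconds for one component, per day, across all logged runs
-- (a sketch; swap the LIKE pattern for the component you care about)
SELECT
    RUNDATE,
    AVGSECS = AVG(CAST(TIMEINSECS AS REAL))
FROM
(
    SELECT
        RUNDATE    = CONVERT(VARCHAR(10), MIN(STARTTIME), 101),
        TIMEINSECS = DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME))
    FROM SYSDTSLOG90
    WHERE [EVENT] = 'ONPIPELINEROWSSENT'
        AND [MESSAGE] LIKE '%LOOKUP THE DATA IN THE OTHER DEMO TABLE%'
    GROUP BY EXECUTIONID
) RUNS
GROUP BY RUNDATE
ORDER BY RUNDATE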

This post is the foundation for this performance series.  We will use the sysdtslog90 table extensively to determine actual performance and not just anecdotal performance.  The OnError stuff will be used in subsequent posts.  It is there for a reason if you were wondering.

-Madman out-

Legacy Comments


Mosher
2008-07-13
re: SSIS Performance Framework Part 1
Great job.

Damien Alvarado
2008-10-16
re: SSIS Performance Framework Part 1
Excellent post, I've also been reading about the SSIS API and when you inherit a mess (I'm a consultant) these tools can really point you in the right direction to get immediate impact.

We implemented a custom audit function but I think we could have used the built in logging component as you described.

I enjoy your writing style very much. It sounds like you wrote it while having a beer. This is a good thing.

James Nix
2008-10-20
re: SSIS Performance Framework Part 1
If you need assistance, feel free to contact me.

- Madman -