This is a medium intensity post (you should not get a migraine, but I make no promises)
Use the provider for SQL Server and configure your SQL Server connection manager. SSIS will automatically create a table called sysdtslog90 if it does not exist. Be sure to check both boxes as illustrated here:
Next is the Details tab. These settings are package level settings and will propagate to all objects in the package. So, when you see the OnError event checked, this will automatically be checked for any object you go to configure the logging for.
Ok, create a Dataflow task and pipe some data from one place to another. For purposes of this example, I am going to use a OleDB source, a Lookup, and a OleDB Destination. This next sentence is VERY IMPORTANT, so LISTEN UP. You need to name each of the Dataflow (and everything, really) steps something MEANINGFUL. "OleDB Source" is NOT MEANINGFUL. If you clods aren't already doing this in your packages, than you fail SSIS 101. Go to jail, go directly to jail, do NOT pass Go, do NOT collect $200.
Next, go back to the Control Flow surface and go back into logging. You will now see that the Package is not the only container. You will see that the DataFlow container is grayed out. This is because it is inheriting the settings from the parent. If you want (and we do) to configure this container ourselves, you have to check the box (click it twice - there I go again! slap my hand) and enable the SQL Server logger. Then go to the Details tab. All the setting from the patent have to be re-set because we choose to configure it ourselves. Check the same 2 we had last time and also OnPipelineRowSent. This event lets us know how many rows were sent to the particular component in the DataFlow (this is a DataFlow specific event). There are a lot of events you can look at and it's up to you to determine what you want. For this demo (because it is more about logging for Performance tuning) these are the only ones needed:
Now, pull up your Management Studio and run a Select on the table (Select * From sysdtslog90). You will see a goldmine of information. Times and row counts = much win. Play around with the data a little and them come back to this post. Go, do it. Go on!!
Ok, a couple of things to note. Executionid is an ID that covers the complete execution of the package. The message field contains the count of records sent to the component. To get the rowcount out of the field, use:
CAST(RIGHT([MESSAGE], LEN([MESSAGE]) - CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS INT)
This Stored Procedure offers a complete look at the progress of a package execution. It can be used to check performance over many executions to evaluate slow downs etc.
CREATE PROCEDURE [DBO].[CHECKPROGRESS]
(
@EXECUTIONID AS UNIQUEIDENTIFIER,
@MESSAGE AS VARCHAR(100),
@GOAL AS INT,
@PROCESS VARCHAR(50)
)
AS BEGIN
SELECT PROCESS = @PROCESS, TIMEINSECS = DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME)),
RECCOUNT = SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS INT)),
RECSPERSEC = SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME)),
RECSPERHOUR = (SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
DATEDIFF(SS, MIN(STARTTIME), MAX(ENDTIME))) * 60 * 60,
GOAL = @GOAL,
PERCENTCOMPLETE = (SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
@GOAL) * 100 ,
ESTIMATEDTOTALTIME = CAST(ROUND(DATEDIFF(SS, MIN(STARTTIME),
MAX(ENDTIME)) / (SUM(CAST(RIGHT([MESSAGE], LEN([MESSAGE]) -
CHARINDEX('INPUT : ', [MESSAGE]) - 7) AS NUMERIC)) /
@GOAL), 0) AS INT),
SERVERETA = DATEADD(SS, (CAST(ROUND(DATEDIFF(SS,
MIN(STARTTIME), MAX(ENDTIME)) / (SUM(CAST(RIGHT([MESSAGE],
LEN([MESSAGE]) - CHARINDEX('INPUT : ', [MESSAGE]) - 7)
AS NUMERIC)) / @GOAL), 0) AS INT) - DATEDIFF(SS,
MIN(STARTTIME), MAX(ENDTIME))), GETDATE()),
CURRENTSERVERTIME = GETDATE()
FROM SYSDTSLOG90
WHERE EXECUTIONID = @EXECUTIONID
AND [EVENT] = 'ONPIPELINEROWSSENT' AND [MESSAGE] LIKE '%' + @MESSAGE + '%'
END
Here is a driver for that stored proc that matches my demo package.
DECLARE @MAX INT
DECLARE @EXECID UNIQUEIDENTIFIER
SELECT @MAX = MAX(ID)
FROM SYSDTSLOG90
WHERE STARTTIME BETWEEN CONVERT(VARCHAR, GETDATE(), 101)
AND CONVERT(VARCHAR, DATEADD(DD, 1,GETDATE()), 101)
SELECT @EXECID = EXECUTIONID
FROM SYSDTSLOG90
WHERE ID = @MAX
DECLARE @RESULTS TABLE ( PROCESS VARCHAR(50),
TIMEINSECS INT,
RECCOUNT INT,
RECSPERSEC REAL,
RECSPERHOUR REAL,
GOAL INT,
PERCENTCOMPLETE REAL,
ESTIMATEDTOTALTIME INT,
SERVERETA DATETIME,
CURRENTSERVERTIME DATETIME )
INSERT INTO @RESULTS
EXEC CHECKPROGRESS @EXECID,
'LOOKUP THE DATA IN THE OTHER DEMO TABLE', 1, 'LOOKUP'
INSERT INTO @RESULTS
EXEC CHECKPROGRESS @EXECID,
'RECORDS FOUND', 1, 'RECORDS FOUND'
INSERT INTO @RESULTS
EXEC CHECKPROGRESS @EXECID,
'RECORDS NOT FOUND', 1, 'RECORDS NOT FOUND'
SELECT * FROM @RESULTS
If you run you package and then create the stored proc and then run the driver you should see some valuable results. I've used this exact stuff to measure performance day over day, month over month, etc. Keep in mind that there are other factors that can affect performance (such as some dimwit running a CURSOR on your server...), so sometimes you need to average runtimes. Armed with this data, you can march up to Excel and tell it, "Give me charts fool!". Charts make management giggly and thus you can defend your well written code when all of a sudden the Virtual your DB is running on decides to do something funny! (true story)
This post is the foundation for this performance series. We will use the sysdtslog90 table extensively to determine actual performance and not just anecdotal performance. The OnError stuff will be used in subsequent posts. It is there for a reason if you were wondering.
-Madman out-