Posts
49
Comments
50
Trackbacks
26
March 2006 Blog Posts
Logging With SSIS: 3 things to know

I'm using SSIS in a big project and as any big project wants, I have to implement a quite extensive logging mechanism. "No problem" you may say. We'll, at least is what I said when I began to take care of logging. Unfortunately things are not *so* easy. To be realistic they're not easy at all.

Let me explain. I have a *lots* of SSIS Packages and many of them just encapsulate reusable logic; as such they are used as sub-packages in other packages. Things are quite complicated so I have a "Matrioska" situation where Packages call Packages that call Packages and so on.

Here comes the first problems:

1) Native logging doesn't seems to work for packages up from the third nesting level if you leave the default logging mode to "UseParentSettings". Be aware with "UseParentSettings" logging mode: you have to configure the package that uses this option the SAME AS if it has LoggingMode set to "True". So you have to configure a connection for logging and to specify and enable the log provider.

2) With native package logging you can't understand whether a task is contained in a bigger container or not. This is not a vital problem but it would have help to create more precise reports.

Now, let's also say that you have a package that must be executed against multiple databases. For example you have the database Customer1, Customer2 and so on, all with the same schema but with different data. You're surely going to create una SISS Package that has the ability to configure itself automatically so that it will run targeting different database basing on a very cool SSIS Configuration file. Since the package is REALLY big and create a LOT of logging you'll also think to create a different log database for any customer. So you'll have LOG_Customer1, LOG_Customer2 and so on. And here's the problem

3) If you want to log using a SQL Server Log Provider you CAN'T make the Sql Server connection dynamic (with the aid of an Expression) since logging begins BEFORE validation and BEFORE package configuration. This means that the logging with initally log on the database you have specified in the ConnectionString property of your Sql Server Connection, and only after package configuration has taken place will switch to log on your dynamicaly configured database.
The problem is that the database you configure in the ConnectionString statically may not exists, since you know that the value will be corrected at run-time. Unfurtunately this won't help if you need to run this package from another package since logging to a non-existent database will raise an error. The error is strange enough because won't make you inner package execution fail, but will only fail the outer package, leaving you with a "Executed successfully with error" situation.

To solve these problems i've started to create a DTExec on my own where I can control logging manually and from OUTSIDE the package.

As soon I have other information or solution i'll post them asap, in the meantime if someone has faced the same problems i'd like to hear his experience.

posted @ Monday, March 27, 2006 10:26 PM | Feedback (2)
A better NTILE implementation

The implementation of the NTILE(n) windowing function is a little bit slow and requires a temp worktable that generates A LOT of I/O. Probably that because, as the BOL says, if you have a number of rows that is not divisible by your "n" bucket value, the function has to make the "Larger groups come before smaller groups in the order specified by the OVER clause".

If you're using NTILE for statistical purposes and so you don't care about having larger groups before smaller one, mostly because the difference among the groups population tipically will be of only one unit, you can implement NTILE with the following script:

SELECT 
   CustomerKey,
   CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
FROM 
   
DimCustomer

where @n is a variable that contains your bucket value. For example, if you want a NTILE(5) your @n value will be 5.

You can make some test using the AdventureWorksDW database:

DECLARE @n INT;
SET @n = 5;

WITH CTE AS (
   SELECT 
      CustomerKey,
      NTILE(@n) OVER (ORDER BY YearlyIncome ASC) AS NTileStd
   FROM 
      DimCustomer
)
SELECT
   COUNT(*),
   NTileStd
FROM
   CTE
GROUP BY
   NTileStd
ORDER BY
   2;

WITH CTE AS (
   SELECT 
      CustomerKey,
      CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
   FROM 
   DimCustomer
)
SELECT
   COUNT(*),
   NTileFast
FROM
   CTE
GROUP BY
   NTileFast
ORDER BY
   2;

You'll notice that the first will make 38490 I/O (!!!) where the second one will only make 1036 I/O, which is 37 time LESS!!!!

I have discovered this behaviour with my collegue Marco Russo using a milions rows table and as you may image 37 times less I/O DOES the difference! :-)

posted @ Friday, March 17, 2006 6:04 PM | Feedback (7)
News