posts - 10, comments - 49, trackbacks - 0

Wednesday, February 13, 2008

Row Numbers in a DataFlow

You can expect:

This is a low intensity post (you should be able to do this in your sleep).

I Expect (that you have the following experience):

  • Basic SSIS usage
  • Basic VB.Net usage

 Row Numbers in a DataFlow.

I try to post often in the Forums here at SQLTeam.com for a few reasons.  First, I get to help people in an interactive way which is very rewarding to me.  Second, I get to learn from others who post answers.  I appreciate the forums very much and the few tricks I have learned so far.  In this thread, a question about how to get row numbers for an incoming text file was posed.  The simple answer is, use a script transform.  The next question, "How?".

Script Transforms can be a very useful tool if used properly.  You must be careful not to throw a lot of code in there because you can slow your package down considerably.  For instance, DO NOT EVER call a web service from one (If you do, I won't help you anymore!  Well, maybe a little bit)!  If you have to do something like that, roll your own component and cache calls or multithread them (this WILL be a future post by the way).

For this example, we want to take a flat file, add a row number and attempt to save it to a database.  If this fails, we want to export the rows to a flat file and include the row number from the source file.  Here is my design:


For sample data, I exported my sysdtslog90 table.  I created the file source Connection Manager and my Flat File Source.  I then added the Script Transform.  I added an output column with a data type of "four byte signed int" shown here:


My input file does not have column headers, so I can start with 1.  If I had column headers, I would start with 2.  If I skipped more rows I would just adjust my starting number to where the DATA STARTS.  Then I edited my Script and entered the following:
Public Class ScriptMain : Inherits UserComponent
Private _RowNumber As Integer = 1

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Row.RowNumber = _RowNumber
_RowNumber += 1
End Sub

End Class
Of note here.  You could also check a column to see if there is a change and restart numbering if you wanted to do something fancy.  You could reproduce the functionality of Row_Number() Over (Partition By myColumn) using this method.

I then added my OLE DB Destination and added the Connection Manager for it and created the destination table and wired up the columns.  I made sure I used Fast Load into the table.  Because I want the error output, I set the "Rows per batch" and the "Maximum insert commit size" to the same thing.  For this example I set them to 5 since my sample data is only 1260 records.  When doing this in the future, DO NOT BE A KNUCKLEHEAD AND USE 5!!  This is a Bulk Insert, so a good size is 10k or so.  Of note, however, is that if you expect a LARGE number of failed records, I would consider checking the data first and doing something else with it.


Next I add another OLE DB Destination and wire it to the Error Output from the previous.  It will prompt me what to do on failure and I tell it to "Redirect Row".  When there is a failure, the WHOLE BATCH FAILS.  In this example, when there is a failure with the first Destination, 5 rows will be sent to the error output.  The second Destination inserts 1 record at a time, so when the offending record is reached again, it fails that record alone.  Next I add my Flat File Destination and create the Connection Manager for it.  I wire it up the same way with the error output.  I could if I wanted to try to capture the error message, but I'll leave that up to another post because it is not very friendly.  I map my fields and then I am done.

In my example, my destination table has enough room for each field being pulled in from the file.  I want it to fail, so I added a constraint on my table that I know will fail some records.  The result looks like this:


As you can see, only 24 records were bad, but caused 120 records to have to be inserted 1 at a time.  The results in my output file are:
5,OnProgress,MLS
10,OnProgress,ML
12,OnProgress,ML
17,OnProgress,ML
22,OnProgress,ML
27,OnProgress,ML
414,OnProgress,M
419,OnProgress,M
425,OnProgress,M
430,OnProgress,M
432,OnProgress,M
437,OnProgress,M
442,OnProgress,M
447,OnProgress,M
834,OnProgress,M
839,OnProgress,M
845,OnProgress,M
850,OnProgress,M
852,OnProgress,M
857,OnProgress,M
862,OnProgress,M
867,OnProgress,M
1254,OnProgress,
1259,OnProgress,

As you can see, the row numbers are now present in the error output and can be used to report back to the producer of the file.  This method DOES NOT allow you report on files that break the file format.  This method will not output a record has an extra or missing column delimiter for example.  For those types of things, you have to do some preprocessing step.

-Madman out-

posted @ Wednesday, February 13, 2008 10:31 AM | Feedback (11) | Filed Under [ SSIS Madness ]

Powered by:
Powered By Subtext Powered By ASP.NET