posts - 10, comments - 49, trackbacks - 0

Row Numbers in a DataFlow

You can expect:

This is a low intensity post (you should be able to do this in your sleep).

I Expect (that you have the following experience):

  • Basic SSIS usage
  • Basic VB.Net usage

 Row Numbers in a DataFlow.

I try to post often in the Forums here at SQLTeam.com for a few reasons.  First, I get to help people in an interactive way which is very rewarding to me.  Second, I get to learn from others who post answers.  I appreciate the forums very much and the few tricks I have learned so far.  In this thread, a question about how to get row numbers for an incoming text file was posed.  The simple answer is, use a script transform.  The next question, "How?".

Script Transforms can be a very useful tool if used properly.  You must be careful not to throw a lot of code in there because you can slow your package down considerably.  For instance, DO NOT EVER call a web service from one (If you do, I won't help you anymore!  Well, maybe a little bit)!  If you have to do something like that, roll your own component and cache calls or multithread them (this WILL be a future post by the way).

For this example, we want to take a flat file, add a row number and attempt to save it to a database.  If this fails, we want to export the rows to a flat file and include the row number from the source file.  Here is my design:


For sample data, I exported my sysdtslog90 table.  I created the file source Connection Manager and my Flat File Source.  I then added the Script Transform.  I added an output column with a data type of "four byte signed int" shown here:


My input file does not have column headers, so I can start with 1.  If I had column headers, I would start with 2.  If I skipped more rows I would just adjust my starting number to where the DATA STARTS.  Then I edited my Script and entered the following:
Public Class ScriptMain : Inherits UserComponent
Private _RowNumber As Integer = 1

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Row.RowNumber = _RowNumber
_RowNumber += 1
End Sub

End Class
Of note here.  You could also check a column to see if there is a change and restart numbering if you wanted to do something fancy.  You could reproduce the functionality of Row_Number() Over (Partition By myColumn) using this method.

I then added my OLE DB Destination and added the Connection Manager for it and created the destination table and wired up the columns.  I made sure I used Fast Load into the table.  Because I want the error output, I set the "Rows per batch" and the "Maximum insert commit size" to the same thing.  For this example I set them to 5 since my sample data is only 1260 records.  When doing this in the future, DO NOT BE A KNUCKLEHEAD AND USE 5!!  This is a Bulk Insert, so a good size is 10k or so.  Of note, however, is that if you expect a LARGE number of failed records, I would consider checking the data first and doing something else with it.


Next I add another OLE DB Destination and wire it to the Error Output from the previous.  It will prompt me what to do on failure and I tell it to "Redirect Row".  When there is a failure, the WHOLE BATCH FAILS.  In this example, when there is a failure with the first Destination, 5 rows will be sent to the error output.  The second Destination inserts 1 record at a time, so when the offending record is reached again, it fails that record alone.  Next I add my Flat File Destination and create the Connection Manager for it.  I wire it up the same way with the error output.  I could if I wanted to try to capture the error message, but I'll leave that up to another post because it is not very friendly.  I map my fields and then I am done.

In my example, my destination table has enough room for each field being pulled in from the file.  I want it to fail, so I added a constraint on my table that I know will fail some records.  The result looks like this:


As you can see, only 24 records were bad, but caused 120 records to have to be inserted 1 at a time.  The results in my output file are:
5,OnProgress,MLS
10,OnProgress,ML
12,OnProgress,ML
17,OnProgress,ML
22,OnProgress,ML
27,OnProgress,ML
414,OnProgress,M
419,OnProgress,M
425,OnProgress,M
430,OnProgress,M
432,OnProgress,M
437,OnProgress,M
442,OnProgress,M
447,OnProgress,M
834,OnProgress,M
839,OnProgress,M
845,OnProgress,M
850,OnProgress,M
852,OnProgress,M
857,OnProgress,M
862,OnProgress,M
867,OnProgress,M
1254,OnProgress,
1259,OnProgress,

As you can see, the row numbers are now present in the error output and can be used to report back to the producer of the file.  This method DOES NOT allow you report on files that break the file format.  This method will not output a record has an extra or missing column delimiter for example.  For those types of things, you have to do some preprocessing step.

-Madman out-

Print | posted on Wednesday, February 13, 2008 10:31 AM | Filed Under [ SSIS Madness ]

Feedback

Gravatar

# re: Row Numbers in a DataFlow

I tried this method..but looks like Im doing some mistake.I have records coming in from flat file and upon failure or Error.These records would be loaded to a ERROR table.This table has the following columns.
[Flat File Source Error Output Column] [text] NULL,
[Flat File Source_ErrorCode] [int] NULL,
[Flat File Source_ErrorColumn] [int] NULL,
[ErrorDescription] [text] NULL,
[Table Name] [varchar](50) NULL,
[InputDate] [datetime] NULL,
[RowNumber] [int] NULL
I added 2 Output Columns to my Script Component, ErrorDescription & RowNumber.
Here is the script

Public Class ScriptMain
Inherits UserComponent
Private _RowNumber As Integer = 1
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Row.ErrorDescription = ComponentMetaData.GetErrorDescription(Row.FlatFileSourceErrorCode)
Row.RowNumber = _RowNumber
_RowNumber += 1
End Sub
End Class

Output on Row Number is as 1,2,3,4 and so on.It does not return row number in respect to the invalid row.Do advise .....
2/13/2008 2:27 PM | LOOKUP_BI
Gravatar

# re: Row Numbers in a DataFlow

Because you are using the Error output, you are splitting the incoming rows into two buffers. You cannot therefore get the real row number using my method. I'll research to see if there is another way, but I don't think there is a good one.
2/13/2008 3:17 PM | Madman
Gravatar

# re: Row Numbers in a DataFlow

Hello guys,
i'm a beginer to with SSIS and i have encounted a few problems regarding this topic : handeling ErrorCode and ErrorColumn of a redirected rows of a failing component (Derived, Lookup, OLDB Destination, etc). I succed to obtain for the rows that goes into error the description , but can you help me please , by explaing to me how to get the component name of the component who generate the error and if it's possible the column name how goes in error ? (i know that for the rows who fails on Derived Column there is inposible to get the column name who generate description but for the others i don't know ) And rewinding to the name of component who generate the error i give you an example of how my SSIS is designed : i import datas from an oracle DB and inserting them into a SQL DB but i have also a Derived column component, a lookup and an insert component and i want to find up at Union All witch error came from what component. Please if you have some ideeas help me, Thank you
3/10/2008 4:10 AM | Romeo
Gravatar

# Where are the new entries?

So what happened to this blog? Where are all the new articles and blog entries? I subscribed to this feed and I'm ready for something to read.
4/14/2008 1:38 PM | updates?
Gravatar

# re: Row Numbers in a DataFlow

Hello guys,
i'm a beginer to with SSIS and i have encounted a few problems regarding this topic : handeling ErrorCode and ErrorColumn of a redirected rows of a failing component (Derived, Lookup, OLDB Destination, etc). I succed to obtain for the rows that goes into error the description , but can you help me please , by explaing to me how to get the component name of the component who generate the error and if it's possible the column name how goes in error ? (i know that for the rows who fails on Derived Column there is inposible to get the column name who generate description but for the others i don't know ) And rewinding to the name of component who generate the error i give you an example of how my SSIS is designed : i import datas from an oracle DB and inserting them into a SQL DB but i have also a Derived column component, a lookup and an insert component and i want to find up at Union All witch error came from what component. Please if you have some ideeas help me, Thank you
9/18/2008 1:42 AM | whybi
Gravatar

# re: Row Numbers in a DataFlow

How to reproduce the functionality of Row_Number() Over (Partition By myColumn) in SSIS?
6/3/2009 2:31 PM | ssisnewbie
Gravatar

# re: Row Numbers in a DataFlow

I am tgrying to dunp the data to a flat file but it is not working.

Eve thogh I have 2 colums and my mapping looks OK, I opnly get data to the rows column.

Can anyone help?
5/11/2010 1:57 PM | Andreas
Gravatar

# re: Row Numbers in a DataFlow

no idea
7/20/2010 9:11 AM | farooq
Gravatar

# re: Row Numbers in a DataFlow

Hi All,

This is very useful to me...B'cos I am not good at VB Script...and it is very easy to understand

Thanks & Regards
Chetan
7/23/2010 10:37 PM | Chetan
Gravatar

# re: Row Numbers in a DataFlow

Thanks man
12/14/2011 10:31 AM | unknown
Gravatar

# re: Row Numbers in a DataFlow

Nice article. rest of the users, "ENGLISH PLEASE"...
7/17/2012 5:54 PM | techgeek
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET