Derek Comingore Blog

Derek Comingore's Microsoft Business Intelligence Blog

SQL Server 2008: SSIS Data Profiler Task

Finally, we have a Data Profiling Utility w/SQL Server !!!

With SQL Server Integration Services (SSIS) 2008 you will have access to the new Data Profiler task. The SSIS 2008 Data Profiler Task is an easy and effective method to profile your target source systems (tables & views) prior to building ETL solutions consuming their data. With the Data Profiler you can perform a variety of available 'profiles' against a SQL Server 2000 or later database using an ADO.Net connection. The output of the choosen profiles can be saved to an external XML file for viewing using a new standalone utility called the Data Profiler Viewer (C:\Program Files\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe) or to an SSIS variable. If you wish to manually review the profile results then you will want to save the output to an XML file using a file connection. If you intend to build programmatic control flow based upon the results of the profiles then you will need to save the results to a variable and then poll the contents of that variable.

Available Profiles & Profile Types

There are two types of profiles: single-column and multi-column|multi-table.

Single-Column Profiles:

  • Column Length Distribution
  • Column Pattern
  • Column NULL Ratio
  • Column Statistics
  • Column Value Distribution

Multi-Column|Table Profiles:

  • Candidate Key
  • Functional Dependency
  • Value Inclusion

Configuring the Data Profiler Task

To configure the Data Profiler is actually pretty simple. First, create a new ADO.Net connection manager to a SQL Server 2000+ instance. Next, if you wish to save your results to an XML file you will need to create a new File connection manager as well. Once you have your connection manager created go ahead and drag-and-drop an instance of the Data Profiler Task onto the control flow designer surface within a SSIS package using Business Intelligence Development Studio (BIDS). Right-click the task and select the Edit context menu option. On the General page you will designate the profile output and can optionally elect 'Quick Profiles' via clicking the button. Quick profiles allow you to select a variety of profiles using default options against the same table or view.Data Profiler Task General Page

Quick Profiles Page

Next, in the Profile Request page you can configure individual profiles and their corresponding settings. These settings are fairly self explanatory, note that each profile has it's own unique settings which are displayed in the pane labeled Request Properties. Each row represents a new profile to be processed.

Data Profiler Task Profile Requests Page

Finally, in the Expressions page you can optionally assign run-time formulas for any of the supported task properties.

Conclusion & Shortcomings

For a 'first stab' at a data profiling solution, the Data Profiler Task is pretty good. With the new task you can fairly quickly analyze various data quality and metadata information about your client's source systems. While I do praise the SSIS team (and Microsoft Research) for their efforts there are a few shortcomings you will quickly notice:

  1. Only supports ADO.Net & SQL Server 2000+ later
  2. No option to extend or create new profiles

Legacy Comments

Ira Warren Whiteside
re: SQL Server 2008: SSIS Data Profiler Task
I would like to share the new Melissa Data SSIS Total Data Quality Toolkit TDQ-IT, including SSIS Data Profiling 2005/2008.

Melissa Data SSIS Total Data Quality Toolkit TDQ-IT offers a wide range of data transformation and cleansing functionality including data profiling, parsing, cleansing, matching and monitoring functionality built right in to SSIS. And, TDQ-IT leverages SSIS to provide a flexible, effective solution for your organization’s data quality and master data management (MDM) initiatives. Request a free trial today.

SQL Newbie
re: SQL Server 2008: SSIS Data Profiler Task
<<Original Post>> Very helpful!