Finally, we have a Data Profiling Utility w/SQL Server !!!
With SQL Server Integration Services (SSIS) 2008 you will have access to the new Data Profiler task. The SSIS 2008 Data Profiler Task is an easy and effective method to profile your target source systems (tables & views) prior to building ETL solutions consuming their data. With the Data Profiler you can perform a variety of available 'profiles' against a SQL Server 2000 or later database using an ADO.Net connection. The output of the choosen profiles can be saved to an external XML file for viewing using a new standalone utility called the Data Profiler Viewer (C:\Program Files\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe) or to an SSIS variable. If you wish to manually review the profile results then you will want to save the output to an XML file using a file connection. If you intend to build programmatic control flow based upon the results of the profiles then you will need to save the results to a variable and then poll the contents of that variable.
Available Profiles & Profile Types
There are two types of profiles: single-column and multi-column|multi-table.
- Column Length Distribution
- Column Pattern
- Column NULL Ratio
- Column Statistics
- Column Value Distribution
- Candidate Key
- Functional Dependency
- Value Inclusion
Configuring the Data Profiler Task
To configure the Data Profiler is actually pretty simple. First, create a new ADO.Net connection manager to a SQL Server 2000+ instance. Next, if you wish to save your results to an XML file you will need to create a new File connection manager as well. Once you have your connection manager created go ahead and drag-and-drop an instance of the Data Profiler Task onto the control flow designer surface within a SSIS package using Business Intelligence Development Studio (BIDS). Right-click the task and select the Edit context menu option. On the General page you will designate the profile output and can optionally elect 'Quick Profiles' via clicking the button. Quick profiles allow you to select a variety of profiles using default options against the same table or view.
Next, in the Profile Request page you can configure individual profiles and their corresponding settings. These settings are fairly self explanatory, note that each profile has it's own unique settings which are displayed in the pane labeled Request Properties. Each row represents a new profile to be processed.
Finally, in the Expressions page you can optionally assign run-time formulas for any of the supported task properties.
Conclusion & Shortcomings
For a 'first stab' at a data profiling solution, the Data Profiler Task is pretty good. With the new task you can fairly quickly analyze various data quality and metadata information about your client's source systems. While I do praise the SSIS team (and Microsoft Research) for their efforts there are a few shortcomings you will quickly notice:
- Only supports ADO.Net & SQL Server 2000+ later
- No option to extend or create new profiles