Derek Comingore Blog

Derek Comingore's Microsoft Business Intelligence Blog

SQL Server 2008: The Data Collector

Based on CTP6 (Feb 08' build): Drop me a line and let me know your thoughts and experience with the new Data Collector subsystem!

Its nice to get a few minutes to post a blog, things have been busy (which is good) .  More to come on some other things in a bit...

I want to take a few minutes of your valuable time to give you a brief overview of the new SQL Server 2008 Data Collector (DC) subsystem. This post will be 1 of 2 on the topic of the Data Collecotr with the 2nd part serving as a more detailed tour of the subsystem. Like most new subsystems in SQL Server these days, the DC is composed of several 'lower' layers or platforms in the larger SQL Server product, namely SQL Server Agent and Integration Services (SSIS). So right of the bat, if you are not farily well educated in SQL Server Agent and SSIS you should brush up on those topics first before learning about DC.

Performance Data Warehouses (PDW) are not a new idea, however they are rarely implemented. Why? Because the amount of effort for creating a custom PDW (especially when its a 'pure' DBA doing it, no offense) far outweighs the return on investment or atleast that is the perception generally speaking. Microsoft has recognized this and added a new feature out-of-the-box to allow DBAs to create and populate their PDWs without the need to author complex SSIS/ETL packages or Reporting Services's (SSRS) reports. With just a few clicks in Management Studio (SSMS), one can quickly enable the Data Collector subsystem (disabled by default) and also start the default System Data Collection Sets. The DBA must also configure the Management Data Warehouse (MDW) which is essentially the DC's version of the PDW, that is it serves as the central relational repository for both system and custom Collection Sets from 1 or more target servers.

*DC is only supported on SQL Server 2008 based systems

The System Collection Sets leverage the all three supporte data sources. As the DC subsytem evolves, more data sources are expected to become available such as Windows Management Instrumentation (WMI) among others. When you create a Custom Collection Set you will have to specify which data sources or Collector Type to use. Currently, the DC supports three data sources:

  1. TSQL Queries
  2. SQL Trace
  3. PerfMon Objects/Counters

To create Custom Collection Sets you must leverage TSQL at the moment as SSMS does not support this. According to those at Microsoft, the DC should not (or is suppose to not) consume more than 5% total CPU time on 2008 systems and plan for around 250-350 MB per day of record activity. True activity measures are ofcourse central to each organization's usage but it does give you some ballpark figures to plan with and contemplate. The two key databases that drive the DC subsystem is the MSDB and the MDW. The MSDB contains all or most of the DC's metadata and configuration while the MDW contains (mostly) the actual data collected into 'snapshot' tables.

The DC is a great tool (and start) to enabling out-of-the-box performance data warehouses for SQL Server 2008 database systems. Within just a few minutes a user can configure an OLTP SQL Server 2008 system to collect and upload its collection set defined items to an designated MDW server. There were a few items mentioned in BOL that I could not locate and thus below are some of the items still needed (regardless of if they are in development or not):
1.       Dashboards (if not currently in the CTP6 build, cannot locate it in CTP6)
2.       Fact & Dimensions Tables/DMVs/Cubes (again, if not currently in the CTP6 build)
3.       Better integrated SSRS Reports in Management Studio including multi-server based ones
4.       Management Studio facilitating the creation of Custom Collection Sets (currently must use TSQL)
5.       Integration with the new policy framework of SQL Server 2008
6.       Performance Data Collection on the other SQL Server platforms: SSRS/SSIS/SSAS
7.       Continually consume less system resources per SQL Server build/product iteration


  4. SQL Server 2008 BOL



Legacy Comments

re: SQL Server 2008: The Data Collector
The information is slightly wrong because the number of data collectors are 4 but you have mentioned as only 3.

re: SQL Server 2008: The Data Collector
Do you have the 2nd part of this series? I'm not finding many articles on the net about people using this component successfully, even with it's pitfalls.

re: SQL Server 2008: The Data Collector
your not mentioning how to enable it "in just a few clicks"