Based on CTP6 (Feb 08' build): Drop me a line and let me know your thoughts and experience with the new Data Collector subsystem!
Its nice to get a few minutes to post a blog, things have been busy (which is good)
. More to come on some other things in a bit...
I want to take a few minutes of your valuable time to give you a brief overview of the new SQL Server 2008 Data Collector (DC) subsystem. This post will be 1 of 2 on the topic of the Data Collecotr with the 2nd part serving as a more detailed tour of the subsystem. Like most new subsystems in SQL Server these days, the DC is composed of several 'lower' layers or platforms in the larger SQL Server product, namely SQL Server Agent and Integration Services (SSIS). So right of the bat, if you are not farily well educated in SQL Server Agent and SSIS you should brush up on those topics first before learning about DC.
Performance Data Warehouses (PDW) are not a new idea, however they are rarely implemented. Why? Because the amount of effort for creating a custom PDW (especially when its a 'pure' DBA doing it, no offense) far outweighs the return on investment or atleast that is the perception generally speaking. Microsoft has recognized this and added a new feature out-of-the-box to allow DBAs to create and populate their PDWs without the need to author complex SSIS/ETL packages or Reporting Services's (SSRS) reports. With just a few clicks in Management Studio (SSMS), one can quickly enable the Data Collector subsystem (disabled by default) and also start the default System Data Collection Sets. The DBA must also configure the Management Data Warehouse (MDW) which is essentially the DC's version of the PDW, that is it serves as the central relational repository for both system and custom Collection Sets from 1 or more target servers.
*DC is only supported on SQL Server 2008 based systems
The System Collection Sets leverage the all three supporte data sources. As the DC subsytem evolves, more data sources are expected to become available such as Windows Management Instrumentation (WMI) among others. When you create a Custom Collection Set you will have to specify which data sources or Collector Type to use. Currently, the DC supports three data sources:
- TSQL Queries
- SQL Trace
- PerfMon Objects/Counters
To create Custom Collection Sets you must leverage TSQL at the moment as SSMS does not support this. According to those at Microsoft, the DC should not (or is suppose to not) consume more than 5% total CPU time on 2008 systems and plan for around 250-350 MB per day of record activity. True activity measures are ofcourse central to each organization's usage but it does give you some ballpark figures to plan with and contemplate. The two key databases that drive the DC subsystem is the MSDB and the MDW. The MSDB contains all or most of the DC's metadata and configuration while the MDW contains (mostly) the actual data collected into 'snapshot' tables.
The DC is a great tool (and start) to enabling out-of-the-box performance data warehouses for SQL Server 2008 database systems. Within just a few minutes a user can configure an OLTP SQL Server 2008 system to collect and upload its collection set defined items to an designated MDW server. There were a few items mentioned in BOL that I could not locate and thus below are some of the items still needed (regardless of if they are in development or not):
1. Dashboards (if not currently in the CTP6 build, cannot locate it in CTP6)
2. Fact & Dimensions Tables/DMVs/Cubes (again, if not currently in the CTP6 build)
3. Better integrated SSRS Reports in Management Studio including multi-server based ones
4. Management Studio facilitating the creation of Custom Collection Sets (currently must use TSQL)
5. Integration with the new policy framework of SQL Server 2008
6. Performance Data Collection on the other SQL Server platforms: SSRS/SSIS/SSAS
7. Continually consume less system resources per SQL Server build/product iteration
Resources
- http://blogs.msdn.com/sqlrem/archive/2008/01/17/performance-studio-for-sql-server-2008-revealed.aspx
- http://sqlblogcasts.com/blogs/jasonmassie/archive/2008/02/15/sql-server-2008-performance-data-collection-in-5-minutes.aspx
- http://sqlblogcasts.com/blogs/jasonmassie/archive/2008/02/15/creating-a-custom-data-collection-in-sql-server-2008.aspx
- SQL Server 2008 BOL
Cheers
Derek
Microsoft
SQL Server 2008 introduces several enhancements & changes to how we group and aggregate relational result sets in TSQL. What follows is a simplistic listing and overview of the grouping/aggregate changes we are seeing in Microsoft SQL Server 2008 CTP5:
1. New GROUPING SETS Operator
2. ROLLUP & CUBE Operator Syntax Changes
3. New GROUPING_ID() Function
GROUP BY Operator Addition: GROUPING SETS
The first and most noted change in how we can perform relational grouping is the GROUPING SETS operator addition to the GROUP BY clause. This new operator allows you to perform several grouping set in one query. The older equivalent of this functionality is to perform several GROUP BY queries and then perform a UNION ALL between them. The GROUPING SETS operator supports concatenating column groupings and an optional grand total row. GROUPING SETS can be used in conjunction with the ROLLUP & CUBE operators (which we will use the new syntax described below for ROLLUP|CUBE). Below is some sample TSQL code to demonstrate this functionality:
*All sample code requires the AdventureWorks & AdventureWorks DW sample databases.
USE AdventureWorksDW;
GO
--------OLD METHOD, UNION ALL
SELECT NULL AS [ProductKey], OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY OrderDateKey, DueDateKey, ShipDateKey
UNION ALL
SELECT ProductKey, NULL AS [OrderDateKey], DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, DueDateKey, ShipDateKey
UNION ALL
SELECT ProductKey, OrderDateKey, NULL AS [DueDateKey], ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, OrderDateKey, ShipDateKey
UNION ALL
SELECT ProductKey, OrderDateKey, DueDateKey, NULL AS [ShipDateKey]
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, OrderDateKey, DueDateKey
GO
--------NEW METHOD, GROUPING SETS
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS(ProductKey, OrderDateKey, DueDateKey, ShipDateKey);
GO
--------NEW METHOD, GROUPING SETS w/Grand Total Row (record #1125 is the grand total row)
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS(ProductKey, OrderDateKey, DueDateKey, ShipDateKey, ());
GO
--------NEW METHOD, GROUPING SETS w/Composite Columns
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS((ProductKey, OrderDateKey), DueDateKey, ShipDateKey);
GO
--------NEW METHOD, GROUPING SETS w/ROLLUP Composite Columns
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS(ROLLUP(ProductKey, OrderDateKey), DueDateKey, ShipDateKey);
GO
--------NEW METHOD, GROUPING SETS w/CUBE Composite Columns
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS(CUBE(ProductKey, OrderDateKey), DueDateKey, ShipDateKey);
GO
GROUP BY Operator Syntax Changes: ROLLUP & CUBE
Next up are the changes in the ROLLUP|CUBE operators' syntax. These operators produce aggregate groupings and are appended to the GROUP BY clause. Prior to SQL Server 2008 you would specify WITH ROLLUP|WITH CUBE. In SQL Server 2008, you will first designate ROLLUP|CUBE and then pass the grouped columns into these operators. Below is some sample TSQL code to demonstrate this functionality:
USE AdventureWorksDW;
GO
--------OLD METHOD, WITH ROLLUP
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, OrderDateKey, DueDateKey, ShipDateKey WITH ROLLUP;
GO
--------NEW METHOD, ROLLUP()
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ROLLUP(ProductKey, OrderDateKey, DueDateKey, ShipDateKey);
GO
--------OLD METHOD, WITH CUBE
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, OrderDateKey, DueDateKey, ShipDateKey WITH CUBE;
GO
--------NEW METHOD, CUBE()
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY CUBE(ProductKey, OrderDateKey, DueDateKey, ShipDateKey);
GO
New Function: GROUPING_ID()
Last but not least, there is a new function, GROUPING_ID which returns the level of grouping for a particular column. This function is an enhancement from the older GROUPING() function. GROUPING() returns a 0 or a 1 to indicate if the column is aggregated or not. Below is some sample TSQL code to demonstrate this functionality:
USE AdventureWorks;
GO
SELECT
D.Name
,GROUPING_ID(D.Name, E.Title) AS [Name_Title_GroupID]
,GROUPING(D.Name) AS [Name_GroupAggregated?]
,GROUPING(E.Title) AS [Title_GroupAggregated?]
,COUNT(E.EmployeeID) AS N'Employee Count'
FROM HumanResources.Employee E
INNER JOIN HumanResources.EmployeeDepartmentHistory DH
ON E.EmployeeID = DH.EmployeeID
INNER JOIN HumanResources.Department D
ON D.DepartmentID = DH.DepartmentID
WHERE DH.EndDate IS NULL
AND D.DepartmentID IN (12,14)
GROUP BY ROLLUP(D.Name, E.Title);