Collapse date range, with safety date range
This example is based on AdventureWorks database. More details about the task is found here http://sqlblog.com/blogs/adam_machanic/archive/2009/12/08/t-sql-tuesday-001-exploring-fuzzy-interval-islands-using-sqlclr.aspx.
The task is to produce a set of dateranges where a product has been sold, with a "latency" of 7 days. I wanted do display that there are other way to write a query to make it more efficient. Adam's SQLCLR version runs in 0.5 seconds.
DECLARE @Interval INT = 7
;WITH cteSingle(ProductID, TransactionDate, recID)
AS (
SELECT ProductID,
TransactionDate,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID
FROM Production.TransactionHistory
), cteLower(ProductID, StartDate, recID)
AS (
SELECT s.ProductID,
s.TransactionDate AS EndDate,
ROW_NUMBER() OVER (PARTITION BY s.ProductID ORDER BY s.TransactionDate) AS recID
FROM cteSingle AS s
LEFT JOIN cteSingle AS t ON t.ProductID = s.ProductID
AND t.recID = s.recID - 1
WHERE DATEADD(DAY, @Interval, t.TransactionDate) < s.TransactionDate
OR t.ProductID IS NULL
), cteUpper(ProductID, EndDate, recID)
AS (
SELECT s.ProductID,
DATEADD(DAY, @Interval, s.TransactionDate) AS EndDate,
ROW_NUMBER() OVER (PARTITION BY s.ProductID ORDER BY s.TransactionDate) AS recID
FROM cteSingle AS s
LEFT JOIN cteSingle AS t ON t.ProductID = s.ProductID
AND t.recID = s.recID + 1
WHERE DATEADD(DAY, @Interval, s.TransactionDate) < t.TransactionDate
OR t.ProductID IS NULL
)
SELECT l.ProductID,
l.StartDate,
u.EndDate
FROM cteLower AS l
INNER JOIN cteUpper AS u ON u.ProductID = l.ProductID
WHERE l.recID = u.recID
ORDER BY l.ProductID,
l.recID
Here are the measurements from SQL Profiler.
Peso
RowCount - 4020
CPU - 1560
Duration - 1856
Reads - 3367
Original
RowCount - 4020
CPU - 35303
Duration - 24216
Reads - 1976780
Legacy Comments
Ryan Randall
2010-01-08 |
re: Collapse date range, with safety date range Hi Peso Is there any improvement for you if you first take distinct ProductID and TransactionDate? i.e. change "Production.TransactionHistory" to "(select distinct ProductID, TransactionDate from Production.TransactionHistory) a" It seems quicker on my machine, but we've had differences recently. |
Peso
2010-01-08 |
re: Collapse date range, with safety date range Yes, there is some improvement. I am currently having a conversion with MVP Adam Machanic and I am trying to understand how SQLCLR's read tables. His SQLCLR solution runs in 6 seconds (on a much larger table), and the fastest SET-BASED solution found yet runs in 28-36 seconds. |
Peso
2010-01-08 |
re: Collapse date range, with safety date range What we have concluded this far, is that the set-based solutions have to scan the table index several times (4-6), and the SQLCLR just parsed the table once, without needing the tempdb or anything. |
Aaron West
2010-01-10 |
re: Collapse date range, with safety date range It has occurred to me at times that CTEs can be slower than a temp-table based approach, "nice" though they are (since temp tables seem "messy"). Here's a comparison on my 2.67ghz quad core Q9400: CPU time | approach 812ms | CTE (above) 640ms | temp table 530ms | temp table w/clustered index 594ms | temp table w/clustered index and DISTINCT My SQL starts with: SET STATISTICS TIME ON DECLARE @Interval INT SET @Interval = 7 ;SELECT DISTINCT ProductID, TransactionDate, ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID INTO cteSingle --(ProductID, TransactionDate, recID) FROM Production.TransactionHistory CREATE CLUSTERED INDEX xth on cteSingle(ProductID, TransactionDate) ;WITH cteLower(ProductID, StartDate, recID) AS ( .... and the rest is the same as above. ... DROP TABLE cteSingle |