Peter Larsson Blog

Patron Saint of Lost Yaks

Collapse date range, with safety date range

This example is based on AdventureWorks database. More details about the task is found here http://sqlblog.com/blogs/adam_machanic/archive/2009/12/08/t-sql-tuesday-001-exploring-fuzzy-interval-islands-using-sqlclr.aspx.
The task is to produce a set of dateranges where a product has been sold, with a "latency" of 7 days. I wanted do display that there are other way to write a query to make it more efficient. Adam's SQLCLR version runs in 0.5 seconds.

DECLARE @Interval INT = 7

;WITH cteSingle(ProductID, TransactionDate, recID)
AS (
    SELECT  ProductID,
            TransactionDate,
            ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID
    FROM    Production.TransactionHistory
), cteLower(ProductID, StartDate, recID)
AS (
    SELECT      s.ProductID,
                s.TransactionDate AS EndDate,
                ROW_NUMBER() OVER (PARTITION BY s.ProductID ORDER BY s.TransactionDate) AS recID
    FROM        cteSingle AS s
    LEFT JOIN   cteSingle AS t ON t.ProductID = s.ProductID
                AND t.recID = s.recID - 1
    WHERE       DATEADD(DAY, @Interval, t.TransactionDate) < s.TransactionDate
                OR t.ProductID IS NULL
), cteUpper(ProductID, EndDate, recID)
AS (
    SELECT      s.ProductID,
                DATEADD(DAY, @Interval, s.TransactionDate) AS EndDate,
                ROW_NUMBER() OVER (PARTITION BY s.ProductID ORDER BY s.TransactionDate) AS recID
    FROM        cteSingle AS s
    LEFT JOIN   cteSingle AS t ON t.ProductID = s.ProductID
                AND t.recID = s.recID + 1
    WHERE       DATEADD(DAY, @Interval, s.TransactionDate) < t.TransactionDate
                OR t.ProductID IS NULL
)
SELECT      l.ProductID,
            l.StartDate,
            u.EndDate
FROM        cteLower AS l
INNER JOIN  cteUpper AS u ON u.ProductID = l.ProductID
WHERE       l.recID = u.recID
ORDER BY    l.ProductID,
            l.recID


Here are the measurements from SQL Profiler.

Peso
RowCount - 4020
CPU - 1560
Duration - 1856
Reads - 3367

Original
RowCount - 4020
CPU - 35303
Duration - 24216
Reads - 1976780

Legacy Comments


Ryan Randall
2010-01-08
re: Collapse date range, with safety date range
Hi Peso

Is there any improvement for you if you first take distinct ProductID and TransactionDate?

i.e. change "Production.TransactionHistory" to "(select distinct ProductID, TransactionDate from Production.TransactionHistory) a"

It seems quicker on my machine, but we've had differences recently.

Peso
2010-01-08
re: Collapse date range, with safety date range
Yes, there is some improvement.
I am currently having a conversion with MVP Adam Machanic and I am trying to understand how SQLCLR's read tables. His SQLCLR solution runs in 6 seconds (on a much larger table), and the fastest SET-BASED solution found yet runs in 28-36 seconds.

Peso
2010-01-08
re: Collapse date range, with safety date range
What we have concluded this far, is that the set-based solutions have to scan the table index several times (4-6), and the SQLCLR just parsed the table once, without needing the tempdb or anything.

Aaron West
2010-01-10
re: Collapse date range, with safety date range
It has occurred to me at times that CTEs can be slower than a temp-table based approach, "nice" though they are (since temp tables seem "messy"). Here's a comparison on my 2.67ghz quad core Q9400:

CPU time | approach
812ms | CTE (above)
640ms | temp table
530ms | temp table w/clustered index
594ms | temp table w/clustered index and DISTINCT

My SQL starts with:

SET STATISTICS TIME ON
DECLARE @Interval INT
SET @Interval = 7
;SELECT DISTINCT
ProductID,
TransactionDate,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID
INTO cteSingle --(ProductID, TransactionDate, recID)
FROM Production.TransactionHistory
CREATE CLUSTERED INDEX xth on cteSingle(ProductID, TransactionDate)
;WITH cteLower(ProductID, StartDate, recID)
AS (
.... and the rest is the same as above.
...
DROP TABLE cteSingle