Posts
49
Comments
50
Trackbacks
26
A better NTILE implementation

The implementation of the NTILE(n) windowing function is a little bit slow and requires a temp worktable that generates A LOT of I/O. Probably that because, as the BOL says, if you have a number of rows that is not divisible by your "n" bucket value, the function has to make the "Larger groups come before smaller groups in the order specified by the OVER clause".

If you're using NTILE for statistical purposes and so you don't care about having larger groups before smaller one, mostly because the difference among the groups population tipically will be of only one unit, you can implement NTILE with the following script:

SELECT 
   CustomerKey,
   CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
FROM 
   
DimCustomer

where @n is a variable that contains your bucket value. For example, if you want a NTILE(5) your @n value will be 5.

You can make some test using the AdventureWorksDW database:

DECLARE @n INT;
SET @n = 5;

WITH CTE AS (
   SELECT 
      CustomerKey,
      NTILE(@n) OVER (ORDER BY YearlyIncome ASC) AS NTileStd
   FROM 
      DimCustomer
)
SELECT
   COUNT(*),
   NTileStd
FROM
   CTE
GROUP BY
   NTileStd
ORDER BY
   2;

WITH CTE AS (
   SELECT 
      CustomerKey,
      CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
   FROM 
   DimCustomer
)
SELECT
   COUNT(*),
   NTileFast
FROM
   CTE
GROUP BY
   NTileFast
ORDER BY
   2;

You'll notice that the first will make 38490 I/O (!!!) where the second one will only make 1036 I/O, which is 37 time LESS!!!!

I have discovered this behaviour with my collegue Marco Russo using a milions rows table and as you may image 37 times less I/O DOES the difference! :-)

posted on Friday, March 17, 2006 6:04 PM Print
News