The implementation of the NTILE(n) windowing function is a little bit slow and requires a temp worktable that generates A LOT of I/O. Probably that because, as the BOL says, if you have a number of rows that is not divisible by your "n" bucket value, the function has to make the "Larger groups come before smaller groups in the order specified by the OVER clause".
If you're using NTILE for statistical purposes and so you don't care about having larger groups before smaller one, mostly because the difference among the groups population tipically will be of only one unit, you can implement NTILE with the following script:
SELECT
CustomerKey,
CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
FROM
DimCustomer
where @n is a variable that contains your bucket value. For example, if you want a NTILE(5) your @n value will be 5.
You can make some test using the AdventureWorksDW database:
DECLARE
@n INT;
SET @n = 5;
WITH
CTE AS (
SELECT
CustomerKey,
NTILE(@n) OVER (ORDER BY YearlyIncome ASC) AS NTileStd
FROM
DimCustomer
)
SELECT
COUNT(*),
NTileStd
FROM
CTE
GROUP BY
NTileStd
ORDER BY
2;
WITH
CTE AS (
SELECT
CustomerKey,
CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
FROM
DimCustomer
)
SELECT
COUNT(*),
NTileFast
FROM
CTE
GROUP BY
NTileFast
ORDER BY
2;
You'll notice that the first will make 38490 I/O (!!!) where the second one will only make 1036 I/O, which is 37 time LESS!!!!
I have discovered this behaviour with my collegue Marco Russo using a milions rows table and as you may image 37 times less I/O DOES the difference! :-)