I did some tests today to measure the different approaches for finding records present in one table but not in another. The results of CPU and Duration are presented below with some help from SQL Profiler.
Number of reads are equal between methods but different depending how many record in #TableB.
If there are other methods I haven't included, please let me know.
Method TableA TableB CPU Duration
---------- ------- ------- --- --------
GROUP BY 1000000 1000000 748 754
LEFT JOIN 1000000 1000000 328 321
NOT EXISTS 1000000 1000000 265 288
NOT IN 1000000 1000000 296 293
EXCEPT 1000000 1000000 312 288
GROUP BY 1000000 500000 577 2984
LEFT JOIN 1000000 500000 328 2930
NOT EXISTS 1000000 500000 187 2991
NOT IN 1000000 500000 312 2861
EXCEPT 1000000 500000 234 2882
GROUP BY 1000000 0 453 5866
LEFT JOIN 1000000 0 280 5791
NOT EXISTS 1000000 0 125 5855
NOT IN 1000000 0 250 5825
EXCEPT 1000000 0 234 5798
Here is the code for testing
CREATE TABLE #TableA
(
i INT PRIMARY KEY CLUSTERED
)
INSERT #TableA
SELECT Number
FROM dbo.F_TABLE_NUMBER_RANGE(0, 999999)
ORDER BY Number
CREATE TABLE #TableB
(
i INT PRIMARY KEY CLUSTERED
)
INSERT #TableB
SELECT Number
FROM dbo.F_TABLE_NUMBER_RANGE(0, 999999)
ORDER BY Number
DELETE f
FROM (
SELECT TOP 500000
i
FROM #TableB
ORDER BY NEWID()
) AS f
-- GROUP BY
SELECT i
FROM (
SELECT 0 AS s, i FROM #TableA UNION ALL
SELECT 1, i FROM #TableB
) AS d
GROUP BY i
HAVING MAX(s) = 0
-- LEFT JOIN
SELECT a.i
FROM #TableA AS a
LEFT JOIN #TableB AS b ON b.i = a.i
WHERE b.i IS NULL
-- NOT EXISTS
SELECT a.i
FROM #TableA AS a
WHERE NOT EXISTS (SELECT b.i FROM #TableB AS b WHERE b.i = a.i)
-- NOT IN
SELECT a.i
FROM #TableA AS a
WHERE a.i NOT IN (SELECT b.i FROM #TableB AS b)
-- EXCEPT
SELECT i FROM #TableA
EXCEPT
SELECT i FROM #TableB
DROP TABLE #TableA,
#TableB