Peter Larsson Blog

Patron Saint of Lost Yaks

Finding records in one table not present in another table

Difference between NOT IN, LEFT JOIN and NOT EXISTS.

The objective is to fetch all records in one table that are not present in another table. The most common code I’ve seen at client sites includes the use of NOT IN, because this keyword is included in most programming languages and programmers tend to use this technique when writing stored procedures in the database too. The code example I have found in most cases is this simple and understandable

SELECT     a.i
FROM       #a AS a
WHERE      a.i NOT IN (SELECT b.j FROM #b AS b)
           OR a.i IS NULL

The basic idea is to get all records from table #a where the value in column i is either NULL or not present in column j of table #b. What basically happens behind the scene is that the NOT IN part creates a list of values and stores them in a temporary table and then matches the values from column i in table #a against this temporary table. If there is not a match, or value from table #a is NULL, the column value is valid and returned to query.

There are at least three other ways to do this, which all are much more efficient!

SELECT     a.i
FROM       #a AS a
WHERE      a.i NOT IN (SELECT b.j FROM #b AS b)
UNION ALL
SELECT     a.i
FROM       #a AS a
WHERE      a.i IS NULL

SELECT     a.i
FROM       #a AS a
LEFT JOIN  #b AS b ON b.j = a.i
WHERE      b.j IS NULL

SELECT     a.i
FROM       #a AS a
WHERE      NOT EXISTS (SELECT * FROM #b AS b WHERE b.j = a.i)

Running these four queries simultaneously gives exactly the same result.

Original query takes 95% of the batch, second query (first alternative with UNION ALL) takes 2%, and the last two takes only 1% each!

This is the average profiler results for the four queries after 100 executions each 

 
CPU
Reads
Writes
Duration
Rows
Query 1
1 767
26 292
0
1 772
2 193
Query 2
6
57
0
6
2 193
Query 3
6
24
0
6
2 193
Query 4
3
24
0
3
2 193
 

 
 
 
 
   
As you can see, all three alternative approaches are superior to the NOT IN approach!
Next question is obviously “Which approach to use”?
  • Second query can be the best choice if there are few records because there are now 5 table scans compared to two table scans in the original query.
  • Third query is fast, but can be slow if table #b has many more records than table #a.
  • Fourth query will always perform fast because the EXISTS parts ends directly when first matching record is found. But if the second table #b is a mix of more JOINS this will not perform that fast.
It seems that LEFT JOIN and NOT EXISTS are both superior to NOT IN.
Which one of the two approaches you choose, must be decided carefully for every query needing this technique.
Also remember that if table #b were not a temporary table, SQL Server would have places a lot of locks on that table.
Continued here
http://weblogs.sqlteam.com/peterl/archive/2007/09/24/Finding-records-in-one-table-not-present-in-another-tableAgain.aspx

Legacy Comments


Michael Valentine Jones
2007-09-20
re: Finding records in one table not present in another table
You have to watch it if the columns you compare can have lots of duplicates. For example, it you have two tables which each have the same value duplicated 1 million times, you would have a cross product with 1 trillion rows if you use a left join.

The query below works around this problem.

select
a.MY_COLUMN
from
(
select MY_COLUMN, T1 = 1, T2 = 0 from TABLE_1 group by MY_COLUMN
union all
select MY_COLUMN, T1 = 0, T2 = 1 from TABLE_2 group by MY_COLUMN
) a
group by
a.MY_COLUMN
having
max(T1) = 1 and max(T2) = 0
order by
a.MY_COLUMN


Peso
2007-09-20
re: Finding records in one table not present in another table
Very interesting idea. Have to test it on monday. Thanks.

But..
1) Is two GROUP BY in the derived table faster than a single UNION?
2) HAVING should be MIN(T1) = 1 AND MAX(T2) = 0 instead?

Peso
2007-09-20
re: Finding records in one table not present in another table
I think only T1 is enough, right?

HAVING MIN(T1) = 1

Oh well. Monday it is for testing! I'll let you know of the results.

Michael Valentine Jones
2007-09-21
re: Finding records in one table not present in another table
I think the following works OK to find values that occur only in TABLE_1:
having max(T1) = 1 and max(T2) = 0

I think the following would do the same thing, since it would mean there were no rows from TABLE_2 because they all have values of 0. That would mean that column T2 is not needed.
having MIN(T1) = 1


Jeff
2007-09-21
re: Finding records in one table not present in another table
Michael -- isn't that over-engineering it quite a bit? All you should need is:

SELECT a.i
FROM #a AS a
LEFT JOIN (select disticint j from #b) AS b ON b.j = a.i
WHERE b.j IS NULL

If you need to avoid that potential issue.

Ken
2007-10-02
re: Finding records in one table not present in another table
After fixing the spelling of distinct, I'm with Jeff. Also the union selects will resolve to one row per id, not all ids in the first table.

Tim
2010-04-22
re: Finding records in one table not present in another table
Tried Jeff's, it works.

Senior Badmaash
2011-05-23
re: Finding records in one table not present in another table
Can someone help me identify what's missing from this query which is trying to find records in one table not present in another. Thanks
SELECT [qryNAMS-To-Go].[EC No], [qryNAMS-To-Go].[WO No], [qryNAMS-To-Go].[WO Disc], [qryNAMS-To-Go].[Task No], [qryNAMS-To-Go].[Task Disc], [qryNAMS-To-Go].[Task Planner], [qryNAMS-To-Go].[Equip No], [qryNAMS-To-Go].[Task Status], [qryNAMS-To-Go].[Task Status Desc], [qryNAMS-To-Go].[Task Desc], [qryNAMS-To-Go].[Task Outage Ind], [qryNAMS-To-Go].[Task S/D No], [qryNAMS-To-Go].[Sched Grp], [qryNAMS-To-Go].[Sched Notes], [qryNAMS-To-Go].[Task Early Strt Dt], [qryNAMS-To-Go].WOT
FROM [qryNAMS-To-Go],[qryWOTsTO-GO]
WHERE NOT EXISTS (SELECT * FROM [qryWOTsTO-GO] WHERE ([qryWOTsTO-GO].WO) = ([qryNAMS-To-Go].WOT))
ORDER BY [qryNAMS-To-Go].WOT;

Peso
2011-05-23
re: Finding records in one table not present in another table
SELECT [qryNAMS-To-Go].[EC No],
[qryNAMS-To-Go].[WO No],
[qryNAMS-To-Go].[WO Disc],
[qryNAMS-To-Go].[Task No],
[qryNAMS-To-Go].[Task Disc],
[qryNAMS-To-Go].[Task Planner],
[qryNAMS-To-Go].[Equip No],
[qryNAMS-To-Go].[Task Status],
[qryNAMS-To-Go].[Task Status Desc],
[qryNAMS-To-Go].[Task Desc],
[qryNAMS-To-Go].[Task Outage Ind],
[qryNAMS-To-Go].[Task S/D No],
[qryNAMS-To-Go].[Sched Grp],
[qryNAMS-To-Go].[Sched Notes],
[qryNAMS-To-Go].[Task Early Strt Dt],
[qryNAMS-To-Go].WOT
FROM [qryNAMS-To-Go]
,[qryWOTsTO-GO] -- Delete this line
WHERE NOT EXISTS (SELECT * FROM [qryWOTsTO-GO] WHERE [qryWOTsTO-GO].WO = [qryNAMS-To-Go].WOT)
ORDER BY [qryNAMS-To-Go].WOT

Danny Z
2011-06-09
re: Finding records in one table not present in another table
How does one go about locating records in one table not present in another table where multiple columns are used to denote a unique row?

For example:
tblA: PROC_CODE, MOD_1, MOD_2
tblB: PROC_CODE, MOD_1, MOD_2

The three columns in each table provide uniqueness to each row, but (here is the issue) the data in MOD_1 and MOD_2 may contain null values.

Example of data in tblA:

PROC_CODE MOD_1 MOD_2
S5000
S5000 AA
S5000 AA AB

Example of data in tblB:

PROC_CODE MOD_1 MOD_2
S5000
S5000 AA AB

If I want to find the records in tblA that do not exist in tblB, the results should be:

PROC_CODE MOD_1 MOD_2
S5000 AA

Thanks in advance!

Kiren paul
2011-09-23
re: Finding records in one table not present in another table
Hai it is working ... Thank you

ashish
2012-05-02
re: Finding records in one table not present in another table
SELECT a.i
FROM #a AS a
LEFT JOIN (select disticint j from #b) AS b ON b.j = a.i
WHERE b.j IS NULL


thank you Jeff for this query

steve
2012-07-22
re: Finding records in one table not present in another table
I have tried all 3 ways. None of them seem to work for me, what am I doing wrong? I am collecting entries, and want to seperate those that are not already a member to a list.

SELECT entries2.email
FROM #entries2 AS a
WHERE entries2.email NOT IN (SELECT members.member_email FROM #members AS b)


SELECT a.email
FROM #entries2 as a
WHERE NOT EXISTS (SELECT * FROM #members AS b WHERE b.member_email = a.email)


SELECT a.email
FROM #entries2 AS a
LEFT JOIN #members AS b ON b.member_email = a.email
WHERE a.email > ''