Mladen Prajdić Blog

Blog about stuff and things and stuff. Mostly about SQL server and .Net

SQL Server: JOIN vs IN vs EXISTS - the logical difference

 

There is a common misconception that IN behaves equaliy to EXISTS or JOIN in terms of returned results.

This is simply not true. To see why not, let's review what each statement does.

 

IN:

Returns true if a specified value matches any value in a subquery or a list.

Exists:

Returns true if a subquery contains any rows.

Join:

Joins 2 resultsets on the joining column.

 

If not read carefully it looks pretty same so far, doesn't it.

The difference comes when you take into account the "dreaded" THREE-VALUED LOGIC.

Let's review that also:

-- this is true
SELECT 1 WHERE 1 = 1

– this is false SELECT 1 WHERE 1 = 0

– this is unknown - it is usually expected to be false, but that only shows – misunderstanding of nulls. It's not false it's only treated as false in the filter SELECT 1 WHERE 1 = NULL

– this is also unknown - but logicaly it would seem it would be true – but unknown compared to unknown equals uknown and it is treated as false in the filter SELECT 1 WHERE NULL = NULL

 

The where will return a row only if the condition evaluates to true which UNKNOWN isn't.

 

So let's demonstrate this with some code. I've commented it heavily so it can be self explaining.

I've also shown a parser bug in the IN Query. Well... it's a bug if you ask me. It shouldn't behave like that.

 

------------------------------------------------------------------
-- Prepare tables and data
------------------------------------------------------------------
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO

INSERT INTO t1 SELECT 1, 'title 1', 5 UNION ALL SELECT 2, 'title 2', 5 UNION ALL SELECT 3, 'title 3', 5 UNION ALL SELECT 4, 'title 4', 5 UNION ALL SELECT null, 'title 5', 5 UNION ALL SELECT null, 'title 6', 5

INSERT INTO t2 SELECT 1, 1, 'data 1' UNION ALL SELECT 2, 1, 'data 2' UNION ALL SELECT 3, 2, 'data 3' UNION ALL SELECT 4, 3, 'data 4' UNION ALL SELECT 5, 3, 'data 5' UNION ALL SELECT 6, 3, 'data 6' UNION ALL SELECT 7, 4, 'data 7' UNION ALL SELECT 8, null, 'data 8' UNION ALL SELECT 9, 6, 'data 9' UNION ALL SELECT 10, 6, 'data 10' UNION ALL SELECT 11, 8, 'data 11'

—————————————————————- – we want to get all data in t1 that has a child row in t2 —————————————————————-

– join gives us more rows than we need, because it joins to every child row SELECT t1.* FROM t1 JOIN t2 ON t1.id = t2.t1Id – distinct would solve that but it's not pretty nor efficient SELECT DISTINCT t1.* FROM t1 JOIN t2 ON t1.id = t2.t1Id

– now this is a weird part where someIntCol is a column in t1 – but the parser doesn't seem to mind that SELECT t1.* FROM t1 WHERE t1.id IN (SELECT someIntCol FROM t2)

– here in and exists both get correct results SELECT t1.* FROM t1 WHERE t1.id IN (SELECT t1id FROM t2)

SELECT t1.* FROM t1 WHERE exists (SELECT * FROM t2 WHERE t1.id = t2.t1id)

—————————————————————- – we want to get all data in t1 that doesn't have a child row in t2 —————————————————————-

– join gives us the correct result SELECT t1.* FROM t1 LEFT JOIN t2 ON t1.id = t2.t1Id WHERE t2.id IS NULL

– IN doesn't get correct results. – That's because of how IN treats NULLs and the Three-valued logic – NULL is treated as an unknown, so if there's a null in the t2.t1id – NOT IN will return either NOT TRUE or NOT UNKNOWN. And neither can be TRUE. – when there's a NULL in the t1id column of the t2 table the NOT IN query will always return an empty set. SELECT t1.* FROM t1 WHERE t1.id NOT IN (SELECT t1id FROM t2)

– NOT EXISTS gets correct results SELECT t1.* FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id) GO

DROP TABLE t2 DROP TABLE t1

 

We can see that it's best to use EXISTS because it always behaves as the user would think it does.

Hope this demonstartes the logical difference between the 3 close yet so far apart functionalities of SQL Server

 

kick it on DotNetKicks.com

Legacy Comments


mcgurk
2007-05-18
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Very nice article. A few of these have come across dotnetkicks in the past month or so (including one from me where I was ranting about the Unknown result from a programmer's standpoint). This is the nicest one so far with the best example.

Mladen
2007-05-18
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
thanx. glad you like it :)

Denis the SQL Menace
2007-05-18
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Now for fun run this

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT someIntCol FROM t2)

notice that the someIntCol column does NOT exists in the t2 table


you can read more about that behaviour here: http://sqlservercode.blogspot.com/2007/04/you-should-never-use-in-in-sql-to-join.html

Mladen
2007-05-18
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
nice Denis. Thanx for sharing.

Jeff
2007-05-20
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Denis -- that's a good point and a thing to keep in mind when writing any correlated sub-query, with or without using IN(). That's why in general I prefer derived tables -- they tend to be neater and I like the way that each derived table is a complete, self-contained SELECT, and you never need to worry about that issue.

Andrew Dixon
2007-05-21
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Excellent article - well explained

Adriaan
2007-07-23
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Denis's example is what you get when you do not properly reference columns.

The unknown column in the subquery does exist in (one of) the table(s) in the main query, so SQL Server can resolve the name, and no error is raised.

Note that the error would be raised if you added a reference the table before the column in the subquery (like you should):

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t2.someIntCol FROM t2)

neelam
2007-09-03
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
nice topic.....
it helped me......
thanks

Pawan
2007-09-06
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
nice articles

i was confused and take anyone from these three commands

thank u very much

regards
pawan
Pune, India

Jan Tenenberg
2007-09-20
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
I have the following query...

SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON
WHERE EXISTS (
SELECT PERSON.PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M'
)
GROUP BY PERSON.PERSON_URN

As you can see, I am selecting from the same table inside the subquery as I am in the overall query. This returns everybody in the table, whereas the subquery by itself returns about 50%. I do not understand why this is and would appreciate your help with this very much.

Please help!!!

Thanks Jan

Mladen
2007-09-20
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
so why don't you just do:
SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M'
GROUP BY PERSON.PERSON_URN

Jan Tenenberg
2007-09-20
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
I thought you might say that...He He. I have developed an application that generates SQL on the fly and subqueries form part of this across many tables. The issue only really comes into it when the subquery table is the same as the outside table. Do you know what the problem is with my original statement and how I can get around it with maintaining the subquery?

Many Thanks,

Jan

Mladen
2007-09-20
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON P1
WHERE EXISTS (
SELECT PERSON.PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M' and PERSON.PERSON_URN = P1.PERSON_URN
)
GROUP BY PERSON.PERSON_URN

Jan Tenenberg
2007-09-20
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Brilliant! Thank You Very Much!

Mark
2007-12-13
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Thanks! I was banging my head trying to figure out why my NOT IN statement didnt work.

I still dont understand why the statement works if it is just IN though.

Gerard
2007-12-20
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Hi! I was thinking about this ...

Does a "SELECT A.Afield, B.AnotherField FROM TableA A JOIN TableB B on 1=1" can hurt a DB?
Performance and all!?
Is it better to do 2 selcets? anyway if one is null I can't proceed...

(Instead of doing 2 select in a row... one should be better no?)

TY :)

Munish
2008-01-04
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
I have tab1 as:

Col1 Col2 Col3
1 2 Null

Then the Master as:

Col Name

1 One
2 Two
3 Three


I want the output as

Col1 Col2 Col3
One Two Null

Any help? I have been struggling since yesterday.


Ashish
2008-01-25
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Can I use two column in IN...

example:

SELECT t1.*
FROM t1
WHERE t1.id, t1.Title NOT IN (SELECT t1id, Somedata FROM t2)

Mladen
2008-01-25
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
no. that's why you have exists.

Mladen
2008-01-25
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Munish:
look for pivot or Cross tab on sqlteam

Channappa Bangaloer
2008-03-25
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Thanks for your in-depth analysis.
I was specifically looking at understanding the difference between EXISTS and IN keywords w.r.t SQL Server.
This article helped me a lot.

retheredge
2008-05-16
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Helpful to me and good article but...
It seems moot and contrived because would one ever allow NULLs in primary key or foreign key columns?
I can't imagine it.

Josh
2008-06-24
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
retheredge, you missed the point, the reason pk is null is the result of an outer join

Dimitry
2008-06-26
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Great article
I was going completly loco, trying to understand why my "not in" does not work ...

merlinyoda
2008-07-25
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Josh, you're trying to compare apples and oranges....

retheredge is saying that this particular example is contrived as you aren't likely going to have a NULL value on any key value like an ID . The overall point about how NULL evaluates as neither "true" or "false" when compared to another value is, on the other hand, a valid one. However, rarely is it the case that tables are joined or otherwise linked using columns that could actually contain NULL values. It's *much* more common "join" on generic, non-NULL integer key (primary or foreign).

Let's take a subset of the data set given in the example:
TABLE t1
(id INT, title VARCHAR(20), someIntCol INT)
1, 'title 1', 5
2, 'title 2', 5
3, 'title 3', 5

TABLE t2
(id INT, t1Id INT, someData VARCHAR(20))
1, 1, 'data 1'
2, 1, 'data 2'
4, 3, 'data 4'
5, 3, 'data 5'
6, 3, 'data 6'


Given *this* data set the following queries give the same results:
SELECT t1.*
FROM t1
LEFT JOIN t2 ON t1.id = t2.t1Id
WHERE t2.id IS NULL

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t1id FROM t2)

SELECT t1.*
FROM t1
WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)

All 3 will return:
2, 'title 2', 5


To sum up, in a majority of data sets in the wild, all there queries will operate the same. Although using (INNER/LEFT/RIGHT/OUTER) JOIN is usually better with respect to shortest processing times.

merlinyoda
2008-07-25
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Josh, also the "primary key" in question (t1.id) is *not* NULL because of an OUTER JOIN (the examples used LEFT OUTER JOIN a.k.a. LEFT JOIN, which is not the same as OUTER JOIN a.k.a. FULL OUTER JOIN). Some of the values of t1 are NULL because they are *set* as NULL in t1. Though, t1.id isn't much of a "primary key" anyway as there are 2 separate records in the example data set that have NULL values. t2 is a little more "acceptable" as far as data constraints t2.t1id would be a column acting like a foreign key and t2.id a primary key for t2.

If t2.* were added to the outer query select, then you will get NULL values for t2 columns when there aren't any "matching" records as specified by the ON clause (i.e. t1.id=t2.t1id evaluates to "true"). Additionally in the other queries, we aren't comparing t2 columns with the subquery results, we're comparing t1 columns to the subquery (which happens to be querying t2).

Boumxyz
2008-08-26
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Hello,

I would like to add some precision as some person would believe that IN is not a good operator.

NULLs : NULL is not comparable to anything. Also, using null in any operation will result in a null value. EX :

SELECT 'TEST' + NULL will give NULL as result. The only way to test if a value is null is to use the NULL operator.


IN:

Returns true if a specified value matches any value in a subquery or a list. This is true. I must add though that IN acts as a INNER JOIN would for matching purpose. It will keep only Lines where both operands matches.

Exists:

Returns true if a subquery contains any rows. Yes this is VERY different from IN. In check if matching occurs, Exists check only if SUBQUERY returns a row.

Join:


There are 2 kinds of joins

INNER ONES :

Inner returns only matching lines.

SELECT * from table1 join table2 on table1.id = table2.id will act like SELECT * from table1, table2 where table1.id = table2.id


OUTER ONES (LEFT, RIGHT, FULL) :

Those joins are different

LEFT will keep all the records from the left table (select * from table1 left join table2 on table1.id = table2.id in this query table1) and ADD NULLS value to the columns of table2 so that each line of table1 has at least 1 matching lines.

it also translates into this

select * from t1,t2 where t1.id = t2.t1id
UNION ALL -- ALL with prevent union from doing a distinct
select *, NULL, NULL, NULL from t1 where t1.id is null or t1.id not in (select t2.t1id from t2 where t2.t1id is not null)

And you can guess it right is the other way around. It will keep all the lines of the RIGHT table on the join statement.
(select * from table1 left join table2 on table1.id = table2.id in this query table2),

FULL is the combination of both join. LEFT AND RIGHT meaning no lines are lost and are completed with matching nulls from the other table if no match is possible

it translates into this.

select * from t1,t2 where t1.id = t2.t1id
UNION ALL
select *, NULL, NULL, NULL from t1 where t1.id is null or t1.id not in (select t2.t1id from t2 where t2.t1id is not null)
UNION ALL
select null,null,null, * from t2 where t2.t1id is null or t2.t1id not in (select t1.id from t1 where t1.id is not null)


marcelo silva
2008-12-04
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
tkx!!!

Eltayeb
2008-12-31
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Thank you all for your comments!

mxj
2009-01-02
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Great article, Mladen and thanks for sharing!

aravind
2009-01-15
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
good an excellant one

gokul
2009-01-20
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
VERY GOOD

Kube
2009-01-28
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Hehe, a very good article. Always useful, even if its almost two years old.

Now that I know the difference, I wonder which one is the fastest one.

If I understand the EXIST behavior, the sub-query processing will stop at the first "existing" that matches, making it the fastest one to use.

Am I right ?

Kube

Mladen
2009-01-28
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
yes you are. i alwsay prefer exists.
however the best bet of course is to look at the execution plan.

JOIN
2009-02-27
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
:)))))))))))

wael
2009-03-23
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
JOIN IS BETTER

Ken
2009-04-14
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
I'm hoping that someone can help me. I've been banging my head against a wall for a while now... Here's what I'm trying to do:

I have a table (LAWSON_APVENMAST1) that contains vendor information. One of the fields (LAWSON_APVENMAST1.TAX_ID) has duplicate records (meaning that there are different vendors with the same Tax ID).

I have the following query that will give a list of the TAX_IDs that exist more than once in the table, but I cannot figure out how to return the Vendor Number (LAWSON_APVENMAST1.VENDOR) and Vendor Name (LAWSON_APVENMAST1.VENDOR_VNAME).

SELECT LAWSON_APVENMAST1.TAX_ID, Count(LAWSON_APVENMAST1.TAX_ID) AS CountOfTAX_ID
FROM LAWSON_APVENMAST1
GROUP BY LAWSON_APVENMAST1.TAX_ID
HAVING (((LAWSON_APVENMAST1.TAX_ID)<>" ") AND ((Count(LAWSON_APVENMAST1.TAX_ID))>1));

Any help is greatly appreciated!
Thanks,
Ken

nami
2009-04-15
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Ken -

SELECT T.TAX_ID, T.VENDOR, T.VENDOR_VNAME
FROM LAWSON_APVENMAST1 T
WHERE EXISTS (
select B.TAX_ID, count(B.TAX_ID) as CountofTAX_ID
from LAWSON_APVENMAST1 B
where T.TAX_ID = B.TAX_ID
group by B.TAX_ID
having (((B.TAX_ID)<>'') and ((count(B.TAX_ID))>1))
)

Steve
2009-07-21
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Awesome, this really helped me!!!

Terry
2009-10-07
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
This one always gets me when i do a not in and wonder why it returns no results, thanks for the tip

Serder
2009-10-08
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
thx, nice article, you made me pass an SQL exam

Thais
2009-10-16
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
The only reason why "not exists ( select * ...)" works is the use of "*" in the select, as not all columns are null.
Try changing the "*" to t2.t1id; or adding "UNION ( select NULL,NULL,NULL)" to the end of the inner query.
Either one returns an empty set for the same reason that IN does: one cannot safely compare NULL.

OBender
2010-03-25
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Great help!

Len
2010-04-08
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Cheers, very useful....

¡
2010-07-29
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
'

escalera
2010-08-02
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Thank for this article. It's very informative.

newButSolidMind
2010-09-28
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
Very useful.

newButSolidMind
2010-09-28
re: SQL Server: JOIN vs IN vs EXISTS - the logical difference
I have 2 questions related to exists

1. How does the exists work? Let's say I have ID(numeric), NAme(char) & salary(numeric) columns in customer and CustID(numeric) in T2. How does the sql engine know which columns from customer should be used to match with the resulted result from T2 if exists is used as follows
select customer.* from customer where exists (select 1 from T2 where T2.CustID=customer.ID)

2. Can we use exists and UNION together?
I have table customer as mentioned above. I have table T1 with column CustID(numeric) which contains IDs from customer < 10. There is another table T2 with column CID(numeric) which contains values > 10 AND < 15.
I want to get all the records from customer whose IDs present either of the tables T1 or T2.

To find union, I can use the following query.
SELECT t1.CustID FROM T1 t1
UNION
SELECT t2.CID FROM T2 t2

I thought the below query will return me all the records for the customers from the above result but it gave me all the records from the customer table. :(

select t3.* from customer t3 where exists (SELECT t1.CustID FROM T1 t1
UNION
SELECT t2.CID FROM T2 t2)

Please help me find the correct solution for this scenario.

Thanks.