I want some Moore

Blog about stuff and things and stuff. Mostly about SQL server and .Net
posts - 218, comments - 2281, trackbacks - 33

My Links

Advertisement

News

Hi! My name is 
Mladen Prajdić  I'm from Slovenia and I'm currently working as a .Net (C#) and SQL Server developer.

I also speak at local user group meetings and conferences like SQLBits and NT Conference
Welcome to my blog.
SQL Server MVP

My Books

SQL Server MVP Deep Dives 2
The Red Gate Guide to SQL Server Team based Development Free e-book

My Blog Feed via Email
Follow MladenPrajdic on Twitter


Users Online: who's online

Article Categories

Archives

Post Categories

Cool software

Other Blogs

Other stuff

SQL stuff

SQL Server: JOIN vs IN vs EXISTS - the logical difference

 

There is a common misconception that IN behaves equaliy to EXISTS or JOIN in terms of returned results.

This is simply not true. To see why not, let's review what each statement does.

 

IN:

Returns true if a specified value matches any value in a subquery or a list.

Exists:

Returns true if a subquery contains any rows.

Join:

Joins 2 resultsets on the joining column.

 

If not read carefully it looks pretty same so far, doesn't it.

The difference comes when you take into account the "dreaded" THREE-VALUED LOGIC.

Let's review that also:

-- this is true
SELECT 1 WHERE 1 = 1

-- this is false
SELECT 1 WHERE 1 = 0

-- this is unknown - it is usually expected to be false, but that only shows 
-- misunderstanding of nulls. It's not false it's only treated as false in the filter
SELECT 1 WHERE 1 = NULL

-- this is also unknown - but logicaly it would seem it would be true
-- but unknown compared to unknown equals uknown and it is treated as false in the filter
SELECT 1 WHERE NULL = NULL

 

The where will return a row only if the condition evaluates to true which UNKNOWN isn't.

 

So let's demonstrate this with some code. I've commented it heavily so it can be self explaining.

I've also shown a parser bug in the IN Query. Well... it's a bug if you ask me. It shouldn't behave like that.

 

------------------------------------------------------------------
-- Prepare tables and data
------------------------------------------------------------------
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO

INSERT INTO t1
SELECT 1, 'title 1', 5 UNION ALL
SELECT 2, 'title 2', 5 UNION ALL
SELECT 3, 'title 3', 5 UNION ALL
SELECT 4, 'title 4', 5 UNION ALL
SELECT null, 'title 5', 5 UNION ALL
SELECT null, 'title 6', 5

INSERT INTO t2
SELECT 1, 1, 'data 1' UNION ALL
SELECT 2, 1, 'data 2' UNION ALL
SELECT 3, 2, 'data 3' UNION ALL
SELECT 4, 3, 'data 4' UNION ALL
SELECT 5, 3, 'data 5' UNION ALL
SELECT 6, 3, 'data 6' UNION ALL
SELECT 7, 4, 'data 7' UNION ALL
SELECT 8, null, 'data 8' UNION ALL
SELECT 9, 6, 'data 9' UNION ALL
SELECT 10, 6, 'data 10' UNION ALL
SELECT 11, 8, 'data 11'

------------------------------------------------------------------
-- we want to get all data in t1 that has a child row in t2
------------------------------------------------------------------

-- join gives us more rows than we need, because it joins to every child row
SELECT    t1.* 
FROM    t1 
        JOIN t2 ON t1.id = t2.t1Id
-- distinct would solve that but it's not pretty nor efficient
SELECT    DISTINCT t1.* 
FROM    t1 
        JOIN t2 ON t1.id = t2.t1Id

-- now this is a weird part where someIntCol is a column in t1 
-- but the parser doesn't seem to mind that
SELECT    t1.* 
FROM    t1 
WHERE    t1.id IN (SELECT someIntCol FROM t2)

-- here in and exists both get correct results
SELECT    t1.* 
FROM    t1 
WHERE    t1.id IN (SELECT t1id FROM t2)

SELECT    t1.* 
FROM    t1 
WHERE    exists (SELECT * FROM t2 WHERE t1.id = t2.t1id)

------------------------------------------------------------------
-- we want to get all data in t1 that doesn't have a child row in t2
------------------------------------------------------------------

-- join gives us the correct result
SELECT    t1.* 
FROM    t1 
        LEFT JOIN t2 ON t1.id = t2.t1Id
WHERE    t2.id IS NULL

-- IN doesn't get correct results.
-- That's because of how IN treats NULLs and the Three-valued logic
-- NULL is treated as an unknown, so if there's a null in the t2.t1id 
-- NOT IN will return either NOT TRUE or NOT UNKNOWN. And neither can be TRUE.
-- when there's a NULL in the t1id column of the t2 table the NOT IN query will always return an empty set. 
SELECT    t1.* 
FROM    t1 
WHERE    t1.id NOT IN (SELECT t1id FROM t2)

-- NOT EXISTS gets correct results
SELECT    t1.* 
FROM    t1 
WHERE    NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
GO

DROP TABLE t2
DROP TABLE t1

 

We can see that it's best to use EXISTS because it always behaves as the user would think it does.

Hope this demonstartes the logical difference between the 3 close yet so far apart functionalities of SQL Server

 

kick it on DotNetKicks.com

Print | posted on Friday, May 18, 2007 6:52 PM | Filed Under [ SQL Server ]

Feedback

Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Very nice article. A few of these have come across dotnetkicks in the past month or so (including one from me where I was ranting about the Unknown result from a programmer's standpoint). This is the nicest one so far with the best example.
5/18/2007 7:40 PM | mcgurk
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

thanx. glad you like it :)
5/18/2007 7:42 PM | Mladen
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Now for fun run this

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT someIntCol FROM t2)

notice that the someIntCol column does NOT exists in the t2 table


you can read more about that behaviour here: http://sqlservercode.blogspot.com/2007/04/you-should-never-use-in-in-sql-to-join.html
5/18/2007 8:08 PM | Denis the SQL Menace
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

nice Denis. Thanx for sharing.
5/18/2007 8:09 PM | Mladen
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Denis -- that's a good point and a thing to keep in mind when writing any correlated sub-query, with or without using IN(). That's why in general I prefer derived tables -- they tend to be neater and I like the way that each derived table is a complete, self-contained SELECT, and you never need to worry about that issue.
5/20/2007 5:58 PM | Jeff
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Excellent article - well explained
5/21/2007 5:44 AM | Andrew Dixon
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Denis's example is what you get when you do not properly reference columns.

The unknown column in the subquery does exist in (one of) the table(s) in the main query, so SQL Server can resolve the name, and no error is raised.

Note that the error would be raised if you added a reference the table before the column in the subquery (like you should):

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t2.someIntCol FROM t2)
7/23/2007 12:47 PM | Adriaan
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

nice topic.....
it helped me......
thanks
9/3/2007 7:34 AM | neelam
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

nice articles

i was confused and take anyone from these three commands

thank u very much

regards
pawan
Pune, India
9/6/2007 5:02 PM | Pawan
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

I have the following query...

SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON
WHERE EXISTS (
SELECT PERSON.PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M'
)
GROUP BY PERSON.PERSON_URN

As you can see, I am selecting from the same table inside the subquery as I am in the overall query. This returns everybody in the table, whereas the subquery by itself returns about 50%. I do not understand why this is and would appreciate your help with this very much.

Please help!!!

Thanks Jan
9/20/2007 1:30 PM | Jan Tenenberg
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

so why don't you just do:
SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M'
GROUP BY PERSON.PERSON_URN
9/20/2007 1:50 PM | Mladen
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

I thought you might say that...He He. I have developed an application that generates SQL on the fly and subqueries form part of this across many tables. The issue only really comes into it when the subquery table is the same as the outside table. Do you know what the problem is with my original statement and how I can get around it with maintaining the subquery?

Many Thanks,

Jan
9/20/2007 1:56 PM | Jan Tenenberg
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON P1
WHERE EXISTS (
SELECT PERSON.PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M' and PERSON.PERSON_URN = P1.PERSON_URN
)
GROUP BY PERSON.PERSON_URN
9/20/2007 2:07 PM | Mladen
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Brilliant! Thank You Very Much!
9/20/2007 2:11 PM | Jan Tenenberg
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Thanks! I was banging my head trying to figure out why my NOT IN statement didnt work.

I still dont understand why the statement works if it is just IN though.
12/13/2007 1:54 AM | Mark
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Hi! I was thinking about this ...

Does a "SELECT A.Afield, B.AnotherField FROM TableA A JOIN TableB B on 1=1" can hurt a DB?
Performance and all!?
Is it better to do 2 selcets? anyway if one is null I can't proceed...

(Instead of doing 2 select in a row... one should be better no?)

TY :)
12/20/2007 11:03 PM | Gerard
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

I have tab1 as:

Col1 Col2 Col3
1 2 Null

Then the Master as:

Col Name

1 One
2 Two
3 Three


I want the output as

Col1 Col2 Col3
One Two Null

Any help? I have been struggling since yesterday.

1/4/2008 5:29 PM | Munish
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Can I use two column in IN...

example:

SELECT t1.*
FROM t1
WHERE t1.id, t1.Title NOT IN (SELECT t1id, Somedata FROM t2)
1/25/2008 4:29 PM | Ashish
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

no. that's why you have exists.
1/25/2008 4:33 PM | Mladen
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Munish:
look for pivot or Cross tab on sqlteam
1/25/2008 4:34 PM | Mladen
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Thanks for your in-depth analysis.
I was specifically looking at understanding the difference between EXISTS and IN keywords w.r.t SQL Server.
This article helped me a lot.
3/25/2008 2:16 PM | Channappa Bangaloer
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Helpful to me and good article but...
It seems moot and contrived because would one ever allow NULLs in primary key or foreign key columns?
I can't imagine it.
5/16/2008 11:32 PM | retheredge
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

retheredge, you missed the point, the reason pk is null is the result of an outer join
6/24/2008 3:31 PM | Josh
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Great article
I was going completly loco, trying to understand why my "not in" does not work ...
6/26/2008 1:21 AM | Dimitry
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Josh, you're trying to compare apples and oranges....

retheredge is saying that this particular example is contrived as you aren't likely going to have a NULL value on any key value like an ID . The overall point about how NULL evaluates as neither "true" or "false" when compared to another value is, on the other hand, a valid one. However, rarely is it the case that tables are joined or otherwise linked using columns that could actually contain NULL values. It's *much* more common "join" on generic, non-NULL integer key (primary or foreign).

Let's take a subset of the data set given in the example:
TABLE t1
(id INT, title VARCHAR(20), someIntCol INT)
1, 'title 1', 5
2, 'title 2', 5
3, 'title 3', 5

TABLE t2
(id INT, t1Id INT, someData VARCHAR(20))
1, 1, 'data 1'
2, 1, 'data 2'
4, 3, 'data 4'
5, 3, 'data 5'
6, 3, 'data 6'


Given *this* data set the following queries give the same results:
SELECT t1.*
FROM t1
LEFT JOIN t2 ON t1.id = t2.t1Id
WHERE t2.id IS NULL

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t1id FROM t2)

SELECT t1.*
FROM t1
WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)

All 3 will return:
2, 'title 2', 5


To sum up, in a majority of data sets in the wild, all there queries will operate the same. Although using (INNER/LEFT/RIGHT/OUTER) JOIN is usually better with respect to shortest processing times.
7/25/2008 8:49 PM | merlinyoda
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Josh, also the "primary key" in question (t1.id) is *not* NULL because of an OUTER JOIN (the examples used LEFT OUTER JOIN a.k.a. LEFT JOIN, which is not the same as OUTER JOIN a.k.a. FULL OUTER JOIN). Some of the values of t1 are NULL because they are *set* as NULL in t1. Though, t1.id isn't much of a "primary key" anyway as there are 2 separate records in the example data set that have NULL values. t2 is a little more "acceptable" as far as data constraints t2.t1id would be a column acting like a foreign key and t2.id a primary key for t2.

If t2.* were added to the outer query select, then you will get NULL values for t2 columns when there aren't any "matching" records as specified by the ON clause (i.e. t1.id=t2.t1id evaluates to "true"). Additionally in the other queries, we aren't comparing t2 columns with the subquery results, we're comparing t1 columns to the subquery (which happens to be querying t2).
7/25/2008 9:17 PM | merlinyoda
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Hello,

I would like to add some precision as some person would believe that IN is not a good operator.

NULLs : NULL is not comparable to anything. Also, using null in any operation will result in a null value. EX :

SELECT 'TEST' + NULL will give NULL as result. The only way to test if a value is null is to use the NULL operator.


IN:

Returns true if a specified value matches any value in a subquery or a list. This is true. I must add though that IN acts as a INNER JOIN would for matching purpose. It will keep only Lines where both operands matches.

Exists:

Returns true if a subquery contains any rows. Yes this is VERY different from IN. In check if matching occurs, Exists check only if SUBQUERY returns a row.

Join:


There are 2 kinds of joins

INNER ONES :

Inner returns only matching lines.

SELECT * from table1 join table2 on table1.id = table2.id will act like SELECT * from table1, table2 where table1.id = table2.id


OUTER ONES (LEFT, RIGHT, FULL) :

Those joins are different

LEFT will keep all the records from the left table (select * from table1 left join table2 on table1.id = table2.id in this query table1) and ADD NULLS value to the columns of table2 so that each line of table1 has at least 1 matching lines.

it also translates into this

select * from t1,t2 where t1.id = t2.t1id
UNION ALL -- ALL with prevent union from doing a distinct
select *, NULL, NULL, NULL from t1 where t1.id is null or t1.id not in (select t2.t1id from t2 where t2.t1id is not null)

And you can guess it right is the other way around. It will keep all the lines of the RIGHT table on the join statement.
(select * from table1 left join table2 on table1.id = table2.id in this query table2),

FULL is the combination of both join. LEFT AND RIGHT meaning no lines are lost and are completed with matching nulls from the other table if no match is possible

it translates into this.

select * from t1,t2 where t1.id = t2.t1id
UNION ALL
select *, NULL, NULL, NULL from t1 where t1.id is null or t1.id not in (select t2.t1id from t2 where t2.t1id is not null)
UNION ALL
select null,null,null, * from t2 where t2.t1id is null or t2.t1id not in (select t1.id from t1 where t1.id is not null)

8/26/2008 10:18 PM | Boumxyz
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

tkx!!!
12/4/2008 8:30 PM | marcelo silva
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Thank you all for your comments!
12/31/2008 7:34 AM | Eltayeb
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Great article, Mladen and thanks for sharing!
1/2/2009 2:49 AM | mxj
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

good an excellant one
1/15/2009 11:01 PM | aravind
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

VERY GOOD
1/20/2009 6:22 AM | gokul
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Hehe, a very good article. Always useful, even if its almost two years old.

Now that I know the difference, I wonder which one is the fastest one.

If I understand the EXIST behavior, the sub-query processing will stop at the first "existing" that matches, making it the fastest one to use.

Am I right ?

Kube
1/28/2009 10:10 PM | Kube
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

yes you are. i alwsay prefer exists.
however the best bet of course is to look at the execution plan.
1/28/2009 10:52 PM | Mladen
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

:)))))))))))
2/27/2009 8:06 PM | JOIN
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

JOIN IS BETTER
3/23/2009 12:18 PM | wael
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

I'm hoping that someone can help me. I've been banging my head against a wall for a while now... Here's what I'm trying to do:

I have a table (LAWSON_APVENMAST1) that contains vendor information. One of the fields (LAWSON_APVENMAST1.TAX_ID) has duplicate records (meaning that there are different vendors with the same Tax ID).

I have the following query that will give a list of the TAX_IDs that exist more than once in the table, but I cannot figure out how to return the Vendor Number (LAWSON_APVENMAST1.VENDOR) and Vendor Name (LAWSON_APVENMAST1.VENDOR_VNAME).

SELECT LAWSON_APVENMAST1.TAX_ID, Count(LAWSON_APVENMAST1.TAX_ID) AS CountOfTAX_ID
FROM LAWSON_APVENMAST1
GROUP BY LAWSON_APVENMAST1.TAX_ID
HAVING (((LAWSON_APVENMAST1.TAX_ID)<>" ") AND ((Count(LAWSON_APVENMAST1.TAX_ID))>1));

Any help is greatly appreciated!
Thanks,
Ken
4/14/2009 6:26 PM | Ken
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Ken -

SELECT T.TAX_ID, T.VENDOR, T.VENDOR_VNAME
FROM LAWSON_APVENMAST1 T
WHERE EXISTS (
select B.TAX_ID, count(B.TAX_ID) as CountofTAX_ID
from LAWSON_APVENMAST1 B
where T.TAX_ID = B.TAX_ID
group by B.TAX_ID
having (((B.TAX_ID)<>'') and ((count(B.TAX_ID))>1))
)
4/15/2009 10:09 PM | nami
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Awesome, this really helped me!!!
7/21/2009 11:34 PM | Steve
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

This one always gets me when i do a not in and wonder why it returns no results, thanks for the tip
10/7/2009 2:53 PM | Terry
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

thx, nice article, you made me pass an SQL exam
10/8/2009 3:14 PM | Serder
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

The only reason why "not exists ( select * ...)" works is the use of "*" in the select, as not all columns are null.
Try changing the "*" to t2.t1id; or adding "UNION ( select NULL,NULL,NULL)" to the end of the inner query.
Either one returns an empty set for the same reason that IN does: one cannot safely compare NULL.
10/16/2009 6:36 PM | Thais
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Great help!
3/25/2010 12:33 AM | OBender
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Cheers, very useful....
4/8/2010 5:21 PM | Len
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

'
7/29/2010 12:07 AM | ¡
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Thank for this article. It's very informative.
8/2/2010 4:10 AM | escalera
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Very useful.
9/28/2010 4:43 PM | newButSolidMind
Gravatar

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

I have 2 questions related to exists

1. How does the exists work? Let's say I have ID(numeric), NAme(char) & salary(numeric) columns in customer and CustID(numeric) in T2. How does the sql engine know which columns from customer should be used to match with the resulted result from T2 if exists is used as follows
select customer.* from customer where exists (select 1 from T2 where T2.CustID=customer.ID)

2. Can we use exists and UNION together?
I have table customer as mentioned above. I have table T1 with column CustID(numeric) which contains IDs from customer < 10. There is another table T2 with column CID(numeric) which contains values > 10 AND < 15.
I want to get all the records from customer whose IDs present either of the tables T1 or T2.

To find union, I can use the following query.
SELECT t1.CustID FROM T1 t1
UNION
SELECT t2.CID FROM T2 t2

I thought the below query will return me all the records for the customers from the above result but it gave me all the records from the customer table. :(

select t3.* from customer t3 where exists (SELECT t1.CustID FROM T1 t1
UNION
SELECT t2.CID FROM T2 t2)

Please help me find the correct solution for this scenario.

Thanks.
9/28/2010 4:56 PM | newButSolidMind
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET