I want some Moore

Blog about stuff and things and stuff...
mostly about SQL server and .Net
posts - 161, comments - 1491, trackbacks - 33

My Links

SQLTeam.com Links

News

Hi! My name is 
Mladen Prajdić  I'm from Slovenia and I'm currently working as a .Net (C#) and SQL Server developer. I'm also a MCP and MCTS for SQL Server. 
Welcome to my blog.

Search this Blog
 

My Blog Feed via Email


Get your Google PageRank
Users Online: who's online

Article Categories

Archives

Post Categories

Cool software

Other Blogs

Other stuff

SQL stuff

SQL Server: JOIN vs IN vs EXISTS - the logical difference

 

There is a common misconception that IN behaves equaliy to EXISTS or JOIN in terms of returned results.

This is simply not true. To see why not, let's review what each statement does.

 

IN:

Returns true if a specified value matches any value in a subquery or a list.

Exists:

Returns true if a subquery contains any rows.

Join:

Joins 2 resultsets on the joining column.

 

If not read carefully it looks pretty same so far, doesn't it.

The difference comes when you take into account the "dreaded" THREE-VALUED LOGIC.

Let's review that also:

-- this is true
SELECT 1 WHERE 1 = 1

-- this is false
SELECT 1 WHERE 1 = 0

-- this is unknown - it is usually expected to be false, but that only shows 
-- misunderstanding of nulls. It's not false it's only treated as false in the filter
SELECT 1 WHERE 1 = NULL

-- this is also unknown - but logicaly it would seem it would be true
-- but unknown compared to unknown equals uknown and it is treated as false in the filter
SELECT 1 WHERE NULL = NULL

 

The where will return a row only if the condition evaluates to true which UNKNOWN isn't.

 

So let's demonstrate this with some code. I've commented it heavily so it can be self explaining.

I've also shown a parser bug in the IN Query. Well... it's a bug if you ask me. It shouldn't behave like that.

 

------------------------------------------------------------------
-- Prepare tables and data
------------------------------------------------------------------
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO

INSERT INTO t1
SELECT 1, 'title 1', 5 UNION ALL
SELECT 2, 'title 2', 5 UNION ALL
SELECT 3, 'title 3', 5 UNION ALL
SELECT 4, 'title 4', 5 UNION ALL
SELECT null, 'title 5', 5 UNION ALL
SELECT null, 'title 6', 5

INSERT INTO t2
SELECT 1, 1, 'data 1' UNION ALL
SELECT 2, 1, 'data 2' UNION ALL
SELECT 3, 2, 'data 3' UNION ALL
SELECT 4, 3, 'data 4' UNION ALL
SELECT 5, 3, 'data 5' UNION ALL
SELECT 6, 3, 'data 6' UNION ALL
SELECT 7, 4, 'data 7' UNION ALL
SELECT 8, null, 'data 8' UNION ALL
SELECT 9, 6, 'data 9' UNION ALL
SELECT 10, 6, 'data 10' UNION ALL
SELECT 11, 8, 'data 11'

------------------------------------------------------------------
-- we want to get all data in t1 that has a child row in t2
------------------------------------------------------------------

-- join gives us more rows than we need, because it joins to every child row
SELECT    t1.* 
FROM    t1 
        JOIN t2 ON t1.id = t2.t1Id
-- distinct would solve that but it's not pretty nor efficient
SELECT    DISTINCT t1.* 
FROM    t1 
        JOIN t2 ON t1.id = t2.t1Id

-- now this is a weird part where someIntCol is a column in t1 
-- but the parser doesn't seem to mind that
SELECT    t1.* 
FROM    t1 
WHERE    t1.id IN (SELECT someIntCol FROM t2)

-- here in and exists both get correct results
SELECT    t1.* 
FROM    t1 
WHERE    t1.id IN (SELECT t1id FROM t2)

SELECT    t1.* 
FROM    t1 
WHERE    exists (SELECT * FROM t2 WHERE t1.id = t2.t1id)

------------------------------------------------------------------
-- we want to get all data in t1 that doesn't have a child row in t2
------------------------------------------------------------------

-- join gives us the correct result
SELECT    t1.* 
FROM    t1 
        LEFT JOIN t2 ON t1.id = t2.t1Id
WHERE    t2.id IS NULL

-- IN doesn't get correct results.
-- That's because of how IN treats NULLs and the Three-valued logic
-- NULL is treated as an unknown, so if there's a null in the t2.t1id 
-- NOT IN will return either NOT TRUE or NOT UNKNOWN. And neither can be TRUE.
-- when there's a NULL in the t1id column of the t2 table the NOT IN query will always return an empty set. 
SELECT    t1.* 
FROM    t1 
WHERE    t1.id NOT IN (SELECT t1id FROM t2)

-- NOT EXISTS gets correct results
SELECT    t1.* 
FROM    t1 
WHERE    NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
GO

DROP TABLE t2
DROP TABLE t1

 

We can see that it's best to use EXISTS because it always behaves as the user would think it does.

Hope this demonstartes the logical difference between the 3 close yet so far apart functionalities of SQL Server

 

kick it on DotNetKicks.com

Print | posted on Friday, May 18, 2007 6:52 PM

Feedback

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Very nice article. A few of these have come across dotnetkicks in the past month or so (including one from me where I was ranting about the Unknown result from a programmer's standpoint). This is the nicest one so far with the best example.
5/18/2007 7:40 PM | mcgurk

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

thanx. glad you like it :)
5/18/2007 7:42 PM | Mladen

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Now for fun run this

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT someIntCol FROM t2)

notice that the someIntCol column does NOT exists in the t2 table


you can read more about that behaviour here: http://sqlservercode.blogspot.com/2007/04/you-should-never-use-in-in-sql-to-join.html
5/18/2007 8:08 PM | Denis the SQL Menace

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

nice Denis. Thanx for sharing.
5/18/2007 8:09 PM | Mladen

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Denis -- that's a good point and a thing to keep in mind when writing any correlated sub-query, with or without using IN(). That's why in general I prefer derived tables -- they tend to be neater and I like the way that each derived table is a complete, self-contained SELECT, and you never need to worry about that issue.
5/20/2007 5:58 PM | Jeff

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Excellent article - well explained
5/21/2007 5:44 AM | Andrew Dixon

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Denis's example is what you get when you do not properly reference columns.

The unknown column in the subquery does exist in (one of) the table(s) in the main query, so SQL Server can resolve the name, and no error is raised.

Note that the error would be raised if you added a reference the table before the column in the subquery (like you should):

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t2.someIntCol FROM t2)
7/23/2007 12:47 PM | Adriaan

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

nice topic.....
it helped me......
thanks
9/3/2007 7:34 AM | neelam

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

nice articles

i was confused and take anyone from these three commands

thank u very much

regards
pawan
Pune, India
9/6/2007 5:02 PM | Pawan

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

I have the following query...

SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON
WHERE EXISTS (
SELECT PERSON.PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M'
)
GROUP BY PERSON.PERSON_URN

As you can see, I am selecting from the same table inside the subquery as I am in the overall query. This returns everybody in the table, whereas the subquery by itself returns about 50%. I do not understand why this is and would appreciate your help with this very much.

Please help!!!

Thanks Jan
9/20/2007 1:30 PM | Jan Tenenberg

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

so why don't you just do:
SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M'
GROUP BY PERSON.PERSON_URN
9/20/2007 1:50 PM | Mladen

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

I thought you might say that...He He. I have developed an application that generates SQL on the fly and subqueries form part of this across many tables. The issue only really comes into it when the subquery table is the same as the outside table. Do you know what the problem is with my original statement and how I can get around it with maintaining the subquery?

Many Thanks,

Jan
9/20/2007 1:56 PM | Jan Tenenberg

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

SELECT MIN(PERSON.PERSON_URN) AS PERSON_URN
FROM PERSON P1
WHERE EXISTS (
SELECT PERSON.PERSON_URN
FROM PERSON
WHERE PERSON.GENDER ='M' and PERSON.PERSON_URN = P1.PERSON_URN
)
GROUP BY PERSON.PERSON_URN
9/20/2007 2:07 PM | Mladen

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Brilliant! Thank You Very Much!
9/20/2007 2:11 PM | Jan Tenenberg

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Thanks! I was banging my head trying to figure out why my NOT IN statement didnt work.

I still dont understand why the statement works if it is just IN though.
12/13/2007 1:54 AM | Mark

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Hi! I was thinking about this ...

Does a "SELECT A.Afield, B.AnotherField FROM TableA A JOIN TableB B on 1=1" can hurt a DB?
Performance and all!?
Is it better to do 2 selcets? anyway if one is null I can't proceed...

(Instead of doing 2 select in a row... one should be better no?)

TY :)
12/20/2007 11:03 PM | Gerard

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

I have tab1 as:

Col1 Col2 Col3
1 2 Null

Then the Master as:

Col Name

1 One
2 Two
3 Three


I want the output as

Col1 Col2 Col3
One Two Null

Any help? I have been struggling since yesterday.

1/4/2008 5:29 PM | Munish

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Can I use two column in IN...

example:

SELECT t1.*
FROM t1
WHERE t1.id, t1.Title NOT IN (SELECT t1id, Somedata FROM t2)
1/25/2008 4:29 PM | Ashish

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

no. that's why you have exists.
1/25/2008 4:33 PM | Mladen

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Munish:
look for pivot or Cross tab on sqlteam
1/25/2008 4:34 PM | Mladen

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Thanks for your in-depth analysis.
I was specifically looking at understanding the difference between EXISTS and IN keywords w.r.t SQL Server.
This article helped me a lot.
3/25/2008 2:16 PM | Channappa Bangaloer

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Helpful to me and good article but...
It seems moot and contrived because would one ever allow NULLs in primary key or foreign key columns?
I can't imagine it.
5/16/2008 11:32 PM | retheredge

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

retheredge, you missed the point, the reason pk is null is the result of an outer join
6/24/2008 3:31 PM | Josh

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Great article
I was going completly loco, trying to understand why my "not in" does not work ...
6/26/2008 1:21 AM | Dimitry

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Josh, you're trying to compare apples and oranges....

retheredge is saying that this particular example is contrived as you aren't likely going to have a NULL value on any key value like an ID . The overall point about how NULL evaluates as neither "true" or "false" when compared to another value is, on the other hand, a valid one. However, rarely is it the case that tables are joined or otherwise linked using columns that could actually contain NULL values. It's *much* more common "join" on generic, non-NULL integer key (primary or foreign).

Let's take a subset of the data set given in the example:
TABLE t1
(id INT, title VARCHAR(20), someIntCol INT)
1, 'title 1', 5
2, 'title 2', 5
3, 'title 3', 5

TABLE t2
(id INT, t1Id INT, someData VARCHAR(20))
1, 1, 'data 1'
2, 1, 'data 2'
4, 3, 'data 4'
5, 3, 'data 5'
6, 3, 'data 6'


Given *this* data set the following queries give the same results:
SELECT t1.*
FROM t1
LEFT JOIN t2 ON t1.id = t2.t1Id
WHERE t2.id IS NULL

SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t1id FROM t2)

SELECT t1.*
FROM t1
WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)

All 3 will return:
2, 'title 2', 5


To sum up, in a majority of data sets in the wild, all there queries will operate the same. Although using (INNER/LEFT/RIGHT/OUTER) JOIN is usually better with respect to shortest processing times.
7/25/2008 8:49 PM | merlinyoda

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Josh, also the "primary key" in question (t1.id) is *not* NULL because of an OUTER JOIN (the examples used LEFT OUTER JOIN a.k.a. LEFT JOIN, which is not the same as OUTER JOIN a.k.a. FULL OUTER JOIN). Some of the values of t1 are NULL because they are *set* as NULL in t1. Though, t1.id isn't much of a "primary key" anyway as there are 2 separate records in the example data set that have NULL values. t2 is a little more "acceptable" as far as data constraints t2.t1id would be a column acting like a foreign key and t2.id a primary key for t2.

If t2.* were added to the outer query select, then you will get NULL values for t2 columns when there aren't any "matching" records as specified by the ON clause (i.e. t1.id=t2.t1id evaluates to "true"). Additionally in the other queries, we aren't comparing t2 columns with the subquery results, we're comparing t1 columns to the subquery (which happens to be querying t2).
7/25/2008 9:17 PM | merlinyoda

# re: SQL Server: JOIN vs IN vs EXISTS - the logical difference

Hello,

I would like to add some precision as some person would believe that IN is not a good operator.

NULLs : NULL is not comparable to anything. Also, using null in any operation will result in a null value. EX :

SELECT 'TEST' + NULL will give NULL as result. The only way to test if a value is null is to use the NULL operator.


IN:

Returns true if a specified value matches any value in a subquery or a list. This is true. I must add though that IN acts as a INNER JOIN would for matching purpose. It will keep only Lines where both operands matches.

Exists:

Returns true if a subquery contains any rows. Yes this is VERY different from IN. In check if matching occurs, Exists check only if SUBQUERY returns a row.

Join:


There are 2 kinds of joins

INNER ONES :

Inner returns only matching lines.

SELECT * from table1 join table2 on table1.id = table2.id will act like SELECT * from table1, table2 where table1.id = table2.id


OUTER ONES (LEFT, RIGHT, FULL) :

Those joins are different

LEFT will keep all the records from the left table (select * from table1 left join table2 on table1.id = table2.id in this query table1) and ADD NULLS value to the columns of table2 so that each line of table1 has at least 1 matching lines.

it also translates into this

select * from t1,t2 where t1.id = t2.t1id
UNION ALL -- ALL with prevent union from doing a distinct
select *, NULL, NULL, NULL from t1 where t1.id is null or t1.id not in (select t2.t1id from t2 where t2.t1id is not null)

And you can guess it right is the other way around. It will keep all the lines of the RIGHT table on the join statement.
(select * from table1 left join table2 on table1.id = table2.id in this query table2),

FULL is the combination of both join. LEFT AND RIGHT meaning no lines are lost and are completed with matching nulls from the other table if no match is possible

it translates into this.

select * from t1,t2 where t1.id = t2.t1id
UNION ALL
select *, NULL, NULL, NULL from t1 where t1.id is null or t1.id not in (select t2.t1id from t2 where t2.t1id is not null)
UNION ALL
select null,null,null, * from t2 where t2.t1id is null or t2.t1id not in (select t1.id from t1 where t1.id is not null)

8/26/2008 10:18 PM | Boumxyz

Post Comment

Title  
Name  
Email
Url
Comment   
Please add 2 and 3 and type the answer here:

Powered by: