Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server
posts - 157, comments - 2686, trackbacks - 64

My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Subscribe





Archives

Post Categories

Programming

Sports

SQL

Criteria on Outer Joined Tables

As this nice SQLTeam article explains, when using an OUTER JOIN, you should put criteria on the outer table in the join condition, not in the WHERE clause.  However, I often see a "workaround" to avoid this simple and solid rule, which might seem to work but actually doesn't.  Since it is hard to explain why over and over in forum posts, I thought it might be helpful to address that here once and for all with an example.  Confused? Stick with me, I promise this will make sense eventually.

Suppose we have the following tables:

create table Depts  (deptID int, DeptName varchar(20))
create table Emps (empID int, EmpName varchar(20), deptID int, salary money)

insert into Depts (deptID, deptName)
select 1, 'Dept A' union all
select 2, 'Dept B' union all
select 3, 'Dept C'

insert into Emps (empID, EmpName, DeptID, salary)
select 1, 'Emp 1', 1, 50 union all
select 2, 'Emp 2', 1, 75 union all
select 3, 'Emp 3', 2, 60 union all
select 3, 'Emp 4', 2, 45

Notice the following:
  • We have 3 departments
  • Each employee is assigned to a department
  • There are no employees assigned to 'Dept C'
Now, if we want to select all departments along with any employees that happen to be in that department, we use an OUTER JOIN since an INNER JOIN will only return departments with matching employees:

select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
    Emps e on e.deptID = d.deptID

deptID      DeptName             empID       empName              salary

----------- -------------------- ----------- -------------------- ---------
1           Dept A               1           Emp 1                50.00
1           Dept A               2           Emp 2                75.00
2           Dept B               3           Emp 3                60.00
2           Dept B               3           Emp 4                45.00
3           Dept C               NULL        NULL                 NULL

(5 row(s) affected)


Ok, so far so good.  Now, what if we wish to return only employees with a salary of >50?  If we add the criteria to our WHERE clause (WHERE e.salary > 50) this means that the last row in the above result -- 'Dept C' --  will not be returned, since the salary column is null:

select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
    Emps e on e.deptID = d.deptID
where
    e.salary > 50

deptID      DeptName             empID       empName              salary
----------- -------------------- ----------- -------------------- ---------------------
1           Dept A               2           Emp 2                75.00
2           Dept B               3           Emp 3                60.00

(2 row(s) affected)


Thus, as Mark's article explains quite well, you simply put that criteria directly into the JOIN condition and you get the results you need:

select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
    Emps e on e.deptID = d.deptID and e.salary > 50

deptID      DeptName             empID       empName              salary
----------- -------------------- ----------- -------------------- ---------------------
1           Dept A               2           Emp 2                75.00
2           Dept B               3           Emp 3                60.00
3           Dept C               NULL        NULL                 NULL

(3 row(s) affected)

However, you might be tempted to look at your data and think that to return these results, instead of putting the criteria in the JOIN, you can simply handle that NULL situation in your WHERE clause like this:

select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
    Emps e on e.deptID = d.deptID
where
    e.salary > 50 or e.empID is null

deptID      DeptName             empID       empName              salary
----------- -------------------- ----------- -------------------- ---------------------
1           Dept A               2           Emp 2                75.00
2           Dept B               3           Emp 3                60.00
3           Dept C               NULL        NULL                 NULL

(3 row(s) affected)


That works, right?  The NULL results are returned, and we only have employees with a salary above 50.  Case closed!  This would seem to indicate that we can indeed reference our outer tables in our WHERE clause, we just have to make sure that we deal with the NULL's that come back so that we don't filter them out.

However, that does not work!  Why not?  Well, right now we have no employees in Dept C at all, but what if we add one, with a salary below $50?  Let's try it:

insert into Emps (empID, EmpName, deptID, salary) values(4, 'Emp 5', 3, 35)

select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
    Emps e on e.deptID = d.deptID
where
    e.salary > 50 or e.empID is null

deptID      DeptName             empID       empName              salary
----------- -------------------- ----------- -------------------- ---------------------
1           Dept A               2           Emp 2                75.00
2           Dept B               3           Emp 3                60.00

(2 row(s) affected)

What happened here?  Where did Dept C go?  Suddenly, our SQL statement no longer returns the desired results.  Why not?  Well, like pretty much all programming bugs, it's because that silly computer did exactly what you told it to do.  Let's look at our data again, without any WHERE clause at all:

select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
    Emps e on e.deptID = d.deptID

deptID      DeptName             empID       empName              salary
----------- -------------------- ----------- -------------------- ---------------------
1           Dept A               1           Emp 1                50.00
1           Dept A               2           Emp 2                75.00
2           Dept B               3           Emp 3                60.00
2           Dept B               3           Emp 4                45.00
3           Dept C               4           Emp 5                35.00

(5 row(s) affected)

Notice that we have no NULL values now, and there is at least 1 employee for each department.  Now, let's go through those results and manually apply our WHERE clause, row by row, removing all rows that don't meet the criteria:

where e.salary > 50 or e.empID is null

deptID      DeptName             empID       empName              salary
----------- -------------------- ----------- -------------------- ---------------------
1           Dept A               1           Emp 1                50.00
1           Dept A               2           Emp 2                75.00
2           Dept B               3           Emp 3                60.00
2           Dept B               3           Emp 4                45.00
3           Dept C               4           Emp 5                35.00


So, you can see now that your WHERE clause simply was not correctly written!  What condition, then, it is applying on our data?  Well, remember that, when using OUTER JOINS, the expression "outertable.primarykeycolumn is null" means "there is no matching row in the outer table meeting the join criteria".  Thus, our WHERE clause really reads:

"return all departments and employees with a salary > 50, or in which there are no employees at all."

So, don't try to use this "trick" to avoid putting criteria in your outer joins.   Of course, this also applies if you use ISNULL, CASE or COALESCE to handle the nulls in your WHERE clause -- the same (wrong) results will be returned.

 The rule is very simple and very easy to remember:

Never reference OUTER JOIN-ed tables in your WHERE clause.

The exception is when you want to test if your outer join expression matched any rows at all; i.e.,  checking to see if the outer table's primary key column is NULL.  Otherwise, put any criteria on your outer tables within the join expression itself.

. . .

Side note: To apply criteria to your outer tables, in addition to the JOIN clause, you could also use a derived table or CTE, like this:

select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
    (select * from Emps where salary > 50) e on e.deptID = d.deptID

deptID      DeptName             empID       empName              salary
----------- -------------------- ----------- -------------------- ---------------------
1           Dept A               2           Emp 2                75.00
2           Dept B               3           Emp 3                60.00
3           Dept C               NULL        NULL                 NULL

(3 row(s) affected)

...

Any questions or comments are welcome!  I hope this helps clear a few things up, and if you've ever used the "where ... or outertable.pk is null" technique in the past, you might want to double-check your code and your results, they may not be what you expect!

see also:

Print | posted on Monday, May 14, 2007 10:12 AM | Filed Under [ T-SQL Joins/Relations ]

Feedback

Gravatar

# re: Criteria on Outer Tables

Another great post. I wanted to thank you for your blog. You really post great, real-world SQL!!!
5/14/2007 12:00 PM | Chris Reeder
Gravatar

# re: Criteria on Outer Tables

Why bother with the subquery in this case?

select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join Emps e on e.deptID = d.deptID
and e.salary > 50
5/14/2007 2:43 PM | matthew
Gravatar

# re: Criteria on Outer Tables

@ Mathew --

Just a word of advice: before commenting on an article, it might be helpful to actually *read* it ! :)

5/14/2007 2:59 PM | Jeff
Gravatar

# re: Criteria on Outer Tables

This would be a CTE version (SQL 2005)

WITH EmpsCTE(deptID,empID, empName, salary) AS
(
SELECT deptID,empID, empName, salary
FROM Emps
WHERE salary > 50
)

SELECT d.deptID, d.DeptName, e.empID, e.empName, e.salary
FROM Depts d
LEFT OUTER JOIN EmpsCTE AS e ON e.deptID = d.deptID;
5/14/2007 4:52 PM | Denis the SQL Menace
Gravatar

# re: Criteria on Outer Tables

Here is another way in 2005 by using OUTER APPLY


SELECT d.deptID, d.DeptName, e.empID, e.empName, e.salary
FROM Depts d
OUTER APPLY
(SELECT * FROM Emps e WHERE salary > 50 and e.deptID = d.deptID) e;


5/14/2007 6:38 PM | Denis the SQL Menace
Gravatar

# re: Criteria on Outer Tables

Denis, I have a feeling that, with an APPLY, SQL Server will always get an execution plan that uses nested loop join. Certainly it will do so in the circumstances I've tried, where the equivalent OUTER JOIN query would use hash or merge.
5/15/2007 4:09 AM | Arnold Fribble
Gravatar

# re: Criteria on Outer Tables

Arnold, yes you would use a OUTER JOIN, the APPLY is just silly to use in this case
I just included it for the purpose of demonstrating that it can be used
5/15/2007 6:35 AM | Denis the SQL Menace
Gravatar

# re: Criteria on Outer Tables

Denis -- thanks for posting the code for the alternative techniques! It's good to see that at least someone is reading the text!

I will have to play with APPLY a little more to find out situations in which it is a viable technique to use. Even if it might not be the way to go in this case, it is still interesting to see how that approach works in comparison to the others.
5/15/2007 8:34 AM | Jeff
Gravatar

# re: Criteria on Outer Tables

Wow, this is exactly what I needed today. In fact, I almost searched for this information a few days ago but this is such a recent post, I wouldn't have found it. Time is everything.

My task was to join the same table 5 times looking for a different field each time.

It's cool to see the alternatives as well, kudos to those who've contributed.

thanks for the help!
5/18/2007 4:21 PM | Camey Combs
Gravatar

# re: Criteria on Outer Tables

Nice and helpful post.

Thanks!
6/7/2007 7:33 AM | Alopix
Gravatar

# re: Criteria on Outer Tables

Jeff,

Thanks for the article and all the other posts. Very informative.

In these situations I would probably use the derived table method. The reason is so that SQL is doing fewer joins since I am weeding out the rows from the emp table that dont fit before joining.
6/10/2007 1:23 AM | Richard
Gravatar

# re: Criteria on Outer Tables

Nice post, certainly tagged for bookmark. :) Thank you!
7/31/2007 2:44 AM | Merly
Gravatar

# re: Criteria on Outer Joined Tables

Hi Jeff,

thank you very much for the explanation! After quite a few hours struggling with an outer join query on my database I finally found your article and three minutes later it worked. I read other explanations and examples before, but yours was the only one which dealt with criteria in the joined table. Tanks again.
11/29/2007 9:39 AM | Maren v. Borstel
Gravatar

# re: Criteria on Outer Joined Tables

Can we join tables with column different names ?
10/29/2009 4:28 PM | red
Gravatar

# re: Criteria on Outer Joined Tables

Hi Jeff

Thanks, Great Post. Very informative. :-)
Regards,
Shally
5/28/2010 5:57 AM | shally
Gravatar

# re: Criteria on Outer Joined Tables

Thank you for making this important point Jeff; this is the first time I have come across this. However, I have a question about terminology: In an OUTER JOIN, is the table whose rows are all 'preserved' known as the 'outer table' or the 'inner table'?

In this post, you are talking about "criteria on the outer table" - in particular the criterion 'Emps.salary > 50'. This suggests you consider Emps to be the outer table, and presumably Depts is the inner table. I believe Depts is the table whose rows are all preserved in the OUTER JOIN, so I believe you'd consider the answer to my question to be 'inner table'.

However, some technical pages at Sybase and IBM seem to suggest that they consider the answer to be 'outer table':
manuals.sybase.com/.../12016;pt=11971
publib.boulder.ibm.com/.../index.jsp

And yet here is someone who agrees with you:
http://aam.ugpl.de/?q=en/node/1324

Perhaps whoever invented the OUTER JOIN didn't specify which of the joined tables was the outer table, so perhaps there is no 'official' answer?

Regards,
Peter
9/1/2010 3:22 PM | Peter Ford
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET