Criteria on Outer Joined Tables
As this nice SQLTeam article explains, when using an OUTER JOIN, you should put criteria on the outer table in the join condition, not in the WHERE clause. However, I often see a "workaround" to avoid this simple and solid rule, which might seem to work but actually doesn't. Since it is hard to explain why over and over in forum posts, I thought it might be helpful to address that here once and for all with an example. Confused? Stick with me, I promise this will make sense eventually.
Suppose we have the following tables:
create table Depts (deptID int, DeptName varchar(20))
create table Emps (empID int, EmpName varchar(20), deptID int, salary money)
insert into Depts (deptID, deptName)
select 1, 'Dept A' union all
select 2, 'Dept B' union all
select 3, 'Dept C'
insert into Emps (empID, EmpName, DeptID, salary)
select 1, 'Emp 1', 1, 50 union all
select 2, 'Emp 2', 1, 75 union all
select 3, 'Emp 3', 2, 60 union all
select 3, 'Emp 4', 2, 45
create table Emps (empID int, EmpName varchar(20), deptID int, salary money)
insert into Depts (deptID, deptName)
select 1, 'Dept A' union all
select 2, 'Dept B' union all
select 3, 'Dept C'
insert into Emps (empID, EmpName, DeptID, salary)
select 1, 'Emp 1', 1, 50 union all
select 2, 'Emp 2', 1, 75 union all
select 3, 'Emp 3', 2, 60 union all
select 3, 'Emp 4', 2, 45
Notice the following:
- We have 3 departments
- Each employee is assigned to a department
- There are no employees assigned to 'Dept C'
select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
Emps e on e.deptID = d.deptID
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------
1 Dept A 1 Emp 1 50.00
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
2 Dept B 3 Emp 4 45.00
3 Dept C NULL NULL NULL
(5 row(s) affected)
from Depts d
left outer join
Emps e on e.deptID = d.deptID
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------
1 Dept A 1 Emp 1 50.00
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
2 Dept B 3 Emp 4 45.00
3 Dept C NULL NULL NULL
(5 row(s) affected)
Ok, so far so good. Now, what if we wish to return only employees with a salary of >50? If we add the criteria to our WHERE clause (WHERE e.salary > 50) this means that the last row in the above result -- 'Dept C' -- will not be returned, since the salary column is null:
select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
Emps e on e.deptID = d.deptID
where
e.salary > 50
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
(2 row(s) affected)
from Depts d
left outer join
Emps e on e.deptID = d.deptID
where
e.salary > 50
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
(2 row(s) affected)
Thus, as Mark's article explains quite well, you simply put that criteria directly into the JOIN condition and you get the results you need:
select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
Emps e on e.deptID = d.deptID and e.salary > 50
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
3 Dept C NULL NULL NULL
(3 row(s) affected)
from Depts d
left outer join
Emps e on e.deptID = d.deptID and e.salary > 50
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
3 Dept C NULL NULL NULL
(3 row(s) affected)
However, you might be tempted to look at your data and think that to return these results, instead of putting the criteria in the JOIN, you can simply handle that NULL situation in your WHERE clause like this:
select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
Emps e on e.deptID = d.deptID
where
e.salary > 50 or e.empID is null
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
3 Dept C NULL NULL NULL
(3 row(s) affected)
from Depts d
left outer join
Emps e on e.deptID = d.deptID
where
e.salary > 50 or e.empID is null
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
3 Dept C NULL NULL NULL
(3 row(s) affected)
That works, right? The NULL results are returned, and we only have employees with a salary above 50. Case closed! This would seem to indicate that we can indeed reference our outer tables in our WHERE clause, we just have to make sure that we deal with the NULL's that come back so that we don't filter them out.
However, that does not work! Why not? Well, right now we have no employees in Dept C at all, but what if we add one, with a salary below $50? Let's try it:
insert into Emps (empID, EmpName, deptID, salary) values(4, 'Emp 5', 3, 35)
select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
Emps e on e.deptID = d.deptID
where
e.salary > 50 or e.empID is null
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
(2 row(s) affected)
select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
Emps e on e.deptID = d.deptID
where
e.salary > 50 or e.empID is null
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
(2 row(s) affected)
What happened here? Where did Dept C go? Suddenly, our SQL statement no longer returns the desired results. Why not? Well, like pretty much all programming bugs, it's because that silly computer did exactly what you told it to do. Let's look at our data again, without any WHERE clause at all:
select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
Emps e on e.deptID = d.deptID
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 1 Emp 1 50.00
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
2 Dept B 3 Emp 4 45.00
3 Dept C 4 Emp 5 35.00
(5 row(s) affected)
from Depts d
left outer join
Emps e on e.deptID = d.deptID
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 1 Emp 1 50.00
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
2 Dept B 3 Emp 4 45.00
3 Dept C 4 Emp 5 35.00
(5 row(s) affected)
Notice that we have no NULL values now, and there is at least 1 employee for each department. Now, let's go through those results and manually apply our WHERE clause, row by row, removing all rows that don't meet the criteria:
where e.salary > 50 or e.empID is null
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 1 Emp 1 50.00
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
2 Dept B 3 Emp 4 45.00
3 Dept C 4 Emp 5 35.00
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 1 Emp 1 50.00
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
2 Dept B 3 Emp 4 45.00
3 Dept C 4 Emp 5 35.00
So, you can see now that your WHERE clause simply was not correctly written! What condition, then, it is applying on our data? Well, remember that, when using OUTER JOINS, the expression "outertable.primarykeycolumn is null" means "there is no matching row in the outer table meeting the join criteria". Thus, our WHERE clause really reads:
"return all departments and employees with a salary > 50, or in which there are no employees at all."
So, don't try to use this "trick" to avoid putting criteria in your outer joins. Of course, this also applies if you use ISNULL, CASE or COALESCE to handle the nulls in your WHERE clause -- the same (wrong) results will be returned.
The rule is very simple and very easy to remember:
Never reference OUTER JOIN-ed tables in your WHERE clause.
The exception is when you want to test if your outer join expression matched any rows at all; i.e., checking to see if the outer table's primary key column is NULL. Otherwise, put any criteria on your outer tables within the join expression itself.
. . .
Side note: To apply criteria to your outer tables, in addition to the JOIN clause, you could also use a derived table or CTE, like this:
select d.deptID, d.DeptName, e.empID, e.empName, e.salary
from Depts d
left outer join
(select * from Emps where salary > 50) e on e.deptID = d.deptID
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
3 Dept C NULL NULL NULL
(3 row(s) affected)
from Depts d
left outer join
(select * from Emps where salary > 50) e on e.deptID = d.deptID
deptID DeptName empID empName salary
----------- -------------------- ----------- -------------------- ---------------------
1 Dept A 2 Emp 2 75.00
2 Dept B 3 Emp 3 60.00
3 Dept C NULL NULL NULL
(3 row(s) affected)
...
Any questions or comments are welcome! I hope this helps clear a few things up, and if you've ever used the "where ... or outertable.pk is null" technique in the past, you might want to double-check your code and your results, they may not be what you expect!
see also:
- Taking a look at CROSS APPLY
- The "Nested WHERE-IN" SQL Anti-Pattern
- Using GROUP BY to avoid self-joins
- Criteria on Outer Joined Tables
- Better Alternatives to a FULL OUTER JOIN
- Conditional Joins in SQL Server
- How to JOIN Multiple Transactional Tables in SQL
- The power of the Cross Join
Legacy Comments
Chris Reeder
2007-05-14 |
re: Criteria on Outer Tables Another great post. I wanted to thank you for your blog. You really post great, real-world SQL!!! |
matthew
2007-05-14 |
re: Criteria on Outer Tables Why bother with the subquery in this case? select d.deptID, d.DeptName, e.empID, e.empName, e.salary from Depts d left outer join Emps e on e.deptID = d.deptID and e.salary > 50 |
Jeff
2007-05-14 |
re: Criteria on Outer Tables @ Mathew -- Just a word of advice: before commenting on an article, it might be helpful to actually *read* it ! :) |
Denis the SQL Menace
2007-05-14 |
re: Criteria on Outer Tables This would be a CTE version (SQL 2005) WITH EmpsCTE(deptID,empID, empName, salary) AS ( SELECT deptID,empID, empName, salary FROM Emps WHERE salary > 50 ) SELECT d.deptID, d.DeptName, e.empID, e.empName, e.salary FROM Depts d LEFT OUTER JOIN EmpsCTE AS e ON e.deptID = d.deptID; |
Denis the SQL Menace
2007-05-14 |
re: Criteria on Outer Tables Here is another way in 2005 by using OUTER APPLY SELECT d.deptID, d.DeptName, e.empID, e.empName, e.salary FROM Depts d OUTER APPLY (SELECT * FROM Emps e WHERE salary > 50 and e.deptID = d.deptID) e; |
Arnold Fribble
2007-05-15 |
re: Criteria on Outer Tables Denis, I have a feeling that, with an APPLY, SQL Server will always get an execution plan that uses nested loop join. Certainly it will do so in the circumstances I've tried, where the equivalent OUTER JOIN query would use hash or merge. |
Denis the SQL Menace
2007-05-15 |
re: Criteria on Outer Tables Arnold, yes you would use a OUTER JOIN, the APPLY is just silly to use in this case I just included it for the purpose of demonstrating that it can be used |
Jeff
2007-05-15 |
re: Criteria on Outer Tables Denis -- thanks for posting the code for the alternative techniques! It's good to see that at least someone is reading the text! I will have to play with APPLY a little more to find out situations in which it is a viable technique to use. Even if it might not be the way to go in this case, it is still interesting to see how that approach works in comparison to the others. |
Camey Combs
2007-05-18 |
re: Criteria on Outer Tables Wow, this is exactly what I needed today. In fact, I almost searched for this information a few days ago but this is such a recent post, I wouldn't have found it. Time is everything. My task was to join the same table 5 times looking for a different field each time. It's cool to see the alternatives as well, kudos to those who've contributed. thanks for the help! |
Alopix
2007-06-07 |
re: Criteria on Outer Tables Nice and helpful post. Thanks! |
Richard
2007-06-10 |
re: Criteria on Outer Tables Jeff, Thanks for the article and all the other posts. Very informative. In these situations I would probably use the derived table method. The reason is so that SQL is doing fewer joins since I am weeding out the rows from the emp table that dont fit before joining. |
Merly
2007-07-31 |
re: Criteria on Outer Tables Nice post, certainly tagged for bookmark. :) Thank you! |
Maren v. Borstel
2007-11-29 |
re: Criteria on Outer Joined Tables Hi Jeff, thank you very much for the explanation! After quite a few hours struggling with an outer join query on my database I finally found your article and three minutes later it worked. I read other explanations and examples before, but yours was the only one which dealt with criteria in the joined table. Tanks again. |
red
2009-10-29 |
re: Criteria on Outer Joined Tables Can we join tables with column different names ? |
shally
2010-05-28 |
re: Criteria on Outer Joined Tables Hi Jeff Thanks, Great Post. Very informative. :-) Regards, Shally |
Peter Ford
2010-09-01 |
re: Criteria on Outer Joined Tables Thank you for making this important point Jeff; this is the first time I have come across this. However, I have a question about terminology: In an OUTER JOIN, is the table whose rows are all 'preserved' known as the 'outer table' or the 'inner table'? In this post, you are talking about "criteria on the outer table" - in particular the criterion 'Emps.salary > 50'. This suggests you consider Emps to be the outer table, and presumably Depts is the inner table. I believe Depts is the table whose rows are all preserved in the OUTER JOIN, so I believe you'd consider the answer to my question to be 'inner table'. However, some technical pages at Sybase and IBM seem to suggest that they consider the answer to be 'outer table': manuals.sybase.com/.../12016;pt=11971 publib.boulder.ibm.com/.../index.jsp And yet here is someone who agrees with you: http://aam.ugpl.de/?q=en/node/1324 Perhaps whoever invented the OUTER JOIN didn't specify which of the joined tables was the outer table, so perhaps there is no 'official' answer? Regards, Peter |