Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

Be Careful When Mixing INNER and OUTER Joins

I had previously written about the danger of Criteria on Outer Joins, but recently another situation popped up that occasionally causes confusion with OUTER JOINS that I thought I might address.  The issue is when you have multiple tables joined in a single SELECT, and you mix OUTER and INNER JOINS together.  The end result doesn't always seem to "work", and it can be tricky to understand exactly why and how to fix it without incurring additional unintended side effects.

Consider the following schema and sample data:

create table People
(
    PersonID int identity primary key,
    PersonName varchar(20)
)

create table PetTypes
(
    PetTypeID int identity primary key,
    PetType varchar(10)
)

create table Pets
(
    PetID int identity primary key,
    PetTypeID int references PetTypes(PetTypeID) not null,
    PetName varchar(10),
    OwnerID int references People(PersonID) not null
)

insert into People (PersonName)
select 'Fred Flintstone' union all
select 'Barney Rubble' union all
select 'George Jetson'

insert into PetTypes (PetType)
select 'Dinosaur' union all
select 'Hopparoo'

insert into Pets (PetTypeID, PetName, OwnerID)
select 1,'Dino',1 union all
select 2,'Hoppy',2

(note: I actually don't remember Hoppy at all from the cartoon, but I read about him here)

Notice that all pets must have an owner, and all pets must have a PetType that indicates what they are.

If we wish to return all People and their pets, if any, we would use an OUTER JOIN like this:

select People.PersonName, Pets.PetName
from People
left outer join Pets on Pets.OwnerID = People.PersonID

PersonName           PetName   
-------------------- ----------
Fred Flintstone      Dino
Barney Rubble        Hoppy
George Jetson        NULL

(3 row(s) affected)

We use the OUTER JOIN so that we still return George Jetson, even though he has no Pet.  Since there is no Pet for him, NULLS are returned for all columns in the Pets table for that row.  So far so good, this is exactly what we want, things are working great.

Now, suppose we wish to indicate the PetType as well for each Pet.   All Pets must have a PetType since the column is not nullable, so it appears that we can simply add an INNER JOIN between Pets and PetTypes to our previous SELECT to include this information.  Let's give it a shot:

select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID

PersonName           PetName    PetType   
-------------------- ---------- ----------
Fred Flintstone      Dino       Dinosaur
Barney Rubble        Hoppy      Hopparoo

(2 row(s) affected)

Where did George Jetson go?   We know that we can do an INNER JOIN between Pets and PetTypes since all Pets will always have a PetType match, and we did an OUTER JOIN from People to Pets so ensure that all People are returned regardless of having a pet or not.  On paper, then, it might seem that something isn't working correctly since the code seems solid.

Well, we need to look a little more closely.  If we go back to our original OUTER JOIN (that included George in the results) and add the PetTypeID column from the Pets table to it, we get:

select People.PersonName, Pets.PetName, Pets.PetTypeID
from People
left outer join Pets on Pets.OwnerID = People.PersonID

PersonName           PetName    PetTypeID  
-------------------- ---------- -----------
Fred Flintstone      Dino       1
Barney Rubble        Hoppy      2
George Jetson        NULL       NULL

(3 row(s) affected)

Notice that the PetTypeID column is NULL for George.  SQL Server processes the joins in the order you specify, so it first gets the results from the OUTER JOIN between People and Pets, and then uses those results to INNER JOIN to the PetTypes table.  The NULL PetTypeID returned for George Jetson doesn't match any rows in the PetTypes table, and the INNER JOIN requires a match, so George is not returned in the results.

So how do we fix this?  One approach commonly used, which should really be avoided, is to simply change the INNER JOIN between Pets and PetTypes to a LEFT OUTER JOIN as well:

select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
left outer join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID

PersonName           PetName    PetType   
-------------------- ---------- ----------
Fred Flintstone      Dino       Dinosaur
Barney Rubble        Hoppy      Hopparoo
George Jetson        NULL       NULL

(3 row(s) affected)

This appears to do the trick.  However, it is not the equivalent of our original intentions!  We initially wrote an INNER JOIN between Pets and PetTypes, since we do not want to return any Pets that do not have a corresponding PetType. In this case, it is not an issue because PetTypeID cannot be NULL for a Pet, but what if it could be?   The results will not be logically the same if we switch that INNER JOIN to an OUTER JOIN.

For example, let's create a new version of the Pets table, that allows for NULL pet types:

drop table Pets
go
create table Pets
(
    PetID int identity primary key,
    PetTypeID int references PetTypes(PetTypeID),
    PetName varchar(10),
    OwnerID int references People(PersonID) not null
)

insert into Pets (PetTypeID, PetName, OwnerID)
select 1,'Dino',1 union all
select 2,'Hoppy',2 union all
select null,'Baby Puss',1

Notice now that we've included Fred's cat, Baby Puss,  but we did not assign him a PetType.

Let's now re-run the original OUTER JOIN/INNER JOIN statement that excluded George:

select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID

PersonName           PetName    PetType   
-------------------- ---------- ----------
Fred Flintstone      Dino       Dinosaur
Barney Rubble        Hoppy      Hopparoo

(2 row(s) affected)

The same results are returned; George is excluded for reasons we discovered (and are trying to fix), and Baby Puss is missing because we explicitly asked for an INNER JOIN between Pets and PetTypes.  The only issue we should have, then, with the previous SELECT is that George is missing and we'd like to see him -- Baby Puss should be excluded by design.  (If that isn't clear, let me know) So, if we switch the second join to an OUTER JOIN in an attempt to fix things to get George returned, we now get:

select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
left outer join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID

PersonName           PetName    PetType   
-------------------- ---------- ----------
Fred Flintstone      Dino       Dinosaur
Fred Flintstone      Baby Puss  NULL
Barney Rubble        Hoppy      Hopparoo
George Jetson        NULL       NULL

(4 row(s) affected)

Notice that suddenly Baby Puss appears!  That is because we changed our join from Pets to PetTypes from an INNER to an OUTER.  So, just changing INNER JOINS to OUTER JOINS to "fix" this problem may have unintended side effects, and your query is no longer logically the same!  It may work in most cases, or in all cases depending on your constraints and relations, but it is important to understand that logically you have completely changed what you are asking for.

The best solution is to use a derived table to encapsulate the INNER JOIN between Pets and PetTypes.  Then, we simply select FROM the People table and OUTER JOIN to the derived table.  This returns the results we are looking for:

select
  People.PersonName, Pets.PetName, Pets.PetType
from
  People
left outer join
(
  select Pets.ownerID, Pets.PetName, PetTypes.PetType
  from Pets
  inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID
)
  Pets
on
  Pets.OwnerID = People.PersonID
 
PersonName           PetName    PetType   
-------------------- ---------- ----------
Fred Flintstone      Dino       Dinosaur
Barney Rubble        Hoppy      Hopparoo
George Jetson        NULL       NULL

(3 row(s) affected)

That's more like it!  Note that we could also use a Common Table Expression if you are using SQL 2005.

An alternative option, which also works correctly, is to nest your join expression like this:

select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join
  (Pets
inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID)
on Pets.OwnerID = People.PersonID


That returns the correct results and is logically accurate, but I find it can be confusing to read and a little more difficult to maintain.  In general, I have found that most database programmers tend to avoid nested joins expressions.  It's up to you to decide which option you'd prefer to use.

So, be careful when mixing INNER and OUTER joins; it can be tricky to hunt down unintended results and side effects.  If you use Derived Tables (or nested joins) the way you'd use parenthesis in a math or boolean equation to express your logical order of precedence, you can make your code cleaner and more readable and you can ensure that you get back the exact results you intended.

see also:

Legacy Comments


Mladen
2007-10-11
re: Be Careful When Mixing INNER and OUTER Joins
why on earth is this so hard for people to understand ??
i don't get it!
It's really basic set theory.

Jeff
2007-10-11
re: Be Careful When Mixing INNER and OUTER Joins
I can see how beginners might get confused, especially if they use the thought process I outlined in the article. It seems like it might make sense at first, until you think about it.

the big issue is just changing that second INNER JOIN to an OUTER JOIN which "seems" like it "fixes" things, but it really completely changes the logical definition of what you are returning.


Astro
2007-10-11
re: Be Careful When Mixing INNER and OUTER Joins
What about me?

Jeff
2007-10-11
re: Be Careful When Mixing INNER and OUTER Joins
Sorry Astro !

Piers
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
> why on earth is this so hard for people to understand ??
> i don't get it!
> It's really basic set theory.

Fair enough.
However I do sometimes find myself filtering out results with a clause on an outer join. It's very easy to do and normally takes a few seconds before I slap my forehead and say Doh!.
So if I as an experienced developer occasionally trip up, I can see how it might initially be a concept that might not even occur to beginners.

Mladen
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
yes but you slap yourselft and correct the thing because you know WHY it's wrong.
unlike a ton of other "db developers" that have no clue about set theory happily writing up all those "complex" left joins.
see.. now you got me ranting :)))

Jeff
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
>> yes but you slap yourselft and correct the thing because you know WHY it's wrong.
unlike a ton of other "db developers" that have no clue about set theory happily writing up all those "complex" left joins.

That's why I wrote this post! To hopefully help explain it...


DaRage
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
I think a better techique is to always put inner joins followed by the outer joins.

Jeff
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
DaRage -- to do that, you'd make a mess of this example and up needing a RIGHT OUTER JOIN at the end. It would not read logically at all and be much harder to maintain and understand ... Overall, not a good idea, I strongly recommend against that.

DaRage
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
Why using right outer join not logically read exactly?

DaRage
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
I'm sorry left outer join should still work

Jeff
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
DaRage -- Ok, how would you write it the example without a RIGHT OUTER JOIN where you simply express the INNER JOIN first? Mind posting the actual code?

DaRage
2007-10-12
re: Be Careful When Mixing INNER and OUTER Joins
sure when i get home

DaRage
2007-10-13
re: Be Careful When Mixing INNER and OUTER Joins
You're right Jeff, left outer join doens't work but i still don't see why a right outer join is a bad idea.

I think it's a good idea because by following a simple rule of thumb which is puttin all you "mandantory" table first in the from cluase using inner joins and then follow them by the "optional" tables using right outer join. I think this rule is easier to follow and always achieves the intended results.

Jeff
2007-10-13
re: Be Careful When Mixing INNER and OUTER Joins
DaRage -- that's exactly why RIGHT OUTER JOINS do not make sense and you should avoid them.

The primary, driving table is People, and the outer table is Pets. Thus, we select FROM People and outer join TO pets. Which is People LEFT OUTER JOIN Pets. To write it with a RIGHT OUTER JOIN, we are selecting FROM Pets and outer joining to People -- which is completely backwards and not logically what we want to do.

DaRage
2007-10-13
re: Be Careful When Mixing INNER and OUTER Joins
Jeff, when the query involves 7 or 8 tables it's hard to decide which ones driving and which are not. What i'm saying is that to build the query based on the foriegn key relationship between the tables. If it's a non-nullabel (mandatory) then use inner join and if it's nullable (optional) then use a right outer join and then put the inner joins before the outer joins and that's it.

I'll use the query above to explain the concept.

1. Decide which tables have the information being retrived: People, Pets, PetTyps.
2. Decide what kind of joins to use based on the foriegn key relationship:
a. Poeple and Pets: nullable (Optional) then use RIGHT OUTER JOIN.
b. Pets and PetTypes: non-nullable (Mandatory) then use INNER JOIN.
3. Place the inner join before the right outer join in the from clause.

I think this is a very straight forward way to build the query and much easier than the one you suggested, no?



Jeff
2007-10-14
re: Be Careful When Mixing INNER and OUTER Joins
DaRage --

Again, with 7 or 8 tables involved, it is even MORE important to use derived tables to keep things clean and organized. RIGHT OUTER JOINS are simply not necessary and do not translate logically into what you want to return.

Consider the english sentence:

"return all people and the pets they have, if any"

Which makes more sense:

select FROM people and OUTER JOIN TO Pets

or

select FROM Pets OUTER JOIN TO People?

We are not selecting all pets and returning any matching people; we are doing it the other way around. The "driving" source in our select is People; the auxiallary data to return, if it matches and exists, is Pets. Thus, we select from people and join to pets. It makes no sense to select Pets and then join to people.

DaRage
2007-10-15
re: Be Careful When Mixing INNER and OUTER Joins
2 problems:

1. Applying english sentence structure to solve computational problems is bad approach. Look at LINQ (.Net new query language) where the from caluse comes before the select clause, what do you do then? also, what about other languages where the sentence structure is totatly different? Remember what you are falsly refering to as logic above in nothing but mere sentence structure.

2. Why apply logical analysis when a simple rule of thumb exists?

Jeff
2007-10-15
re: Be Careful When Mixing INNER and OUTER Joins
DaRage -- this is probably a lost cause, but I will try one more time. Let's suppose we also want to return Car and CarTypes in our existing select, same rules as Pets (this would potentially produce a cross join in the results, but ignore that for now).

Here is how I suggest that you write it:

select ...
from People
left outer join (Pets inner join Pet Types)
left outer join (Cars inner join Car Types)

(that is just pseudo-code, of course). What, exactly, is wrong with the above? Is it easy or difficult to read at a glance in your opinion? Why would you suggest to improve it? Does it not logically and accurately state exactly what we want, and doesn't it work perfectly? Isn't it easy to add further data to this SQL statement (say, Homes or Jobs or other related tables) without having the re-arrange your join clause to enforce your "rule"?

What you suggest is this:

select ...
from Pets
inner join PetTypes
inner join Cars
inner join CarTypes
right outer join People

Does that look clean and logical to you? Why do you feel that it is "better" than what I suggested? Why are we selecting FROM the Pets table and joining (eventually!) to the People table? We want ALL people and SOME pets, not ALL pets and SOME people! Do you honestly feel that this makes more sense or is easy to read and work with than what I simply suggested?

Now, let's add a INNER JOIN from People to a "Show" table that returns the show they are on. (Suppose the People table has a ShowID column) First, let's add it to my select:

select ...
from People
inner join Show on People.ShowID = Shows.ShowID
left outer join (select .. from Pets inner join Pet Types on Pets.PetTypeID = PetTypes.PetTypeID) P on P.OwnerID = People.PersonID
left outer join (select ... from Cars inner join CarTypes on Cars.CarTypeID = CarTypes.CarTypeID) C on C.OwnerID = People.PersonID

Took about 3 seconds to add in there, it all still reads logically and makes perfect sense. (indenting helps, which I cannot do here, unfortunately)

next, let's add it to yours, following your "rule" .... well, actually, I'll let you try it. I couldn't figure it out... Let me know how it goes, please post your code ... looking forward to it!

If I haven't convinced you yet, then I suppose it is not going to happen. I just hope I never find myself in a situation where I inherit any of your code covered in RIGHT OUTER JOINS! :)

Mike
2007-10-23
re: Be Careful When Mixing INNER and OUTER Joins
Hi Jeff,

Outer joins operate left to right normally so when they encounter an inner join it affects the entire working set. Outer joins are not usually affected by parenthesis. The primary influence for altering outer join execution is nesting them driven by placement of their ON clause. Your example that used parenthesis was not affected by the parenthesis it was the nesting of the ON clause which also happens automatically when views are expanded. In A left join B left join C ON B.b=C.c ON A.a=B.b the join of BC is performed first since its ON clause was encountered first. By the way this nesting is also supported by multiple working sets as needed. In this example the BC join causes a new working set to isolate it from affecting A. Pretty powerful processing.

/Mike

Jeff
2007-10-23
re: Be Careful When Mixing INNER and OUTER Joins
Mike -- I updated my post since it did seem to imply that the parens were altering the join order, not the on clause, thanks for pointing that out.

I changed:

>>express the join order using parenthesis like this:

to

>>nest your join expression like this:

Thanks for your feedback!

Eric Isaacs
2007-11-01
re: Be Careful When Mixing INNER and OUTER Joins
Jeff, I thought your post had very good information. I too HATE right (outer) joins and would love to find a website what would take in a SQL statement and remove all right joins, converting them to LEFT JOINs. If anyone has seen such a tool, please let me know. I'm thinking about writing one myself.

I also wanted to point out that the final result set that excluded fred's other pet could be accomplished without the nested join by making the ON clause more explicit between person and pets...

SELECT
People.PersonName,
Pets.PetName,
PetTypes.PetType
FROM
People
LEFT JOIN Pets ON (Pets.OwnerID = People.PersonID AND Pets.PetTypeID IS NOT NULL)
LEFT JOIN PetTypes ON Pets.PetTypeID = PetTypes.PetTypeID

This makes it more readable than the nested join AND avoids those pesky RIGHT JOINs.

Jeff
2007-11-02
re: Be Careful When Mixing INNER and OUTER Joins
Hi Eric --

That is not completely the same. Now, in my sample schema, there is a fk constraint that ensures that any PetTypeID in the Pets table must have a matching row in the PetTypes table, so in this case it returns the same results, but logically, it is a different query with different results depending on the table structure. Thus, it cannot be considered another way to rewrite the same thing in all cases, just this specific one.

That is, if we did not have the FK constraint between PetTypes and Pets on PetTypeID, and there was a Pet with a PetTypeID that did not exist in the PetTypes table, your query will still return the Pet, which my examples would not. So, they are not logically equivalent in all cases since they express different logic.

My examples say: "return all people, and any pets that have a *matching* petType in the PetTypes table"

yours says: "return all people, and any pets that have *any* non-null PetType, along with the matching PetType, if any."

This goes along the lines of what I warned about when replacing things with LEFT OUTER JOINS so that things "seem" to work; you need to be sure that you are absolutely expressing the same (or the desired) logic when rewriting a join clause and/or criteria. It often helps to re-state your SQL in plain english or pseudo-code.

Thanks for your feedback!

Chuck Bevitt
2009-09-01
re: Be Careful When Mixing INNER and OUTER Joins
Nice article definitely bookmarked. My only objection is that it starts giving examples without stating the basic principle – unfortunately too common in IT technical writing. The basic principle, mentioned partway through the article, is that SQL Server processes the joins in the order you specify. That and the definition of inner and outer joins completely controls the outcome. This can take awhile to wrap one’s brains around, which is where the examples are invaluable.

Mikael Klint
2010-04-11
re: Be Careful When Mixing INNER and OUTER Joins
I have always been nesting and indenting joins to improve readability.
By nesting this query you get the result you are achieving.
My solution is to make the join statement as the starter and the on-statement as the closer. This approach, has one drawback and that is that the on-statement is seaparated from join-statement i large querys, indenting helps a lot.
THis query return
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets
inner join PetTypes
on Pets.PetTypeID = PetTypes.PetTypeID
on Pets.OwnerID = People.PersonID

PersonName PetName PetType
-------------------- ---------- ----------
Fred Flintstone Dino Dinosaur
Barney Rubble Hoppy Hopparoo
George Jetson NULL NULL

(3 row(s) affected)


Is nesting joins like this a bad idea, why?
Sorry for commenting an old post.

Mikael Klint
2010-04-11
re: Be Careful When Mixing INNER and OUTER Joins
Don´t mind my post I didn´t read the entire post, you have already written about nesting join expressions.

I always write querys with nesting expressions, I find it easier and faster to get a overview about how the query is built up, that if the query is properly indented, I don´t use parenthesis, like your example.
Mantaining isn´t either a problem, in fact I think it is easier to maintain.

A Exampel::
(In this example I have replace spaces/tabs with - since the blogg removes duplicate spaces when posting)

SELECT ...
FROM Table1 AS t1
--INNER JOIN Table2 AS t2
----INNER JOIN Table3 AS t3
------LEFT JOIN Table5 AS t5
--------INNER JOIN Table6 AS t6
--------ON t6.xId = t5.id
------ON t5.xId = t3.Id
----ON t3.xId = t2.Id
----LEFT JOIN Table4 as t4
----ON t4.xId = t2.Id
--ON t2.xId = t1.Id

SELECT ...
FROM Table1 AS t1
--INNER JOIN Table2 AS t2 ON t2.xId = t1.Id
--INNER JOIN Table3 AS t3 ON t3.xId = t2.Id
--LEFT JOIN Table5 AS t5 ON t5.xId = t3.Id
--INNER JOIN Table6 AS t6 ON t6.xId = t5.id
--LEFT JOIN Table4 as t4 ON t4.xId = t2.Id
I think the readability of the first example is faster to get a grip of the hierarchy over the query's tables.
In 10 sec. you see that Table2 is joined to Table1. Table3 and Table4 is joined to Table2. Table5 to Table3 and Table6 to Table5.
Sure it is the doubled amount of lines, the joins to Table4 and Table6 could be put in a single line, instead of two lines (JOIN-row and ON-row).
If I decide to remove the Table3 it is easy to see that Table5 and Table6 have to be removed also, or that they have to be modified, at least Table5.
And most important, it always perform the way I want, since it is what this blogpost is all about. You don´t always think of this scenario, you may when writing the query, but a year later you change a LEFT JOIN to a INNER JOIN and...

R. Medley
2010-08-29
re: Be Careful When Mixing INNER and OUTER Joins
I wanted to simply say, "Well Done."

The internet seems to be plagued by knowledgeable programmers that litterally crap of newbs, negating that they were ignorant at one time too. When you right an article you are seriously contemplating all the parameters, and produce acurate examples. Your experience in seeing the "alternatives" to producing the current data sets results vs. producing the correct results in most if not all scenarios, enables us newbs to grasp what others were trained in. You take the time to address the replies fully and not half ass.

Although most of what I've needed you seem to have written something addressing it, I hope that you'll be encouraged to begin blogging here again with more articles.