I had previously written about the danger of
Criteria on Outer Joins, but recently another situation popped up that occasionally causes confusion with OUTER JOINS that I thought I might address. The issue is when you have multiple tables joined in a single SELECT, and you mix OUTER and INNER JOINS together. The end result doesn't always seem to "work", and it can be tricky to understand exactly why and how to fix it without incurring additional unintended side effects.
Consider the following schema and sample data:
create table People
(
PersonID int identity primary key,
PersonName varchar(20)
)
create table PetTypes
(
PetTypeID int identity primary key,
PetType varchar(10)
)
create table Pets
(
PetID int identity primary key,
PetTypeID int references PetTypes(PetTypeID) not null,
PetName varchar(10),
OwnerID int references People(PersonID) not null
)
insert into People (PersonName)
select 'Fred Flintstone' union all
select 'Barney Rubble' union all
select 'George Jetson'
insert into PetTypes (PetType)
select 'Dinosaur' union all
select 'Hopparoo'
insert into Pets (PetTypeID, PetName, OwnerID)
select 1,'Dino',1 union all
select 2,'Hoppy',2
(note: I actually don't remember Hoppy at all from the
cartoon, but I read about him
here)
Notice that all pets must have an owner, and all pets must have a PetType that indicates what they are.
If we wish to return all People and their pets, if any, we would use an OUTER JOIN like this:
select People.PersonName, Pets.PetName
from People
left outer join Pets on Pets.OwnerID = People.PersonID
PersonName PetName
-------------------- ----------
Fred Flintstone Dino
Barney Rubble Hoppy
George Jetson NULL
(3 row(s) affected)
We use the OUTER JOIN so that we still return George Jetson, even though he has no Pet. Since there is no Pet for him, NULLS are returned for all columns in the Pets table for that row. So far so good, this is exactly what we want, things are working great.
Now, suppose we wish to indicate the PetType as well for each Pet. All Pets must have a PetType since the column is not nullable, so it appears that we can simply add an INNER JOIN between Pets and PetTypes to our previous SELECT to include this information. Let's give it a shot:
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID
PersonName PetName PetType
-------------------- ---------- ----------
Fred Flintstone Dino Dinosaur
Barney Rubble Hoppy Hopparoo
(2 row(s) affected)
Where did George Jetson go? We know that we can do an INNER JOIN between Pets and PetTypes since all Pets will always have a PetType match, and we did an OUTER JOIN from People to Pets so ensure that all People are returned regardless of having a pet or not. On paper, then, it might seem that something isn't working correctly since the code seems solid.
Well, we need to look a little more closely. If we go back to our original OUTER JOIN (that included George in the results) and add the PetTypeID column from the Pets table to it, we get:
select People.PersonName, Pets.PetName, Pets.PetTypeID
from People
left outer join Pets on Pets.OwnerID = People.PersonID
PersonName PetName PetTypeID
-------------------- ---------- -----------
Fred Flintstone Dino 1
Barney Rubble Hoppy 2
George Jetson NULL NULL
(3 row(s) affected)
Notice that the PetTypeID column is NULL for George. SQL Server processes the joins in the order you specify, so it first gets the results from the OUTER JOIN between People and Pets, and then uses those results to INNER JOIN to the PetTypes table. The NULL PetTypeID returned for George Jetson doesn't match any rows in the PetTypes table, and the INNER JOIN requires a match, so George is not returned in the results.
So how do we fix this? One approach commonly used, which should really be avoided, is to simply change the INNER JOIN between Pets and PetTypes to a LEFT OUTER JOIN as well:
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
left outer join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID
PersonName PetName PetType
-------------------- ---------- ----------
Fred Flintstone Dino Dinosaur
Barney Rubble Hoppy Hopparoo
George Jetson NULL NULL
(3 row(s) affected)
This appears to do the trick. However, it is not the equivalent of our original intentions! We initially wrote an INNER JOIN between Pets and PetTypes, since we do not want to return any Pets that do not have a corresponding PetType. In this case, it is not an issue because PetTypeID cannot be NULL for a Pet, but what if it could be? The results will not be logically the same if we switch that INNER JOIN to an OUTER JOIN.
For example, let's create a new version of the Pets table, that allows for NULL pet types:
drop table Pets
go
create table Pets
(
PetID int identity primary key,
PetTypeID int references PetTypes(PetTypeID),
PetName varchar(10),
OwnerID int references People(PersonID) not null
)
insert into Pets (PetTypeID, PetName, OwnerID)
select 1,'Dino',1 union all
select 2,'Hoppy',2 union all
select null,'Baby Puss',1
Notice now that we've included Fred's cat,
Baby Puss, but we did not assign him a PetType.
Let's now re-run the original OUTER JOIN/INNER JOIN statement that excluded George:
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID
PersonName PetName PetType
-------------------- ---------- ----------
Fred Flintstone Dino Dinosaur
Barney Rubble Hoppy Hopparoo
(2 row(s) affected)
The same results are returned; George is excluded for reasons we discovered (and are trying to fix), and Baby Puss is missing because we explicitly asked for an INNER JOIN between Pets and PetTypes. The only issue we should have, then, with the previous SELECT is that George is missing and we'd like to see him -- Baby Puss should be excluded by design. (If that isn't clear, let me know) So, if we switch the second join to an OUTER JOIN in an attempt to fix things to get George returned, we now get:
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
left outer join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID
PersonName PetName PetType
-------------------- ---------- ----------
Fred Flintstone Dino Dinosaur
Fred Flintstone Baby Puss NULL
Barney Rubble Hoppy Hopparoo
George Jetson NULL NULL
(4 row(s) affected)
Notice that suddenly Baby Puss appears! That is because we changed our join from Pets to PetTypes from an INNER to an OUTER. So, just changing INNER JOINS to OUTER JOINS to "fix" this problem may have unintended side effects, and your query is no longer logically the same! It may work in most cases, or in all cases depending on your constraints and relations, but it is important to understand that logically you have completely changed what you are asking for.
The best solution is to use a derived table to encapsulate the INNER JOIN between Pets and PetTypes. Then, we simply select FROM the People table and OUTER JOIN to the derived table. This returns the results we are looking for:
select
People.PersonName, Pets.PetName, Pets.PetType
from
People
left outer join
(
select Pets.ownerID, Pets.PetName, PetTypes.PetType
from Pets
inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID
)
Pets
on
Pets.OwnerID = People.PersonID
PersonName PetName PetType
-------------------- ---------- ----------
Fred Flintstone Dino Dinosaur
Barney Rubble Hoppy Hopparoo
George Jetson NULL NULL
(3 row(s) affected)
That's more like it! Note that we could also use a Common Table Expression if you are using SQL 2005.
An alternative option, which also works correctly, is to nest your join expression like this:
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join
(Pets inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID)
on Pets.OwnerID = People.PersonID
That returns the correct results and is logically accurate, but I find it can be confusing to read and a little more difficult to maintain. In general, I have found that most database programmers tend to avoid nested joins expressions. It's up to you to decide which option you'd prefer to use.
So, be careful when mixing INNER and OUTER joins; it can be tricky to hunt down unintended results and side effects. If you use Derived Tables (or nested joins) the way you'd use parenthesis in a math or boolean equation to express your logical order of precedence, you can make your code cleaner and more readable and you can ensure that you get back the exact results you intended.
see also: