A common difficulty beginning SQL programmers encounter is joining two or more transactional tables all in one SELECT statement. Missing data, duplicates, time-out errors, and other unexpected results often arise from trying to directly write JOINS between two transaction tables.
Consider a simple schema with the following tables (primary key columns in bold):
Budget: Year, Month, Account, Dept, Amount
Commitments: CommitmentID (Identity), Year, Month, Account, Dept, Amount
Actuals: ActualID (Identity), Year, Month, Account, Dept, Amount
That is basically what I've inherited in the latest application I've been tasked to clean up. Data comes from several sources, so I suppose that explains why things are in different tables. The basic idea is that your budget is set by Account/Dept/Year/Month, and you need to write reports that let you see Budget, Actual and Commitments amounts side by side with a few calculations as well.
After looking at the existing hundreds of lines of temp tables, cursors, update statements and other strange ways people have of writing simple SELECTs, I am reminded that it is not always obvious how to join several transactional tables into 1 SELECT statement.
So, how do you get Budgets, Commitments and Actuals all together?
The answer is: not by JOINing them together ... but by UNIONing them together. By layering the tables on top of one another, and spreading out the Amounts from each table into its own column, and then grouping it all together and SUMing up each amount column, we will have the results we need. Essentially, we are using the UNION ALL operator to create 1 big transaction table with all of our data, and then simply aggregating the results. (Remember that we use UNION ALL and not just UNION since it is more efficient).
So, if we wanted to see the numbers for a particular @Year, with totals for each Month, Account and Department, we could write:
select
Year, Month, Account, Dept,
sum(Budget) as Budget,
sum(Commitment) as Commitment,
sum(Actual) as Actual
from
(
select
Year, Month, Account, Dept, Amount as Budget, 0 as Commitment, 0 as Actual
from
Budget
union all
select
Year, Month, Account, Dept, 0 as Budget, Amount as Commitment, 0 as actual
from
Commitments
union all
select
Year, Month, Account, Dept, 0 as Budget, 0 as Commitment, Amount as actual
from
Actuals
)
tmp
where
Year = @Year
group by
Year, Month, Account, Dept
And .. that's it. We're done!
We don't need to worry about months where there is a budget but no actual, or an actual but no budget, which becomes a common problem with JOINS. We don't need to use an ugly FULL OUTER JOIN and wrap up every single column in COALESCE() functions. We don't need long derived table expressions. And we don't need to worry about multiple Commitments or Actuals all in the same month causing duplicates and affecting our Budgets. It's all taken care of for us.
Of course, you would wrap this SQL statement in a derived table, join it to the Accounts and Department tables, and calculate a few more columns (variance, percentages, etc) to complete our report. We also could remove the Month grouping from our SELECT and calculate only one value per Year/Account/Dept, with Year-to-Date and Month-To-Date columns very easily.
So, don't forget about UNION ALL ... it's very powerful and really a great way to easily summarize and compare two or more unrelated transactional tables.
Well, I better get back to the cursor busting ....
see also: