I've written a two part article on using SQL GROUP BY clauses over at
SQLTeam.com. It's always a common topic of discussion and confusion amongst beginner and intermediate SQL programmers alike, so I thought I'd write a fairly long and hopefully comprehensive piece that takes a common summary report request and works towards the solution step-by-step.
- Part I: Intro to GROUP BY; Duplicates caused by JOINS; Identifying "Virtual Primary Keys"; Using COUNT(Distinct)
- Part II: Examining SUM(Distinct); GROUPING before JOINING; Using Derived Tables
I tried to focus on what happens when you join two tables and then try to group and aggregate the results, and where and how to do this to avoid aggregating duplicate values. As always, I attempt to demonstrate breaking down the larger problem into smaller, simpler parts and then putting those pieces together to produce the final result.
Perhaps most importantly: if you've ever used
SUM(Distinct) in your code in an attempt to handle duplicates, be sure to read Part II. As
Inigo Montoya would say:
I do not think it means what you think it means!