Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

GROUP BY ALL

Here's an obscure piece of SQL you may not be aware of:  The "ALL" option when using a GROUP BY.

Consider the following table:

Create table Sales
(
    SaleID int identity not null primary key,
    CustomerID int,
    ProductID int,
    SaleDate datetime,
    Qty int,
    Amount money
)

insert into Sales (CustomerID, ProductID, SaleDate, Qty, Amount)
select 1,1,'2008-01-01',12,400 union all
select 1,2,'2008-02-25',6,2300 union all
select 1,1,'2008-03-02',23,610 union all
select 2,4,'2008-01-04',1,75 union all
select 2,2,'2008-02-18',52,5200 union all
select 3,2,'2008-03-09',99,2300 union all
select 3,1,'2008-04-19',3,4890 union all
select 3,1,'2008-04-21',74,2840

SaleID      CustomerID  ProductID   SaleDate                Qty         Amount
----------- ----------- ----------- ----------------------- ----------- ---------------------
9           1           1           2008-01-01 00:00:00.000 12          400.00
10          1           2           2008-02-25 00:00:00.000 6           2300.00
11          1           1           2008-03-02 00:00:00.000 23          610.00
12          2           4           2008-01-04 00:00:00.000 1           75.00
13          2           2           2008-02-18 00:00:00.000 52          5200.00
14          3           2           2008-03-09 00:00:00.000 99          2300.00
15          3           1           2008-04-19 00:00:00.000 3           4890.00
16          3           1           2008-04-21 00:00:00.000 74          2840.00

(8 row(s) affected)


Suppose we'd like to see the customers that were sold Product #1 along with the total amount that they spent.

We would basically write a simple SELECT with a GROUP BY like this:

select CustomerID, sum(Amount) as TotalAmount
from Sales
where ProductID = 1
group by CustomerID

And sure enough, we'd get our answer:

CustomerID  TotalAmount
----------- ---------------------
1           1010.00
3           7730.00

(2 row(s) affected)

Now, let's say that we'd like to see all customers that have been sold any products, but we still just want to see the "TotalAmount" for ProductID #1.  For customers that have never ordered ProductID #1, it should output a "TotalAmount" value of $0.   One way to do this is with a CASE expression; instead of filtering so that only ProductID #1 is returned, we can conditionally SUM() the Amount only for orders for ProductID #1.  Like this:

select CustomerID, sum(case when ProductID=1 then Amount else 0 end) as TotalAmount
from Sales
group by CustomerID

CustomerID  TotalAmount
----------- ---------------------
1           1010.00
2           0.00
3           7730.00

(3 row(s) affected)

That gives us the results we want.   Because we are not using a WHERE clause to filter the data, we see an entry for CustomerID #2 in the output. 

However, in situations where you have written the above SQL, you could actually replace the SUM(CASE...) expression by using GROUP BY ALL, instead of just a standard GROUP BY, like this:

select CustomerID, sum(Amount) as TotalAmount
from Sales
where ProductID = 1
group by all CustomerID

CustomerID  TotalAmount
----------- ---------------------
1           1010.00
2           NULL
3           7730.00
Warning: Null value is eliminated by an aggregate or other SET operation.

(3 row(s) affected)

Notice that now all Customers are now returned, and a NULL is shown as the TotalAmount for Customer #2, who has no orders for ProductID #1 ...  Even though though the WHERE clause seems to indicate that we should not be seeing customer #2 in the results!

The ALL option basically says "ignore the WHERE clause when doing the GROUPING, but still apply it for any aggregate functions".   So, in this case, the WHERE clause is not considered when generating the population of CustomerID values, but it is applied when calculating the SUM.  This is very much like our first solution, where we removed the WHERE clause completely, and used a SUM(CASE...) expression to conditionally calculate the aggregate. 

Values that are excluded from the aggregation according to the WHERE clause have NULL values returned, as you can see in the result.  A simple ISNULL() or COALESCE() will allow us to return 0 instead of NULL:

select CustomerID, isnull(sum(Amount),0) as TotalAmount
from Sales
where ProductID = 1
group by all CustomerID

CustomerID  TotalAmount
----------- ---------------------
1           1010.00
2           0.00
3           7730.00
Warning: Null value is eliminated by an aggregate or other SET operation.

(3 row(s) affected)

Notice that the warning about NULL being aggregated still displays, since that is the standard behavior in SQL Server when you calculate an aggregate on a NULL value.  You can turn these warnings off if you like for the during of the batch by issuing a set ANSI_WARNINGS off command before your SELECT.

GROUP BY ALL is kind of obscure and neat to know, but not really useful in most situations since there are usually easier or better ways to get this result.  For one thing, this won't work if we want all Customers to be displayed, since a customer must have at least one order to show up in the result.  If we want to see all customers, even those that have never ordered, we would need to do a LEFT OUTER JOIN from the Customers table to our Orders aggregate SELECT:

create table Customers (CustomerID int primary key)
insert into Customers
select 1 union all
select 2 union all
select 3 union all
select 4

-- Notice that we have 4 customers, but our Sales data has sales for only 3.

select c.customerID, isnull(s.TotalAmount,0) as TotalAmount
from Customers c
left outer join
    (select customerID, sum(Amount) as TotalAmount
     from Sales
     where ProductID = 1
    group by customerID) s on c.customerID = s.customerID
   
customerID  TotalAmount
----------- ---------------------
1           1010.00
2           0.00
3           7730.00
4           0.00

(4 row(s) affected)

That is typically the standard way to return data for an entire population, regardless of existing transactions.  GROUP BY ALL gets us close, but if a new customer has never made an Order, they will never show up in the results.   Of course, depending on your needs, that may be what you want.

Another limitation is we can not use GROUP BY ALL if we want to return a grand total for all orders, along with the total just for ProductID #1.  For example, using the SUM(CASE...) expression along with a regular SUM(), we can do this:

select CustomerID, sum(case when ProductID=1 then Amount else 0 end) as Product1Amount,
    sum(Amount) as TotalAmount
from Sales
group by CustomerID

CustomerID  Product1Amount        TotalAmount
----------- --------------------- ---------------------
1           1010.00               3310.00
2           0.00                  5275.00
3           7730.00               10030.00

(3 row(s) affected)

That lets us calculate two different totals all in one pass through the table.  However, we cannot translate that using GROUP BY ALL, because while we will be able to return the Product1Amount, there would be no easy way to also get the TotalAmount for all products without an additional join or sub-query.

. . .

So, that's the story with GROUP BY ALL. It is interesting, and not widely well-known, and may even make for a good interview question if you really want to see how much SQL a candidate knows.  But for practical purposes, it is pretty rarely used and there are generally better ways to get the same results more easily or more efficiently.

Anyone have a good situation or an example of where GROUP BY ALL really worked well for you?  Be sure to share your experiences in the comments.

Legacy Comments


Hugo Kornelis
2008-05-06
re: GROUP BY ALL
Just as an FYI - GROUP BY ALL is marked as deprecated in Books Online since SQL Server 2005. It is, as far as I know, still in SQL Server 2008; but I doubt it will remain in much longer///

Jagadish
2008-05-20
re: GROUP BY ALL
It was very good, its really useful for us,


.....Thanks,
Jagadish

Fred
2008-08-05
re: GROUP BY ALL
As for its deprecation: MS SQL 2005 Books online says "group by all" is already unsupported on remote tables in 2005.

I've never used it, just learned its existence, glad to know that "it's not really useful" and how to replace it. Indeed, left join from a base table looks a neater way to go.

Himani
2008-12-01
re: GROUP BY ALL
Congrats, you are blessed with a son. i want to mention that it was good to go through ur blog. I have some theoritical knowlede of mssql but i want to enhance it and make it practical. I have time period of two to three months. Can u help me by giving some tips or guidance.

Amit Mithiya
2009-06-24
re: GROUP BY ALL
Congrats, you are blessed with a son. i want to mention that it was good to go through ur blog. I have some theoritical knowlede of mssql but i want to enhance it and make it practical. I have time period of two to three months. Can u help me by giving some tips or guidance