GROUP BY
Grouping and summarizing data in SQL, using aggregate functions, etc.
Nothing earth-shattering here; I was just helping out a colleague with this, so I thought I'd post the example I gave him.
-- sample table:
create table People
(
Person varchar(1) primary key,
City varchar(10),
Age int
)
go
-- with some sample data:
insert into People
select 'A','Boston',23 union all -- odd #
select 'B','Boston',43 union all
select 'C','Boston',29 union all
select 'D','Chicago',15 union all -- single #
select 'E','NY',12 union all -- even #
select 'F','NY',55 union all
select 'G','NY',57 union all
select 'H','NY',61
go
-- here's our query, showing median age per city:
select city,
AVG(age) as MedianAge
from
(
select City, Person, Age,
ROW_NUMBER() over (partition by City order by Age...
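The excerpt above is cut off mid-query. As a hedged sketch of one way to finish the idea (not necessarily the query from the full post), the version below numbers the rows within each city, counts them, and averages the middle one or two ages. The column aliases AgeRank and CityCount and the derived table alias x are assumptions made for this illustration.

```sql
-- Median age per city: rank rows within each city, then average the middle row(s).
select x.City,
       AVG(x.Age) as MedianAge
from
(
    select City, Person, Age,
           ROW_NUMBER() over (partition by City order by Age) as AgeRank, -- 1..n within the city
           COUNT(*)     over (partition by City)              as CityCount -- n for the city
    from People
) x
-- Integer division picks the middle row(s):
--   odd n  -> both expressions give the same single row
--   even n -> they give the two middle rows
where x.AgeRank in ((x.CityCount + 1) / 2, x.CityCount / 2 + 1)
group by x.City
```

With the sample data this should give 29 for Boston, 15 for Chicago and 56 for NY. Since Age is an int, AVG returns an int; use AVG(1.0 * x.Age) if you need a fractional median.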
Here's an obscure piece of SQL you may not be aware of: The "ALL" option when using a GROUP BY.
Consider the following table:
Create table Sales
(
SaleID int identity not null primary key,
CustomerID int,
ProductID int,
SaleDate datetime,
Qty int,
Amount money
)
insert into Sales (CustomerID, ProductID, SaleDate, Qty, Amount)
select 1,1,'2008-01-01',12,400 union all
select 1,2,'2008-02-25',6,2300 union all
select 1,1,'2008-03-02',23,610 union all
select 2,4,'2008-01-04',1,75 union all
select 2,2,'2008-02-18',52,5200 union all
select 3,2,'2008-03-09',99,2300 union all
select 3,1,'2008-04-19',3,4890 union all
select 3,1,'2008-04-21',74,2840
SaleID      CustomerID  ProductID   SaleDate                Qty         Amount
----------- ----------- ----------- ----------------------- ----------- ---------------------
9           1           1           2008-01-01 00:00:00.000 12          400.00
10          1           2           2008-02-25 00:00:00.000 6           2300.00
11          1           1           2008-03-02 00:00:00.000 23          610.00
12          2           ...
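The excerpt stops before showing the feature itself, so here is a minimal sketch of what GROUP BY ALL does in T-SQL, using the Sales table above. The filter on ProductID 4 and the TotalAmount alias are chosen only for illustration. With ALL, groups that have no rows passing the WHERE clause still appear, with NULL for the aggregate. Microsoft's documentation marks the option as deprecated, so treat it as something to recognize rather than rely on.

```sql
-- Plain GROUP BY: only customers with at least one sale of product 4 are returned.
select CustomerID, SUM(Amount) as TotalAmount
from Sales
where ProductID = 4
group by CustomerID

-- GROUP BY ALL: every CustomerID in Sales gets a row;
-- customers with no product-4 sales show NULL for TotalAmount.
select CustomerID, SUM(Amount) as TotalAmount
from Sales
where ProductID = 4
group by all CustomerID

-- A non-deprecated way to get the same shape: move the filter into the aggregate.
select CustomerID,
       SUM(case when ProductID = 4 then Amount end) as TotalAmount
from Sales
group by CustomerID
```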
As with any programming language, it is important in SQL to keep your code short, clear and concise. Here are two quick tips that I find very helpful in achieving this goal.
Let's take a look at another one of those stupid, arbitrary SQL Server error messages that Bill Gates clearly only created because Micro$oft is evil and incompetent and they want to annoy us (and probably kill baby squirrels, too):
Msg 145, Level 15, State 1, Line 4
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
This message pops up when you ask for DISTINCT rows for one set of columns, but you'd like to have the results ordered by one or more columns not specified in your distinct set. For some reason, SQL Server will not allow...
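To make the error concrete, here is a small sketch against the Sales table defined earlier on this page. Each CustomerID has several SaleDate values, so once the rows are collapsed by DISTINCT there is no single date left to sort on. The GROUP BY rewrite is one possible fix, not necessarily the one the full post recommends.

```sql
-- Raises Msg 145: SaleDate is not part of the DISTINCT column list.
select distinct CustomerID
from Sales
order by SaleDate

-- One possible fix: group instead of using DISTINCT, and state
-- explicitly which date per customer should drive the sort.
select CustomerID
from Sales
group by CustomerID
order by MAX(SaleDate) desc
```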
In SQL, the general rule of thumb is that the number of rows returned from a SELECT will be zero if your criteria did not match any data. However, there is an important exception to this rule: it does not apply when you ask for aggregate calculations such as SUM(), MIN() or MAX() without any grouping. Those queries always return exactly one row.
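A short sketch of the exception, again using the Sales table from earlier on this page. The CustomerID value -1 is just an assumed value that matches no rows.

```sql
-- No grouping: always exactly one row, even though nothing matches.
-- TotalAmount is NULL and SaleCount is 0.
select SUM(Amount) as TotalAmount, COUNT(*) as SaleCount
from Sales
where CustomerID = -1

-- With GROUP BY: no matching rows means no groups, so zero rows come back.
select CustomerID, SUM(Amount) as TotalAmount
from Sales
where CustomerID = -1
group by CustomerID
```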
As David Letterman would say, wake the kids, call the neighbors, it's time for The Mailbag! Just some quickies today.
Christopher writes:
Greetings Jeff,
First and foremost, great job with all of the blogs. I have a question
that I cannot seem to get a straight answer for. I am working with SQL
Server Reporting Services (SSRS) and have the need to create VB
functions to customize the reports generated. For example, a setter/getter to
display information that would not be readily available from the
query. SSRS allows this type of custom Visual Basic code to reside in the
report itself, but since most of my code is across...
When you need to summarize transactional data by month, there are several ways to do it, some better than others. What to ultimately choose depends on your needs, but remember: keep it short and simple in T-SQL, and always do all of your formatting at your presentation layer where it belongs.
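As an illustration of the "short and simple" approach, here is one commonly used way to group by month, sketched against the Sales table from earlier on this page. It rounds each date down to the first of its month and returns that as a real datetime, leaving any formatting to the client. Whether this is the option the full post settles on is an assumption.

```sql
-- DATEDIFF counts whole months since the base date 0 (1900-01-01);
-- DATEADD adds them back, giving the first day of each sale's month.
select DATEADD(month, DATEDIFF(month, 0, SaleDate), 0) as SaleMonth,
       SUM(Amount) as TotalAmount,
       SUM(Qty)    as TotalQty
from Sales
group by DATEADD(month, DATEDIFF(month, 0, SaleDate), 0)
order by SaleMonth
```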
I've written a two-part article on using SQL GROUP BY clauses over at SQLTeam.com.
Column 'xyz' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Arggh!! There it is, yet again... that annoying error message. Why is SQL so picky about this? What's the deal!?
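Here is a minimal sketch that triggers the message, using the Sales table from earlier on this page. After grouping by CustomerID there are several SaleDate values per group, and SQL Server has no way to know which one you want. The two fixes shown are common options, not necessarily the ones covered in the full post.

```sql
-- Fails: SaleDate is neither aggregated nor listed in the GROUP BY.
select CustomerID, SaleDate, SUM(Amount) as TotalAmount
from Sales
group by CustomerID

-- Fix 1: say which SaleDate you want per customer by aggregating it.
select CustomerID, MAX(SaleDate) as LastSaleDate, SUM(Amount) as TotalAmount
from Sales
group by CustomerID

-- Fix 2: if you really want one row per customer per date, group by both.
select CustomerID, SaleDate, SUM(Amount) as TotalAmount
from Sales
group by CustomerID, SaleDate
```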
Sometimes it appears that the necessary solution to a common SQL problem is to join a table to itself. While self-joins do have their place, and can be very powerful and useful, there is often a much easier and more efficient way to get the results you need when querying a single table.
Did you know that a new feature in SQL Server 2005 allows you to specify an OVER partition for aggregate functions in your SELECT statements?
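A brief sketch of the feature, using the Sales table from earlier on this page. The column aliases are assumptions for illustration. The aggregate is computed per partition but attached to every detail row, so no GROUP BY or join back to a summary query is needed.

```sql
select SaleID, CustomerID, Amount,
       -- total for this row's customer, repeated on each of that customer's rows
       SUM(Amount) over (partition by CustomerID) as CustomerTotal,
       -- this sale's share of the customer's total
       Amount / SUM(Amount) over (partition by CustomerID) as ShareOfCustomer
from Sales
order by CustomerID, SaleID
```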
As many of you know, I strongly recommend that you avoid using RIGHT OUTER JOINs, since they make your SQL code less readable and are easily rewritten as LEFT OUTER JOINs. In addition, I have yet to find a situation where a FULL OUTER JOIN makes sense or is necessary -- I have found that in just about every case other techniques work better.
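To show what "easily rewritten" means, here is a small sketch. It assumes a hypothetical Customers table with a CustomerID column alongside the Sales table from earlier on this page; Customers does not appear in the original. Both queries return every customer, with NULL sale columns for customers who have no sales.

```sql
-- RIGHT OUTER JOIN: the table whose rows are all kept comes last.
select c.CustomerID, s.SaleID, s.Amount
from Sales s
right outer join Customers c   -- Customers is a hypothetical table
    on s.CustomerID = c.CustomerID

-- Equivalent LEFT OUTER JOIN: start from the table you want to keep in full.
select c.CustomerID, s.SaleID, s.Amount
from Customers c
left outer join Sales s
    on s.CustomerID = c.CustomerID
```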
I thought I'd take a few minutes to discuss something we see quite often in the programming world, using a T-SQL example of a stored procedure that accepts a list of optional parameters allowing you to determine some basic filters on the results.
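For readers who have not seen this kind of procedure, here is a minimal sketch of one common pattern for optional filters. The procedure name GetSales and its parameters are hypothetical, and the excerpt does not say whether this is the approach the full post recommends. Each parameter defaults to NULL, and a NULL parameter simply switches its filter off.

```sql
create procedure GetSales      -- hypothetical name, for illustration only
    @CustomerID int = null,
    @ProductID  int = null
as
    select SaleID, CustomerID, ProductID, SaleDate, Qty, Amount
    from Sales
    -- a NULL parameter makes its condition true for every row
    where (@CustomerID is null or CustomerID = @CustomerID)
      and (@ProductID  is null or ProductID  = @ProductID)
go

exec GetSales @CustomerID = 1   -- only customer 1
exec GetSales                   -- no filters: all sales
```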
A common difficulty beginning SQL programmers encounter is joining two or more transactional tables all in one SELECT statement. Missing data, duplicates, time-out errors, and other unexpected results often arise from trying to directly write JOINs between two transaction tables.
One aspect of the versatile SELECT statement that seems to confuse many people is the GROUP BY clause. It is very important to group your rows in the proper place.
When you have two tables (or resultsets from SELECT statements) that you wish to compare, and you want to see any changes in ANY columns, as well as which rows exist in one table but not the other (in either direction), I have found that the UNION operator works quite well.
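One way to sketch the idea is shown below. The tables A and B and the columns ID, Col1 and Col2 are hypothetical, and this may differ in detail from the query in the full post. Rows from both tables are stacked with UNION ALL and then grouped on every column: a row that is identical in both tables forms a group of two, while a row that is missing or different on one side forms a group of one.

```sql
select MIN(TableName) as TableName, ID, Col1, Col2
from
(
    select 'Table A' as TableName, ID, Col1, Col2 from A   -- hypothetical table
    union all
    select 'Table B' as TableName, ID, Col1, Col2 from B   -- hypothetical table
) tmp
group by ID, Col1, Col2
having COUNT(*) = 1     -- keep only rows without an exact match on the other side
order by ID, TableName
```

This assumes neither table contains exact duplicate rows of its own. An ID that shows up twice in the output, once per table, is a row whose column values changed; an ID that shows up once exists in only one of the tables.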