Welcome!
The reason you were directed here is because you need assistance, and I am here to help. I am not, however, here to provide you with any answers! You see, it looks like the assistance you need is not finding an
answer; it is rather that you need assistance finding a
question.
As you know, there are all kinds of questions. Questions that test memory recall. Questions that test logic skills. Brain-teasers and mathematical questions and so on. But there is one requirement that all good questions must have in common before they can be answered:
A proper question MUST provide ALL of the information necessary in order for an answer to be given.
In other words, if you omit important information from a question, it doesn't matter how simple or easy that question is: It suddenly becomes very difficult, or even impossible, to answer.
For example, consider the following question:
"Am I wearing a hat?"
Seems pretty easy, right? No logic, no memorization, no trivia, no knowledge of any specific topic is required.
So ... what's the answer? Take a few minutes, think about it, write it down on a piece of scrap paper. I can wait, take your time ...
What's that, you say? You
can't answer that simple question!? Why not? I stated it very clearly, it requires a simple YES or NO response, there's nothing tricky there. So, why would anyone have any trouble giving an answer to something so basic and simple?
The reason, of course, is because you can't see me. You have no way of knowing what I am wearing because I did not provide you enough information! As simple as it is, it cannot be answered; therefore, it is
not a proper question!
Suppose, instead, I provided a picture of myself and asked
"In this picture, am I wearing a hat?" And, in the picture, my head is clearly visible and the fact that I am wearing a Boston Red Sox cap is very clear. Would you be able to answer the question in that scenario? Of course! Suddenly, what was an impossible question to answer became very simple!
How did that happen?
It happened, of course, because I provided you enough
information to answer the question! And that is the often problem with many of the questions we see day to day in forums asking for help. You cannot expect an
answer unless you provide a proper
question with all the necessary
information. The majority of the effort by those helping others in these forums is not spent answering questions, it is spent trying to figure out what the heck the question actually is! And that is the problem; people don't seem to realize that they can't just randomly cut and paste code or ask vague questions without any context and expect to receive help!
Yet, requesting more information and details doesn't always go over so well ... Those looking for help seem to often have trouble understanding why the helpers need more info, why are they are so "anal" and "demanding" about minor things like database schemas, or sample data, or code samples. "How is that important?" they wonder. "Just answer the question and provide me with some help, please! An expert would know the answer!"
My goal today is to hopefully help you to understand why information and context is so important for even simple, basic, questions, and how providing that information suddenly transforms a poor question into a very good one that can be quickly and accurately answered.
Let's try another example:
"How do I get from work to Fenway Park?"
So, what do you think? Is that a good question that provides all of the information? It seems simple, right? It is just asking for basic directions. Yet, where is "work"? Is it asking for driving directions, or walking directions, or maybe which subway lines to take? Who knows!!?
As stated, this question simply cannot be answered! Yet, if I just thought about it a little and made sure to provide
all of the necessary information, this "impossible" question with no answer suddenly becomes a very simple one:
"How I do get from 125 High Street, Boston, MA to Fenway Park via the subway?"
See the difference? Instead of just assuming that everyone knows where I work and what mode of transportation I am looking for, if I make sure to simply
tell them, there is no uncertainty, no confusion, no guesswork, and the question can be answered. This isn't rocket science, right? Yet, these common-sense basics seem to elude many, many, people!
Would you drop your car off at a mechanic with a note on it that says:
"Car doesn't work. Plz fix. It is urgent! Thx!!"
I sure hope not. You'd explain what's wrong, right? It has trouble starting, it has a flat tire, there's smoke coming from beneath the hood, the steering wheel fell off, and so on. It's basic common sense that you would do everything you can to be sure that the mechanic has the information he needs to fix your car correctly and promptly, right? Shouldn't that same logic also apply when asking for help in forums?
Finally, let's try a SQL Server question:
"What is wrong with teh codez? it does not work! Plz Help!
select SaleID, Customer, Qty, Price
from SalesNumbesr
Thnx!"
Take a look at that question. It is a very simple SQL statement, right? There's nothing there that a beginner could not understand. Can you "spot" the problem and fix it?
Hmmm ... maybe you can, maybe you can't. You can't really be sure, can you? At this point, we can all try to guess what the problem is. What does "it does not work" even mean? Is "SalesNumbesr" a typo? Should it be "SalesNumbers" ? Is it returning too much data? not enough data? Incorrect data? Is it generating an error? And so on.
We could spend all day trying to guess what the question is and provide answers to those guesses, but if the guesses are wrong, the answers won't be so helpful, will they? On the previous question, what if you guessed that I work in Cleveland and provided directions for me to Fenway Park from there? Would that be helpful to me? Probably not, right? Most likely, it just wasted everyone's time.
So, getting back to the code .... what is wrong with it? Well, in SQL terms, the answer is:
NULL! It does not exist. Until we are provided with more information, the question cannot be answered. Thus, it is not a question at all, just an incomplete fragment. As simple as the question looks, as basic as the T-SQL is, this "question" will stump even the greatest "experts" out there because an answer to this question simply does not exist.
If more information is provided, like this:
"Hi -- I currently have the code below:
select SaleID, Customer, Qty, Price
from SalesNumbesr
I would also like to return the total Amount for each Sale, which is the Qty multiplied by the Price. However, I am not sure how to add this to my current code. Can anyone please help?"
Suddenly, the question now is very clear and the answer is very simple! They just want to know how to add an expression to the result set. Just by providing a little more information, and not assuming that everyone knows what is happening outside of the context of what was written, something that was impossible to answer has become very easy.
I sure hope this is making sense.
In fact, taking this whole article to its logical conclusion, I think we can safely say:
"The more accurate and detailed information a question provides, the more accurate and detailed the answers will be."
In other words, a vague, incomplete question can only get, at best, vague, incomplete answers. But a question that spells out the entire situation very clearly will get, at best, a very clear and specific answer that works in that situation.
So, please, think of this when you ask questions in a forum. Consider the fact that no one knows your specific environment, or code, or application, or database except for YOU. And, no one can help you unless you are providing enough information for them to do so.
Just like the mechanic. Or someone giving driving directions. Or a doctor when you are sick. You provide them with the necessary information so they can help you, right? Consider doing the same to those providing you with (free!) programming advice.
. . .
(Feel free to provide this link to those who seen to have trouble understanding that you cannot read their mind when assisting them with programming help on forums.)
Here's an obscure piece of SQL you may not be aware of: The "ALL" option when using a GROUP BY.
Consider the following table:
Create table Sales
(
SaleID int identity not null primary key,
CustomerID int,
ProductID int,
SaleDate datetime,
Qty int,
Amount money
)
insert into Sales (CustomerID, ProductID, SaleDate, Qty, Amount)
select 1,1,'2008-01-01',12,400 union all
select 1,2,'2008-02-25',6,2300 union all
select 1,1,'2008-03-02',23,610 union all
select 2,4,'2008-01-04',1,75 union all
select 2,2,'2008-02-18',52,5200 union all
select 3,2,'2008-03-09',99,2300 union all
select 3,1,'2008-04-19',3,4890 union all
select 3,1,'2008-04-21',74,2840
SaleID CustomerID ProductID SaleDate Qty Amount
----------- ----------- ----------- ----------------------- ----------- ---------------------
9 1 1 2008-01-01 00:00:00.000 12 400.00
10 1 2 2008-02-25 00:00:00.000 6 2300.00
11 1 1 2008-03-02 00:00:00.000 23 610.00
12 2 4 2008-01-04 00:00:00.000 1 75.00
13 2 2 2008-02-18 00:00:00.000 52 5200.00
14 3 2 2008-03-09 00:00:00.000 99 2300.00
15 3 1 2008-04-19 00:00:00.000 3 4890.00
16 3 1 2008-04-21 00:00:00.000 74 2840.00
(8 row(s) affected)
Suppose we'd like to see the customers that were sold Product #1 along with the total amount that they spent.
We would basically write a simple SELECT with a GROUP BY like this:
select CustomerID, sum(Amount) as TotalAmount
from Sales
where ProductID = 1
group by CustomerID
And sure enough, we'd get our answer:
CustomerID TotalAmount
----------- ---------------------
1 1010.00
3 7730.00
(2 row(s) affected)
Now, let's say that we'd like to see
all customers that have been sold
any products, but we still just want to see the "TotalAmount" for ProductID #1. For customers that have never ordered ProductID #1, it should output a "TotalAmount" value of $0. One way to do this is with a CASE expression; instead of filtering so that only ProductID #1 is returned, we can conditionally SUM() the Amount only for orders for ProductID #1. Like this:
select CustomerID, sum(case when ProductID=1 then Amount else 0 end) as TotalAmount
from Sales
group by CustomerID
CustomerID TotalAmount
----------- ---------------------
1 1010.00
2 0.00
3 7730.00
(3 row(s) affected)
That gives us the results we want. Because we are not using a WHERE clause to filter the data, we see an entry for CustomerID #2 in the output.
However, in situations where you have written the above SQL, you could actually replace the SUM(CASE...) expression by using
GROUP BY ALL, instead of just a standard GROUP BY, like this:
select CustomerID, sum(Amount) as TotalAmount
from Sales
where ProductID = 1
group by all CustomerID
CustomerID TotalAmount
----------- ---------------------
1 1010.00
2 NULL
3 7730.00
Warning: Null value is eliminated by an aggregate or other SET operation.
(3 row(s) affected)
Notice that now
all Customers are now returned, and a NULL is shown as the TotalAmount for Customer #2, who has no orders for ProductID #1 ... Even though though the WHERE clause seems to indicate that we should
not be seeing customer #2 in the results!
The ALL option basically says "ignore the WHERE clause when doing the GROUPING, but still apply it for any aggregate functions". So, in this case, the WHERE clause is not considered when generating the population of CustomerID values, but it is applied when calculating the SUM. This is very much like our first solution, where we removed the WHERE clause completely, and used a SUM(CASE...) expression to conditionally calculate the aggregate.
Values that are excluded from the aggregation according to the WHERE clause have NULL values returned, as you can see in the result. A simple ISNULL() or COALESCE() will allow us to return 0 instead of NULL:
select CustomerID, isnull(sum(Amount),0) as TotalAmount
from Sales
where ProductID = 1
group by all CustomerID
CustomerID TotalAmount
----------- ---------------------
1 1010.00
2 0.00
3 7730.00
Warning: Null value is eliminated by an aggregate or other SET operation.
(3 row(s) affected)
Notice that the warning about NULL being aggregated still displays, since that is the standard behavior in SQL Server when you calculate an aggregate on a NULL value. You can turn these warnings off if you like for the during of the batch by issuing a
set ANSI_WARNINGS off command before your SELECT.
GROUP BY ALL is kind of obscure and neat to know, but not really useful in most situations since there are usually easier or better ways to get this result. For one thing, this won't work if we want all Customers to be displayed, since a customer
must have at least one order to show up in the result. If we want to see all customers, even those that have never ordered, we would need to do a LEFT OUTER JOIN from the Customers table to our Orders aggregate SELECT:
create table Customers (CustomerID int primary key)
insert into Customers
select 1 union all
select 2 union all
select 3 union all
select 4
-- Notice that we have 4 customers, but our Sales data has sales for only 3.
select c.customerID, isnull(s.TotalAmount,0) as TotalAmount
from Customers c
left outer join
(select customerID, sum(Amount) as TotalAmount
from Sales
where ProductID = 1
group by customerID) s on c.customerID = s.customerID
customerID TotalAmount
----------- ---------------------
1 1010.00
2 0.00
3 7730.00
4 0.00
(4 row(s) affected)
That is typically the standard way to return data for an entire population, regardless of existing transactions. GROUP BY ALL gets us close, but if a new customer has never made an Order, they will never show up in the results. Of course, depending on your needs, that may be what you want.
Another limitation is we can not use GROUP BY ALL if we want to return a grand total for all orders, along with the total just for ProductID #1. For example, using the SUM(CASE...) expression along with a regular SUM(), we can do this:
select CustomerID, sum(case when ProductID=1 then Amount else 0 end) as Product1Amount,
sum(Amount) as TotalAmount
from Sales
group by CustomerID
CustomerID Product1Amount TotalAmount
----------- --------------------- ---------------------
1 1010.00 3310.00
2 0.00 5275.00
3 7730.00 10030.00
(3 row(s) affected)
That lets us calculate two different totals all in one pass through the table. However, we cannot translate that using GROUP BY ALL, because while we will be able to return the Product1Amount, there would be no easy way to also get the TotalAmount for all products without an additional join or sub-query.
. . .
So, that's the story with GROUP BY ALL. It is interesting, and not widely well-known, and may even make for a good interview question if you really want to see how much SQL a candidate knows. But for practical purposes, it is pretty rarely used and there are generally better ways to get the same results more easily or more efficiently.
Anyone have a good situation or an example of where GROUP BY ALL really worked well for you? Be sure to share your experiences in the comments.
Everyone seems to want to
"pivot" or "cross tab" data, but knowing how to do the opposite is equally important. In fact, I would argue that the skill of "unpivoting" data is more useful and more important and more relevant to a SQL programmer, since pivoting results in denormalized data, while unpivoting can transform non-normalized data into a normalized result set. We all know that there's lots of bad databases designs out there, so this can be a handy technique to know.
Of course, even a well designed, fully normalized database can still benefit from "unpivoting" from time to time, so let's take a look at some common situations and some of the options we have to handle this at our disposal. We will focus on some traditional SQL techniques to do this, and then take a close look at the
UNPIVOT operator that was introduced with SQL Server 2005.
Example #1: A Bad database design
Let's start with a commonly bad table design, in which someone has decided to relate a client to multiple contacts by designing their client table like this:
create table Clients
(
clientID int primary key,
clientName varchar(100),
contact1 int,
contact2 int,
contact3 int,
contact4 int
)
insert into Clients
select 1,'ABC Corp',1,34,2,null union all
select 2,'DEF Foundation',6,2,8,9 union all
select 3,'GHI Inc.',5,9,null,null union all
select 4,'XYZ Industries',24,null,6,null
clientID clientName contact1 contact2 contact3 contact4
----------- -------------------- ----------- ----------- ----------- -----------
1 ABC Corp 1 34 2 NULL
2 DEF Foundation 6 2 8 9
3 GHI Inc. 5 9 NULL NULL
4 XYZ Industries 24 NULL 6 NULL
(4 row(s) affected)
(Note: For brevity, I am not including the contact table here, nor the foreign key constraints. Of course, with this table design, it would probably be pretty unlikely to find such constraints in the database anyway)
With this design, it is not very easy or efficient to get a count of all contacts for each client, or to find out which contacts are related to which clients. One thing we can do, however is to "unpivot" this table in a query that returns 1 row per ClientID/ContactID combination. With that result set, we can easily now reference the table as if it were normalized and we can get the information we need.
One way to do is to use UNION ALL to return each row in the clients table 4 times, and each time return a different contactID column:
select clientID, contact1 as ContactID
from clients
where contact1 is not null
union all
select clientID, contact2 as ContactID
from clients
where contact2 is not null
union all
select clientID, contact3 as ContactID
from clients
where contact3 is not null
union all
select clientID, contact4 as ContactID
from clients
where contact4 is not null
clientID ContactID
----------- -----------
1 1
2 6
3 5
4 24
1 34
2 2
3 9
1 2
2 8
4 6
2 9
(11 row(s) affected)
Another option is to CROSS JOIN the Clients table with a table or resultset that returns 4 rows, which also effectively returns each row in the clients table 4 times. For each of the 4 values in the table we are cross joining, we grab a different contact column:
select *
from
(
select c.clientID,
case n.n when 1 then c.contact1
when 2 then c.contact2
when 3 then c.contact3
when 4 then c.contact4 end as ContactID
from
clients c
cross join
(select 1 as n union all select 2 union all select 3 union all select 4) n
)
x
where
x.ContactID is not null
clientID ContactID
----------- -----------
1 1
1 34
1 2
2 6
2 2
2 8
2 9
3 5
3 9
4 24
4 6
(11 row(s) affected)
(Note that you can use a permanent table of Numbers in your database instead of generating it on the fly with a UNION, as shown)
Finally, however, there is an even eaiser way to handle this: the
UNPIVOT operator, new with SQL 2005.
UNPIVOT works very efficiently and really allows you to handle this exact situation quite easily:
select clientID, Contact.ContactID
from clients
unpivot (ContactID for ContactNumber in (contact1, contact2,contact3,contact4)) as Contact
clientID ContactID
----------- -----------
1 1
1 34
1 2
2 6
2 2
2 8
2 9
3 5
3 9
4 24
4 6
(11 row(s) affected)
Much shorter to write, and more efficient to execute as well.
Taking a Closer Look at UNPIVOT
The UNPIVOT operator is tricky to get a feel for, however, so let's take a look at it.
unpivot (ContactID for ContactNumber in (contact1, contact2,contact3,contact4)) as Contact
First, the "As Contact" at the end is just labeling the entire unpivot result set with an alias, just as you must alias a derived table. Each column returned by the pivot operator can be referenced by the alias if necessary.
unpivot (ContactID for ContactNumber in (contact1, contact2,contact3,contact4)) as Contact
The "ContactID for" part says that we want to return a column called "ContactID" for each unpivoted row. The IN() list is the columns that we are unpivoting; the values in the 4 columns listed here will be assigned to the ContactID column in the result. So, the first time a particular row is unpivoted, the value of the 'contact1' column is assigned to ContactID, the next time it is the 'contact2' column, then 'contact3', and then finally 'contact4'. Then, the next row is processed and it all begins again.
Thus, because we are unpivoting 4 values, we know that the result of the unpivot will have 4 times as many rows as the source data.
unpivot (ContactID for ContactNumber in (contact1, contact2,contact3,contact4)) as Contact
UNPIVOT returns an additional column as well, which contains the name of the column that was used to produce each unpivoted row. Here, we have specified that to be called ContactNumber. Note that we actually did not return ContactNumber in our example, be we can easily add that in so you can see how it works:
select clientID, Contact.ContactNumber, Contact.ContactID
from clients
unpivot (ContactID for ContactNumber in (contact1, contact2,contact3,contact4)) as Contact
clientID ContactNumber ContactID
----------- ----------------------- -----------
1 contact1 1
1 contact2 34
1 contact3 2
2 contact1 6
2 contact2 2
2 contact3 8
2 contact4 9
3 contact1 5
3 contact2 9
4 contact1 24
4 contact3 6
(11 row(s) affected)
So, you can see that the code to write is very short, but a little difficult to grasp at first. In the end, though, we are able to take a bad table design and easily "fix it", at least temporarily, so that we can query it using simple and standard SQL statements to get what we need.
Example #2: Normalizing a Transaction Table
Here's another common example:
create table Transactions
(
TranDate datetime,
Account varchar(10),
BudgetAmount money,
ActualAmount money,
ProjectionAmount money,
primary key (TranDate, Account)
)
go
insert into Transactions
select '2008-01-01','0001',354,65,58 union all
select '2008-01-02','0001',14,65,34 union all
select '2008-01-03','0001',0,65,622 union all
select '2008-01-04','0001',9,32,84
go
TranDate Account BudgetAmount ActualAmount ProjectionAmount
----------------------- ---------- --------------------- --------------------- ---------------------
2008-01-01 00:00:00.000 0001 354.00 65.00 58.00
2008-01-02 00:00:00.000 0001 14.00 65.00 34.00
2008-01-03 00:00:00.000 0001 0.00 65.00 622.00
2008-01-04 00:00:00.000 0001 9.00 32.00 84.00
(4 row(s) affected)
Notice that we have different columns for Budget, Actual and Projection, which is not really a great database design. Much better would be to break this data out so that we have a single 'Amount' column and a 'TransactionType' column that specifies the type of each transaction. We can transform our Transactions tables into this format using UNION ALL:
select TranDate, Account, 'BudgetAmount' as Type, BudgetAmount as Amount from transactions
union all
select TranDate, Account, 'ActualAmount' as Type, ActualAmount as Amount from transactions
union all
select TranDate, Account, 'ProjectionAmount' as Type, ProjectionAmount as Amount from transactions
TranDate Account Type Amount
----------------------- ---------- ---------------- ---------------------
2008-01-01 00:00:00.000 0001 BudgetAmount 354.00
2008-01-02 00:00:00.000 0001 BudgetAmount 14.00
2008-01-03 00:00:00.000 0001 BudgetAmount 0.00
2008-01-04 00:00:00.000 0001 BudgetAmount 9.00
2008-01-01 00:00:00.000 0001 ActualAmount 65.00
2008-01-02 00:00:00.000 0001 ActualAmount 65.00
2008-01-03 00:00:00.000 0001 ActualAmount 65.00
2008-01-04 00:00:00.000 0001 ActualAmount 32.00
2008-01-01 00:00:00.000 0001 ProjectionAmount 58.00
2008-01-02 00:00:00.000 0001 ProjectionAmount 34.00
2008-01-03 00:00:00.000 0001 ProjectionAmount 622.00
2008-01-04 00:00:00.000 0001 ProjectionAmount 84.00
(12 row(s) affected)
Or, we can use the
UNPIVOT operator to do the same much easier:
select TranDate, Account, Type, Amount
from Transactions
unpivot (Amount for Type in (BudgetAmount, ActualAmount, ProjectionAmount)) as Amount
TranDate Account Type Amount
----------------------- --------- ------------------------ ---------------------
2008-01-01 00:00:00.000 0001 BudgetAmount 354.00
2008-01-01 00:00:00.000 0001 ActualAmount 65.00
2008-01-01 00:00:00.000 0001 ProjectionAmount 58.00
2008-01-02 00:00:00.000 0001 BudgetAmount 14.00
2008-01-02 00:00:00.000 0001 ActualAmount 65.00
2008-01-02 00:00:00.000 0001 ProjectionAmount 34.00
2008-01-03 00:00:00.000 0001 BudgetAmount 0.00
2008-01-03 00:00:00.000 0001 ActualAmount 65.00
2008-01-03 00:00:00.000 0001 ProjectionAmount 622.00
2008-01-04 00:00:00.000 0001 BudgetAmount 9.00
2008-01-04 00:00:00.000 0001 ActualAmount 32.00
2008-01-04 00:00:00.000 0001 ProjectionAmount 84.00
(12 row(s) affected)
Example #3: "Unsummarizing" Data
For those who work with accounting systems, this example may be familiar to you. Many times, Accounting systems have "summary" tables that roll up transactional data into a structure like this:
create table AccountBalances
(
CompanyID int,
AccountID int,
TransactionTypeID int,
Year int,
Period1 money,
Period2 money,
Period3 money,
Period4 money,
Period5 money,
Period6 money,
Period7 money,
Period8 money,
Period9 money,
Period10 money,
Period11 money,
Period12 money
)
insert into AccountBalances
select 1,1,1,2008,200,300,400,500,400,0,0,0,0,0,0,0 union all
select 1,2,1,2008,100,100,100,100,100,100,100,0,0,0,0,0 union all
select 1,3,1,2008,150,0,50,10,10,200,400,45,0,0,0,0
(As before, let's not worry about those foreign key constraints)
These tables are often calculated when transactions are posted, or periods are closed. Typically, many reports pull from these tables because it is much more efficient than summarizing thousands or millions of transactions, and the data is already "cross-tabbed" the way most reporting tools would like to display it.
We can take this summarized data and "unpivot" it so we can still access the summarized data, but now it will be in a normalized structure. All it takes is a simple
UNPIVOT like this:
select CompanyID, AccountID, TransactionTypeID, Year, substring(Period,7,2) as PeriodNo, Amount
from AccountBalances
unpivot (Amount for Period in (Period1,Period2,Period3,Period4,Period5,Period6,Period7,Period8,Period9,Period10,Period11,Period12)) as Amount
CompanyID AccountID TransactionTypeID Year PeriodNo Amount
----------- ----------- ----------------- ----------- -------- ---------------------
1 1 1 2008 1 200.00
1 1 1 2008 2 300.00
1 1 1 2008 3 400.00
1 1 1 2008 4 500.00
1 1 1 2008 5 400.00
1 1 1 2008 6 0.00
1 1 1 2008 7 0.00
1 1 1 2008 8 0.00
1 1 1 2008 9 0.00
1 1 1 2008 10 0.00
1 1 1 2008 11 0.00
1 1 1 2008 12 0.00
1 2 1 2008 1 100.00
1 2 1 2008 2 100.00
1 2 1 2008 3 100.00
1 2 1 2008 4 100.00
1 2 1 2008 5 100.00
1 2 1 2008 6 100.00
1 2 1 2008 7 100.00
1 2 1 2008 8 0.00
1 2 1 2008 9 0.00
1 2 1 2008 10 0.00
1 2 1 2008 11 0.00
1 2 1 2008 12 0.00
1 3 1 2008 1 150.00
1 3 1 2008 2 0.00
1 3 1 2008 3 50.00
1 3 1 2008 4 10.00
1 3 1 2008 5 10.00
1 3 1 2008 6 200.00
1 3 1 2008 7 400.00
1 3 1 2008 8 45.00
1 3 1 2008 9 0.00
1 3 1 2008 10 0.00
1 3 1 2008 11 0.00
1 3 1 2008 12 0.00
(36 row(s) affected)
We can filter so that only periods with a non-zero amount are included, and we can SELECT FROM this result set and get the exact data we need for whatever date range we want without worrying which column the actual data is in.
This can also be done with a CROSS JOIN or a UNION ALL, but with 12 values to pivot, those options become much longer to write and
UNPIVOT appears to be the way to go in this case.
Example #4: Multiple unpivots
Finally, let's consider a more complicated example. Here, we have a table that stores games played between two teams, where one team is the
HomeTeam and the other is the
AwayTeam:
create table Teams
(
TeamCode char(3) primary key not null,
TeamName varchar(100) not null
)
create table Games
(
GameDate datetime,
HomeTeam char(3) references Teams(TeamCode),
AwayTeam char(3) references Teams(TeamCode),
HomeScore int,
AwayScore int,
primary key (GameDate, HomeTeam),
constraint pk2 unique (GameDate, AwayTeam),
check (HomeTeam <> AwayTeam)
)
insert into Teams
select 'BOS','Boston Red Sox' union all
select 'NYY','New York Yankees'
insert into Games
select '2008-04-01','BOS','NYY',3,1 union all
select '2008-04-02','BOS','NYY',6,4 union all
select '2008-04-03','BOS','NYY',2,3 union all
select '2008-04-08','NYY','BOS',6,0 union all
select '2008-04-09','NYY','BOS',2,6 union all
select '2008-04-10','NYY','BOS',1,10
GameDate HomeTeam AwayTeam HomeScore AwayScore
----------------------- -------- -------- ----------- -----------
2008-04-01 00:00:00.000 BOS NYY 3 1
2008-04-02 00:00:00.000 BOS NYY 6 4
2008-04-03 00:00:00.000 BOS NYY 2 3
2008-04-08 00:00:00.000 NYY BOS 6 0
2008-04-09 00:00:00.000 NYY BOS 2 6
2008-04-10 00:00:00.000 NYY BOS 1 10
(6 row(s) affected)
That may or may not be the best possible design for the Games table, but it is a common way to do it. With data in that form, if we want to get the total runs scored per team across all games, or get each teams won-loss record, we need to make two passes through the table. This can be done fairly easily and efficiently with a union:
select GameDate,
HomeTeam as TeamCode,
'Home' as HomeOrAway,
HomeScore as Score,
case when HomeScore > AwayScore then 1 else 0 end as Win,
case when HomeScore < AwayScore then 1 else 0 end as Loss,
case when HomeScore = AwayScore then 1 else 0 end as Tie
from
Games
union all
select GameDate,
AwayTeam as TeamCode,
'Away' as HomeOrAway,
AwayScore as Score,
case when HomeScore < AwayScore then 1 else 0 end as Win,
case when HomeScore > AwayScore then 1 else 0 end as Loss,
case when HomeScore = AwayScore then 1 else 0 end as Tie
from
Games
GameDate TeamCode HomeOrAway Score Win Loss Tie
----------------------- -------- ---------- ----------- ----------- ----------- -----------
2008-04-01 00:00:00.000 BOS Home 3 1 0 0
2008-04-02 00:00:00.000 BOS Home 6 1 0 0
2008-04-03 00:00:00.000 BOS Home 2 0 1 0
2008-04-08 00:00:00.000 NYY Home 6 1 0 0
2008-04-09 00:00:00.000 NYY Home 2 0 1 0
2008-04-10 00:00:00.000 NYY Home 1 0 1 0
2008-04-01 00:00:00.000 NYY Away 1 0 1 0
2008-04-02 00:00:00.000 NYY Away 4 0 1 0
2008-04-03 00:00:00.000 NYY Away 3 1 0 0
2008-04-08 00:00:00.000 BOS Away 0 0 1 0
2008-04-09 00:00:00.000 BOS Away 6 1 0 0
2008-04-10 00:00:00.000 BOS Away 10 1 0 0
(12 row(s) affected)
Now, how can we do this with
UNPIVOT? This example is a bit more complicated, because we are "unpivoting" not only the score, but also the TeamCode, and we are also calculating a few extra columns (Win, Loss and Tie).
So, can we
UNPIVOT more than 1 column? Let's start simply and do things one at a time. First, let's
UNPIVOT just the TeamCode:
select GameDate, HomeOrAway, Team
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
GameDate HomeOrAway Team
----------------------- -------------- ----
2008-04-01 00:00:00.000 HomeTeam BOS
2008-04-01 00:00:00.000 AwayTeam NYY
2008-04-02 00:00:00.000 HomeTeam BOS
2008-04-02 00:00:00.000 AwayTeam NYY
2008-04-03 00:00:00.000 HomeTeam BOS
2008-04-03 00:00:00.000 AwayTeam NYY
2008-04-08 00:00:00.000 HomeTeam NYY
2008-04-08 00:00:00.000 AwayTeam BOS
2008-04-09 00:00:00.000 HomeTeam NYY
2008-04-09 00:00:00.000 AwayTeam BOS
2008-04-10 00:00:00.000 HomeTeam NYY
2008-04-10 00:00:00.000 AwayTeam BOS
(12 row(s) affected)
OK, so far so good. Now, how do we get the score for the team as well? Let's add another
UNPIVOT clause to the SELECT, this time for Score:
select GameDate, HomeOrAway, Team, Score
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
unpivot (Score for HomeOrAway in (HomeScore, AwayScore)) as Score
Msg 265, Level 16, State 1, Line 1
The column name "HomeOrAway" specified in the UNPIVOT operator conflicts with the existing column name in the UNPIVOT argument.
Msg 8156, Level 16, State 1, Line 1
The column 'HomeOrAway' was specified multiple times for 'Score'.
Hmmm. OK, we cannot specify the same column for both pivots. That is just an alias specification, no big deal, so let's just alias it as "HomeOrAway2" :
select GameDate, HomeOrAway, Team, Score
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
unpivot (Score for HomeOrAway2 in (HomeScore, AwayScore)) as Score
GameDate HomeOrAway Team Score
----------------------- ---------- ---- -----------
2008-04-01 00:00:00.000 HomeTeam BOS 3
2008-04-01 00:00:00.000 HomeTeam BOS 1
2008-04-01 00:00:00.000 AwayTeam NYY 3
2008-04-01 00:00:00.000 AwayTeam NYY 1
2008-04-02 00:00:00.000 HomeTeam BOS 6
2008-04-02 00:00:00.000 HomeTeam BOS 4
2008-04-02 00:00:00.000 AwayTeam NYY 6
2008-04-02 00:00:00.000 AwayTeam NYY 4
2008-04-03 00:00:00.000 HomeTeam BOS 2
2008-04-03 00:00:00.000 HomeTeam BOS 3
2008-04-03 00:00:00.000 AwayTeam NYY 2
2008-04-03 00:00:00.000 AwayTeam NYY 3
2008-04-08 00:00:00.000 HomeTeam NYY 6
2008-04-08 00:00:00.000 HomeTeam NYY 0
2008-04-08 00:00:00.000 AwayTeam BOS 6
2008-04-08 00:00:00.000 AwayTeam BOS 0
2008-04-09 00:00:00.000 HomeTeam NYY 2
2008-04-09 00:00:00.000 HomeTeam NYY 6
2008-04-09 00:00:00.000 AwayTeam BOS 2
2008-04-09 00:00:00.000 AwayTeam BOS 6
2008-04-10 00:00:00.000 HomeTeam NYY 1
2008-04-10 00:00:00.000 HomeTeam NYY 10
2008-04-10 00:00:00.000 AwayTeam BOS 1
2008-04-10 00:00:00.000 AwayTeam BOS 10
(24 row(s) affected)
Holy schnikies, it works! We can specify more than one
UNPIVOT clause for the same SQL statement! Who would have thunk it?
Uh oh -- wait a second. We have 24 results returned. We should have only 12. Something is not right here. Remember when we aliased our second unpivot column as "HomeOrAway2"? We did not return that anywhere in our results. Let's add that in and take a look:
select GameDate, HomeOrAway, HomeOrAway2, Team, Score
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
unpivot (Score for HomeOrAway2 in (HomeScore, AwayScore)) as Score
GameDate HomeOrAway HomeOrAway2 Team Score
----------------------- ---------- ----------- ---- -----------
2008-04-01 00:00:00.000 HomeTeam HomeScore BOS 3
2008-04-01 00:00:00.000 HomeTeam AwayScore BOS 1
2008-04-01 00:00:00.000 AwayTeam HomeScore NYY 3
2008-04-01 00:00:00.000 AwayTeam AwayScore NYY 1
2008-04-02 00:00:00.000 HomeTeam HomeScore BOS 6
2008-04-02 00:00:00.000 HomeTeam AwayScore BOS 4
2008-04-02 00:00:00.000 AwayTeam HomeScore NYY 6
2008-04-02 00:00:00.000 AwayTeam AwayScore NYY 4
2008-04-03 00:00:00.000 HomeTeam HomeScore BOS 2
2008-04-03 00:00:00.000 HomeTeam AwayScore BOS 3
2008-04-03 00:00:00.000 AwayTeam HomeScore NYY 2
2008-04-03 00:00:00.000 AwayTeam AwayScore NYY 3
2008-04-08 00:00:00.000 HomeTeam HomeScore NYY 6
2008-04-08 00:00:00.000 HomeTeam AwayScore NYY 0
2008-04-08 00:00:00.000 AwayTeam HomeScore BOS 6
2008-04-08 00:00:00.000 AwayTeam AwayScore BOS 0
2008-04-09 00:00:00.000 HomeTeam HomeScore NYY 2
2008-04-09 00:00:00.000 HomeTeam AwayScore NYY 6
2008-04-09 00:00:00.000 AwayTeam HomeScore BOS 2
2008-04-09 00:00:00.000 AwayTeam AwayScore BOS 6
2008-04-10 00:00:00.000 HomeTeam HomeScore NYY 1
2008-04-10 00:00:00.000 HomeTeam AwayScore NYY 10
2008-04-10 00:00:00.000 AwayTeam HomeScore BOS 1
2008-04-10 00:00:00.000 AwayTeam AwayScore BOS 10
(24 row(s) affected)
Notice what is happening here. Without an UNPIVOT, we had 6 rows. We added one UNPIVOT to pivot two values, and got 12 rows. Then, we added another pivot, again with two values, and got back 24. Looking at the results, we can see that the single game at '2008-04-01' was unpivoted into 4 rows. So, while we can unpivot multiple times,
there is no relation between each UNPIVOT. That is, notice there is no correlation between HomeOrAway and HomeOrAway2. Let's look at just that first game:
GameDate HomeOrAway HomeOrAway2 Team Score
----------------------- ---------- ----------- ---- -----------
2008-04-01 00:00:00.000 HomeTeam HomeScore BOS 3
2008-04-01 00:00:00.000 HomeTeam AwayScore BOS 1
2008-04-01 00:00:00.000 AwayTeam HomeScore NYY 3
2008-04-01 00:00:00.000 AwayTeam AwayScore NYY 1
Both scores, home and away, are specified for each team -- even if it is not their matching score. What we really want is only these two lines:
GameDate HomeOrAway HomeOrAway2 Team Score
----------------------- ---------- ----------- ---- -----------
2008-04-01 00:00:00.000 HomeTeam HomeScore BOS 3
2008-04-01 00:00:00.000 HomeTeam AwayScore BOS 1
2008-04-01 00:00:00.000 AwayTeam HomeScore NYY 3
2008-04-01 00:00:00.000 AwayTeam AwayScore NYY 1
How can we get there? Well, there's always a WHERE clause! We can simply filter our result so that the Left 4 characters of HomeOrAway matches the left 4 characters of HomeOrAway2. After all, there is nothing magical about these unpivoted columns, they are data just like anything else.
So, we can now write:
select GameDate, left(HomeOrAway,4) as HomeOrAway, Team, Score
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
unpivot (Score for HomeOrAway2 in (HomeScore, AwayScore)) as Score
where left(HomeOrAway,4) = left(HomeOrAway2,4)
GameDate HomeOrAway Team Score
----------------------- ---------- ---- -----------
2008-04-01 00:00:00.000 Home BOS 3
2008-04-01 00:00:00.000 Away NYY 1
2008-04-02 00:00:00.000 Home BOS 6
2008-04-02 00:00:00.000 Away NYY 4
2008-04-03 00:00:00.000 Home BOS 2
2008-04-03 00:00:00.000 Away NYY 3
2008-04-08 00:00:00.000 Home NYY 6
2008-04-08 00:00:00.000 Away BOS 0
2008-04-09 00:00:00.000 Home NYY 2
2008-04-09 00:00:00.000 Away BOS 6
2008-04-10 00:00:00.000 Home NYY 1
2008-04-10 00:00:00.000 Away BOS 10
(12 row(s) affected)
Now we are back to 12 results returned, the Homes and Aways match up, and things look good! Hey, we just pulled off 2 unpivots at once, not bad! From here, we can add our Win, Loss and Tie calculations as well. Note that we also have removed "HomeOrAway2" from the SELECT, since we don't need to see that column in our result. Finally, we are returning only the left 4 characters of HomeTeam or AwayTeam, so that the word 'Home' or 'Away' is returned in the HomeOrAway column.
This seems like quite a bit of work. Is
UNPIVOT really the way to go in this scenario? Maybe not. I have not done full performance testing of
UNPIVOT versus the other options shown (this article is way too long as it is!) but as always you should fully test your options and decide which offers the best performance and code simplicity.
Another option if you want to unpivot more than 1 column is to just use one
UNPIVOT cause, but then use CASE expressions to return the values you need. For example:
select GameDate, left(HomeOrAway,4) as HomeOrAway,
Team,
case when HomeOrAway='HomeTeam' then HomeScore else AwayScore end as Score
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
GameDate HomeOrAway Team Score
----------------------- ---------- ---- -----------
2008-04-01 00:00:00.000 Home BOS 3
2008-04-01 00:00:00.000 Away NYY 1
2008-04-02 00:00:00.000 Home BOS 6
2008-04-02 00:00:00.000 Away NYY 4
2008-04-03 00:00:00.000 Home BOS 2
2008-04-03 00:00:00.000 Away NYY 3