UNPIVOT: Normalizing data on the fly
Everyone seems to want to "pivot" or "cross tab" data, but knowing how to do the opposite is equally important. In fact, I would argue that the skill of "unpivoting" data is more useful and more important and more relevant to a SQL programmer, since pivoting results in denormalized data, while unpivoting can transform non-normalized data into a normalized result set. We all know that there's lots of bad databases designs out there, so this can be a handy technique to know.
Of course, even a well designed, fully normalized database can still benefit from "unpivoting" from time to time, so let's take a look at some common situations and some of the options we have to handle this at our disposal. We will focus on some traditional SQL techniques to do this, and then take a close look at the UNPIVOT operator that was introduced with SQL Server 2005.
Example #1: A Bad database design
Let's start with a commonly bad table design, in which someone has decided to relate a client to multiple contacts by designing their client table like this:
(
clientID int primary key,
clientName varchar(100),
contact1 int,
contact2 int,
contact3 int,
contact4 int
)
insert into Clients
select 1,'ABC Corp',1,34,2,null union all
select 2,'DEF Foundation',6,2,8,9 union all
select 3,'GHI Inc.',5,9,null,null union all
select 4,'XYZ Industries',24,null,6,null
clientID clientName contact1 contact2 contact3 contact4
----------- -------------------- ----------- ----------- ----------- -----------
1 ABC Corp 1 34 2 NULL
2 DEF Foundation 6 2 8 9
3 GHI Inc. 5 9 NULL NULL
4 XYZ Industries 24 NULL 6 NULL
(4 row(s) affected)
(Note: For brevity, I am not including the contact table here, nor the foreign key constraints. Of course, with this table design, it would probably be pretty unlikely to find such constraints in the database anyway)
With this design, it is not very easy or efficient to get a count of all contacts for each client, or to find out which contacts are related to which clients. One thing we can do, however is to "unpivot" this table in a query that returns 1 row per ClientID/ContactID combination. With that result set, we can easily now reference the table as if it were normalized and we can get the information we need.
One way to do is to use UNION ALL to return each row in the clients table 4 times, and each time return a different contactID column:
from clients
where contact1 is not null
union all
select clientID, contact2 as ContactID
from clients
where contact2 is not null
union all
select clientID, contact3 as ContactID
from clients
where contact3 is not null
union all
select clientID, contact4 as ContactID
from clients
where contact4 is not null
clientID ContactID
----------- -----------
1 1
2 6
3 5
4 24
1 34
2 2
3 9
1 2
2 8
4 6
2 9
(11 row(s) affected)
from
(
select c.clientID,
case n.n when 1 then c.contact1
when 2 then c.contact2
when 3 then c.contact3
when 4 then c.contact4 end as ContactID
from
clients c
cross join
(select 1 as n union all select 2 union all select 3 union all select 4) n
)
x
where
x.ContactID is not null
clientID ContactID
----------- -----------
1 1
1 34
1 2
2 6
2 2
2 8
2 9
3 5
3 9
4 24
4 6
(11 row(s) affected)
Finally, however, there is an even eaiser way to handle this: the UNPIVOT operator, new with SQL 2005. UNPIVOT works very efficiently and really allows you to handle this exact situation quite easily:
from clients
unpivot (ContactID for ContactNumber in (contact1, contact2,contact3,contact4)) as Contact
clientID ContactID
----------- -----------
1 1
1 34
1 2
2 6
2 2
2 8
2 9
3 5
3 9
4 24
4 6
(11 row(s) affected)
Much shorter to write, and more efficient to execute as well.
Taking a Closer Look at UNPIVOT
The UNPIVOT operator is tricky to get a feel for, however, so let's take a look at it.
First, the "As Contact" at the end is just labeling the entire unpivot result set with an alias, just as you must alias a derived table. Each column returned by the pivot operator can be referenced by the alias if necessary.
The "ContactID for" part says that we want to return a column called "ContactID" for each unpivoted row. The IN() list is the columns that we are unpivoting; the values in the 4 columns listed here will be assigned to the ContactID column in the result. So, the first time a particular row is unpivoted, the value of the 'contact1' column is assigned to ContactID, the next time it is the 'contact2' column, then 'contact3', and then finally 'contact4'. Then, the next row is processed and it all begins again.
Thus, because we are unpivoting 4 values, we know that the result of the unpivot will have 4 times as many rows as the source data.
UNPIVOT returns an additional column as well, which contains the name of the column that was used to produce each unpivoted row. Here, we have specified that to be called ContactNumber. Note that we actually did not return ContactNumber in our example, be we can easily add that in so you can see how it works:
from clients
unpivot (ContactID for ContactNumber in (contact1, contact2,contact3,contact4)) as Contact
clientID ContactNumber ContactID
----------- ----------------------- -----------
1 contact1 1
1 contact2 34
1 contact3 2
2 contact1 6
2 contact2 2
2 contact3 8
2 contact4 9
3 contact1 5
3 contact2 9
4 contact1 24
4 contact3 6
(11 row(s) affected)
So, you can see that the code to write is very short, but a little difficult to grasp at first. In the end, though, we are able to take a bad table design and easily "fix it", at least temporarily, so that we can query it using simple and standard SQL statements to get what we need.
Example #2: Normalizing a Transaction Table
Here's another common example:
(
TranDate datetime,
Account varchar(10),
BudgetAmount money,
ActualAmount money,
ProjectionAmount money,
primary key (TranDate, Account)
)
go
insert into Transactions
select '2008-01-01','0001',354,65,58 union all
select '2008-01-02','0001',14,65,34 union all
select '2008-01-03','0001',0,65,622 union all
select '2008-01-04','0001',9,32,84
go
TranDate Account BudgetAmount ActualAmount ProjectionAmount
----------------------- ---------- --------------------- --------------------- ---------------------
2008-01-01 00:00:00.000 0001 354.00 65.00 58.00
2008-01-02 00:00:00.000 0001 14.00 65.00 34.00
2008-01-03 00:00:00.000 0001 0.00 65.00 622.00
2008-01-04 00:00:00.000 0001 9.00 32.00 84.00
(4 row(s) affected)
Notice that we have different columns for Budget, Actual and Projection, which is not really a great database design. Much better would be to break this data out so that we have a single 'Amount' column and a 'TransactionType' column that specifies the type of each transaction. We can transform our Transactions tables into this format using UNION ALL:
union all
select TranDate, Account, 'ActualAmount' as Type, ActualAmount as Amount from transactions
union all
select TranDate, Account, 'ProjectionAmount' as Type, ProjectionAmount as Amount from transactions
TranDate Account Type Amount
----------------------- ---------- ---------------- ---------------------
2008-01-01 00:00:00.000 0001 BudgetAmount 354.00
2008-01-02 00:00:00.000 0001 BudgetAmount 14.00
2008-01-03 00:00:00.000 0001 BudgetAmount 0.00
2008-01-04 00:00:00.000 0001 BudgetAmount 9.00
2008-01-01 00:00:00.000 0001 ActualAmount 65.00
2008-01-02 00:00:00.000 0001 ActualAmount 65.00
2008-01-03 00:00:00.000 0001 ActualAmount 65.00
2008-01-04 00:00:00.000 0001 ActualAmount 32.00
2008-01-01 00:00:00.000 0001 ProjectionAmount 58.00
2008-01-02 00:00:00.000 0001 ProjectionAmount 34.00
2008-01-03 00:00:00.000 0001 ProjectionAmount 622.00
2008-01-04 00:00:00.000 0001 ProjectionAmount 84.00
(12 row(s) affected)
Or, we can use the UNPIVOT operator to do the same much easier:
from Transactions
unpivot (Amount for Type in (BudgetAmount, ActualAmount, ProjectionAmount)) as Amount
TranDate Account Type Amount
----------------------- --------- ------------------------ ---------------------
2008-01-01 00:00:00.000 0001 BudgetAmount 354.00
2008-01-01 00:00:00.000 0001 ActualAmount 65.00
2008-01-01 00:00:00.000 0001 ProjectionAmount 58.00
2008-01-02 00:00:00.000 0001 BudgetAmount 14.00
2008-01-02 00:00:00.000 0001 ActualAmount 65.00
2008-01-02 00:00:00.000 0001 ProjectionAmount 34.00
2008-01-03 00:00:00.000 0001 BudgetAmount 0.00
2008-01-03 00:00:00.000 0001 ActualAmount 65.00
2008-01-03 00:00:00.000 0001 ProjectionAmount 622.00
2008-01-04 00:00:00.000 0001 BudgetAmount 9.00
2008-01-04 00:00:00.000 0001 ActualAmount 32.00
2008-01-04 00:00:00.000 0001 ProjectionAmount 84.00
(12 row(s) affected)
Example #3: "Unsummarizing" Data
For those who work with accounting systems, this example may be familiar to you. Many times, Accounting systems have "summary" tables that roll up transactional data into a structure like this:
(
CompanyID int,
AccountID int,
TransactionTypeID int,
Year int,
Period1 money,
Period2 money,
Period3 money,
Period4 money,
Period5 money,
Period6 money,
Period7 money,
Period8 money,
Period9 money,
Period10 money,
Period11 money,
Period12 money
)
insert into AccountBalances
select 1,1,1,2008,200,300,400,500,400,0,0,0,0,0,0,0 union all
select 1,2,1,2008,100,100,100,100,100,100,100,0,0,0,0,0 union all
select 1,3,1,2008,150,0,50,10,10,200,400,45,0,0,0,0
(As before, let's not worry about those foreign key constraints)
These tables are often calculated when transactions are posted, or periods are closed. Typically, many reports pull from these tables because it is much more efficient than summarizing thousands or millions of transactions, and the data is already "cross-tabbed" the way most reporting tools would like to display it.
We can take this summarized data and "unpivot" it so we can still access the summarized data, but now it will be in a normalized structure. All it takes is a simple UNPIVOT like this:
CompanyID,
AccountID,
TransactionTypeID,
Year,
substring(Period,7,2) as PeriodNo,
Amount
from
AccountBalances
unpivot
(Amount for Period in (Period1,Period2,Period3,Period4,Period5,Period6,
Period7,Period8,Period9,Period10,Period11,Period12)
) as Amount
CompanyID AccountID TransactionTypeID Year PeriodNo Amount
----------- ----------- ----------------- ----------- -------- ---------------------
1 1 1 2008 1 200.00
1 1 1 2008 2 300.00
1 1 1 2008 3 400.00
1 1 1 2008 4 500.00
1 1 1 2008 5 400.00
1 1 1 2008 6 0.00
1 1 1 2008 7 0.00
1 1 1 2008 8 0.00
1 1 1 2008 9 0.00
1 1 1 2008 10 0.00
1 1 1 2008 11 0.00
1 1 1 2008 12 0.00
1 2 1 2008 1 100.00
1 2 1 2008 2 100.00
1 2 1 2008 3 100.00
1 2 1 2008 4 100.00
1 2 1 2008 5 100.00
1 2 1 2008 6 100.00
1 2 1 2008 7 100.00
1 2 1 2008 8 0.00
1 2 1 2008 9 0.00
1 2 1 2008 10 0.00
1 2 1 2008 11 0.00
1 2 1 2008 12 0.00
1 3 1 2008 1 150.00
1 3 1 2008 2 0.00
1 3 1 2008 3 50.00
1 3 1 2008 4 10.00
1 3 1 2008 5 10.00
1 3 1 2008 6 200.00
1 3 1 2008 7 400.00
1 3 1 2008 8 45.00
1 3 1 2008 9 0.00
1 3 1 2008 10 0.00
1 3 1 2008 11 0.00
1 3 1 2008 12 0.00
(36 row(s) affected)
We can filter so that only periods with a non-zero amount are included, and we can SELECT FROM this result set and get the exact data we need for whatever date range we want without worrying which column the actual data is in.
This can also be done with a CROSS JOIN or a UNION ALL, but with 12 values to pivot, those options become much longer to write and UNPIVOT appears to be the way to go in this case.
Example #4: Multiple unpivots
Finally, let's consider a more complicated example. Here, we have a table that stores games played between two teams, where one team is the HomeTeam and the other is the AwayTeam:
(
TeamCode char(3) primary key not null,
TeamName varchar(100) not null
)
create table Games
(
GameDate datetime,
HomeTeam char(3) references Teams(TeamCode),
AwayTeam char(3) references Teams(TeamCode),
HomeScore int,
AwayScore int,
primary key (GameDate, HomeTeam),
constraint pk2 unique (GameDate, AwayTeam),
check (HomeTeam <> AwayTeam)
)
insert into Teams
select 'BOS','Boston Red Sox' union all
select 'NYY','New York Yankees'
insert into Games
select '2008-04-01','BOS','NYY',3,1 union all
select '2008-04-02','BOS','NYY',6,4 union all
select '2008-04-03','BOS','NYY',2,3 union all
select '2008-04-08','NYY','BOS',6,0 union all
select '2008-04-09','NYY','BOS',2,6 union all
select '2008-04-10','NYY','BOS',1,10
GameDate HomeTeam AwayTeam HomeScore AwayScore
----------------------- -------- -------- ----------- -----------
2008-04-01 00:00:00.000 BOS NYY 3 1
2008-04-02 00:00:00.000 BOS NYY 6 4
2008-04-03 00:00:00.000 BOS NYY 2 3
2008-04-08 00:00:00.000 NYY BOS 6 0
2008-04-09 00:00:00.000 NYY BOS 2 6
2008-04-10 00:00:00.000 NYY BOS 1 10
(6 row(s) affected)
That may or may not be the best possible design for the Games table, but it is a common way to do it. With data in that form, if we want to get the total runs scored per team across all games, or get each teams won-loss record, we need to make two passes through the table. This can be done fairly easily and efficiently with a union:
HomeTeam as TeamCode,
'Home' as HomeOrAway,
HomeScore as Score,
case when HomeScore > AwayScore then 1 else 0 end as Win,
case when HomeScore < AwayScore then 1 else 0 end as Loss,
case when HomeScore = AwayScore then 1 else 0 end as Tie
from
Games
union all
select GameDate,
AwayTeam as TeamCode,
'Away' as HomeOrAway,
AwayScore as Score,
case when HomeScore < AwayScore then 1 else 0 end as Win,
case when HomeScore > AwayScore then 1 else 0 end as Loss,
case when HomeScore = AwayScore then 1 else 0 end as Tie
from
Games
GameDate TeamCode HomeOrAway Score Win Loss Tie
----------------------- -------- ---------- ----------- ----------- ----------- -----
2008-04-01 00:00:00.000 BOS Home 3 1 0 0
2008-04-02 00:00:00.000 BOS Home 6 1 0 0
2008-04-03 00:00:00.000 BOS Home 2 0 1 0
2008-04-08 00:00:00.000 NYY Home 6 1 0 0
2008-04-09 00:00:00.000 NYY Home 2 0 1 0
2008-04-10 00:00:00.000 NYY Home 1 0 1 0
2008-04-01 00:00:00.000 NYY Away 1 0 1 0
2008-04-02 00:00:00.000 NYY Away 4 0 1 0
2008-04-03 00:00:00.000 NYY Away 3 1 0 0
2008-04-08 00:00:00.000 BOS Away 0 0 1 0
2008-04-09 00:00:00.000 BOS Away 6 1 0 0
2008-04-10 00:00:00.000 BOS Away 10 1 0 0
(12 row(s) affected)
Now, how can we do this with UNPIVOT? This example is a bit more complicated, because we are "unpivoting" not only the score, but also the TeamCode, and we are also calculating a few extra columns (Win, Loss and Tie).
So, can we UNPIVOT more than 1 column? Let's start simply and do things one at a time. First, let's UNPIVOT just the TeamCode:
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
GameDate HomeOrAway Team
----------------------- -------------- ----
2008-04-01 00:00:00.000 HomeTeam BOS
2008-04-01 00:00:00.000 AwayTeam NYY
2008-04-02 00:00:00.000 HomeTeam BOS
2008-04-02 00:00:00.000 AwayTeam NYY
2008-04-03 00:00:00.000 HomeTeam BOS
2008-04-03 00:00:00.000 AwayTeam NYY
2008-04-08 00:00:00.000 HomeTeam NYY
2008-04-08 00:00:00.000 AwayTeam BOS
2008-04-09 00:00:00.000 HomeTeam NYY
2008-04-09 00:00:00.000 AwayTeam BOS
2008-04-10 00:00:00.000 HomeTeam NYY
2008-04-10 00:00:00.000 AwayTeam BOS
(12 row(s) affected)
OK, so far so good. Now, how do we get the score for the team as well? Let's add another UNPIVOT clause to the SELECT, this time for Score:
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
unpivot (Score for HomeOrAway in (HomeScore, AwayScore)) as Score
Msg 265, Level 16, State 1, Line 1
The column name "HomeOrAway" specified in the UNPIVOT operator conflicts with the existing column name in the UNPIVOT argument.
Msg 8156, Level 16, State 1, Line 1
The column 'HomeOrAway' was specified multiple times for 'Score'.
Hmmm. OK, we cannot specify the same column for both pivots. That is just an alias specification, no big deal, so let's just alias it as "HomeOrAway2" :
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
unpivot (Score for HomeOrAway2 in (HomeScore, AwayScore)) as Score
GameDate HomeOrAway Team Score
----------------------- ---------- ---- -----------
2008-04-01 00:00:00.000 HomeTeam BOS 3
2008-04-01 00:00:00.000 HomeTeam BOS 1
2008-04-01 00:00:00.000 AwayTeam NYY 3
2008-04-01 00:00:00.000 AwayTeam NYY 1
2008-04-02 00:00:00.000 HomeTeam BOS 6
2008-04-02 00:00:00.000 HomeTeam BOS 4
2008-04-02 00:00:00.000 AwayTeam NYY 6
2008-04-02 00:00:00.000 AwayTeam NYY 4
2008-04-03 00:00:00.000 HomeTeam BOS 2
2008-04-03 00:00:00.000 HomeTeam BOS 3
2008-04-03 00:00:00.000 AwayTeam NYY 2
2008-04-03 00:00:00.000 AwayTeam NYY 3
2008-04-08 00:00:00.000 HomeTeam NYY 6
2008-04-08 00:00:00.000 HomeTeam NYY 0
2008-04-08 00:00:00.000 AwayTeam BOS 6
2008-04-08 00:00:00.000 AwayTeam BOS 0
2008-04-09 00:00:00.000 HomeTeam NYY 2
2008-04-09 00:00:00.000 HomeTeam NYY 6
2008-04-09 00:00:00.000 AwayTeam BOS 2
2008-04-09 00:00:00.000 AwayTeam BOS 6
2008-04-10 00:00:00.000 HomeTeam NYY 1
2008-04-10 00:00:00.000 HomeTeam NYY 10
2008-04-10 00:00:00.000 AwayTeam BOS 1
2008-04-10 00:00:00.000 AwayTeam BOS 10
(24 row(s) affected)
Holy schnikies, it works! We can specify more than one UNPIVOT clause for the same SQL statement! Who would have thunk it?
Uh oh -- wait a second. We have 24 results returned. We should have only 12. Something is not right here. Remember when we aliased our second unpivot column as "HomeOrAway2"? We did not return that anywhere in our results. Let's add that in and take a look:
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
unpivot (Score for HomeOrAway2 in (HomeScore, AwayScore)) as Score
GameDate HomeOrAway HomeOrAway2 Team Score
----------------------- ---------- ----------- ---- -----------
2008-04-01 00:00:00.000 HomeTeam HomeScore BOS 3
2008-04-01 00:00:00.000 HomeTeam AwayScore BOS 1
2008-04-01 00:00:00.000 AwayTeam HomeScore NYY 3
2008-04-01 00:00:00.000 AwayTeam AwayScore NYY 1
2008-04-02 00:00:00.000 HomeTeam HomeScore BOS 6
2008-04-02 00:00:00.000 HomeTeam AwayScore BOS 4
2008-04-02 00:00:00.000 AwayTeam HomeScore NYY 6
2008-04-02 00:00:00.000 AwayTeam AwayScore NYY 4
2008-04-03 00:00:00.000 HomeTeam HomeScore BOS 2
2008-04-03 00:00:00.000 HomeTeam AwayScore BOS 3
2008-04-03 00:00:00.000 AwayTeam HomeScore NYY 2
2008-04-03 00:00:00.000 AwayTeam AwayScore NYY 3
2008-04-08 00:00:00.000 HomeTeam HomeScore NYY 6
2008-04-08 00:00:00.000 HomeTeam AwayScore NYY 0
2008-04-08 00:00:00.000 AwayTeam HomeScore BOS 6
2008-04-08 00:00:00.000 AwayTeam AwayScore BOS 0
2008-04-09 00:00:00.000 HomeTeam HomeScore NYY 2
2008-04-09 00:00:00.000 HomeTeam AwayScore NYY 6
2008-04-09 00:00:00.000 AwayTeam HomeScore BOS 2
2008-04-09 00:00:00.000 AwayTeam AwayScore BOS 6
2008-04-10 00:00:00.000 HomeTeam HomeScore NYY 1
2008-04-10 00:00:00.000 HomeTeam AwayScore NYY 10
2008-04-10 00:00:00.000 AwayTeam HomeScore BOS 1
2008-04-10 00:00:00.000 AwayTeam AwayScore BOS 10
(24 row(s) affected)
Notice what is happening here. Without an UNPIVOT, we had 6 rows. We added one UNPIVOT to pivot two values, and got 12 rows. Then, we added another pivot, again with two values, and got back 24. Looking at the results, we can see that the single game at '2008-04-01' was unpivoted into 4 rows. So, while we can unpivot multiple times, there is no relation between each UNPIVOT. That is, notice there is no correlation between HomeOrAway and HomeOrAway2. Let's look at just that first game:
----------------------- ---------- ----------- ---- -----------
2008-04-01 00:00:00.000 HomeTeam HomeScore BOS 3
2008-04-01 00:00:00.000 HomeTeam AwayScore BOS 1
2008-04-01 00:00:00.000 AwayTeam HomeScore NYY 3
2008-04-01 00:00:00.000 AwayTeam AwayScore NYY 1
Both scores, home and away, are specified for each team -- even if it is not their matching score. What we really want is only these two lines:
----------------------- ---------- ----------- ---- -----------
2008-04-01 00:00:00.000 HomeTeam HomeScore BOS 3
2008-04-01 00:00:00.000 HomeTeam AwayScore BOS 1
2008-04-01 00:00:00.000 AwayTeam HomeScore NYY 3
2008-04-01 00:00:00.000 AwayTeam AwayScore NYY 1
So, we can now write:
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
unpivot (Score for HomeOrAway2 in (HomeScore, AwayScore)) as Score
where left(HomeOrAway,4) = left(HomeOrAway2,4)
GameDate HomeOrAway Team Score
----------------------- ---------- ---- -----------
2008-04-01 00:00:00.000 Home BOS 3
2008-04-01 00:00:00.000 Away NYY 1
2008-04-02 00:00:00.000 Home BOS 6
2008-04-02 00:00:00.000 Away NYY 4
2008-04-03 00:00:00.000 Home BOS 2
2008-04-03 00:00:00.000 Away NYY 3
2008-04-08 00:00:00.000 Home NYY 6
2008-04-08 00:00:00.000 Away BOS 0
2008-04-09 00:00:00.000 Home NYY 2
2008-04-09 00:00:00.000 Away BOS 6
2008-04-10 00:00:00.000 Home NYY 1
2008-04-10 00:00:00.000 Away BOS 10
(12 row(s) affected)
This seems like quite a bit of work. Is UNPIVOT really the way to go in this scenario? Maybe not. I have not done full performance testing of UNPIVOT versus the other options shown (this article is way too long as it is!) but as always you should fully test your options and decide which offers the best performance and code simplicity.
Another option if you want to unpivot more than 1 column is to just use one UNPIVOT cause, but then use CASE expressions to return the values you need. For example:
Team,
case when HomeOrAway='HomeTeam' then HomeScore else AwayScore end as Score
from Games
unpivot (Team for HomeOrAway in (HomeTeam, AwayTeam)) as Team
----------------------- ---------- ---- -----------
2008-04-01 00:00:00.000 Home BOS 3
2008-04-01 00:00:00.000 Away NYY 1
2008-04-02 00:00:00.000 Home BOS 6
2008-04-02 00:00:00.000 Away NYY 4
2008-04-03 00:00:00.000 Home BOS 2
2008-04-03 00:00:00.000 Away NYY 3
2008-04-08 00:00:00.000 Home NYY 6
2008-04-08 00:00:00.000 Away BOS 0
2008-04-09 00:00:00.000 Home NYY 2
2008-04-09 00:00:00.000 Away BOS 6
2008-04-10 00:00:00.000 Home NYY 1
2008-04-10 00:00:00.000 Away BOS 10
(12 row(s) affected)
This is kind of a combination of using UNPIVOT and a CROSS JOIN; the UNPIVOT doubles-up the results returned just as a CROSS JOIN would, and we just use CASE to return the column that we need depending if the current row is a "HomeTeam" or an "AwayTeam" row. This is a fairly good solution and is probably better than using multiple UNPIVOT clauses.
Summary
UNPIVOT is a really handy operator, but it can be tricky to use. You can UNPIVOT more than 1 column at a time, but beware of how it works -- each UNPIVOT uses it's own "pivot values" and you may get many more results than you intend. Thus, you may need to add criteria to your SELECT to ensure that only the results you are looking for are returned. Alternatively, you can use one UNPIVOT along with CASE expressions to achieve the same results.
UNPIVOT is great, but the main thing is that you are at the mercy of the column names; the naming conventions of the columns used becomes data, and that can really complicate things. In general, we want to keep our object names and our data seperate, so keep that limitation in mind if you wish to employ UNPIVOT in your code. In some cases, an old-school UNION ALL or CROSS JOIN may be a preferrable solution.
Legacy Comments
Chris Walker
2008-04-24 |
re: UNPIVOT: Normalizing data on the fly Jeff, A very timely and well written blog entry for me. I was having to 'normalize' a large table that had 2 date fields (entry and withdraw). Initially I just unioned 2 selects to do this. Your unpivot article piqued my interest in trying this in a real scenario. However I ran into a snag when trying to filter the results. The base table has several thousand rows but under 100,000. A simple unpivot yields twice as many rows in the select : select STU_ID as stu_id, CD.CodeDate as codedate,enroll_cd as code from sr_stu_enroll unpivot (CodeDate for datetype in (DT_ENTRY,DT_WDRAW)) as CD I can't seem to place a WHERE clause in this statement to return a subset of the base table. What I am trying to replace is: select STU_ID as stu_id,DT_ENTRY as codedate,wdraw_cd as code from sr_stu_enroll where dt_ENTRY <> '' and DT_ENTRY > '20080301' UNION ALL select STU_ID as stu_id,DT_WDRAW as codedate,wdraw_cd as code from sr_stu_enroll where dt_wdraw <> '' and DT_WDRAW > '20080301' |
Peso
2008-04-24 |
Excellent article but I am missing one thing. The IN part of the UNPIVOT "derived table" are not arbitrary. They are column names belonging to the FROM table. See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=96796 for prefixing column names. I advice to prefix column names in the IN part, because if you accidentally write a column name that does not exist, you will get NULL for all records with this fictional column name and that is very easy to oversee. |
Jeff
2008-04-24 |
re: UNPIVOT: Normalizing data on the fly Thanks, Peso, I don't recall saying that the IN() part was arbitrary. I wrote: >>The IN() list is the columns that we are unpivoting; the values in the 4 columns listed here will be assigned to the ContactID column in the result. However, good point about prefixing column names: it's always a good idea to prefix all column references, even when you only have 1 table in your query. |
Jeff
2008-04-24 |
re: UNPIVOT: Normalizing data on the fly Also -- I tried to put in invalid column names: select clientID from Clients unpivot (ContactID for ContactNumber in (a, b)) as Contact and got an error: Msg 207, Level 16, State 1, Line 3 Invalid column name 'a'. Msg 207, Level 16, State 1, Line 3 Invalid column name 'b'. Can you give an example that accepts invalid column names and just returns NULLS for the unpivot? |
Peso
2008-04-25 |
You are right! I can't reproduce this. So I guess my statement isn't valid anymore. Maybe this was a lesson learnt for me since beta versions? |
Rel
2008-04-29 |
re: UNPIVOT: Normalizing data on the fly Your take on what is normalized and what is not is a bit off. Normalization can only be assessed within the context of the business rules at play. Some of the examples you are calling "bad design" could actually be better than your proposed pivoted versions. For example: TranDate Account BudgetAmount ActualAmount ProjectionAmount ----------------------- ---------- --------------------- --------------------- --------------------- 2008-01-01 00:00:00.000 0001 354.00 65.00 58.00 2008-01-02 00:00:00.000 0001 14.00 65.00 34.00 ... vs. TranDate Account Type Amount ----------------------- --------- ------------------------ --------------------- 2008-01-01 00:00:00.000 0001 BudgetAmount 354.00 2008-01-01 00:00:00.000 0001 ActualAmount 65.00 2008-01-01 00:00:00.000 0001 ProjectionAmount 58.00 Let's add a hypothetical business rule stating that "ActualAmount" must always be greater than or equal to "ProjectionAmount". Which table would you rather implement this constraint on? |
Jeff
2008-04-29 |
re: UNPIVOT: Normalizing data on the fly Rel -- That hypothetical business rule makes no sense whatsoever. Under what scenario would something like that ever exist? If you projected $1000, but only $100 of actual revenue comes in, what happens? You can't book the actual revenue? Or, you have to retroactively change what was projected, losing your history? What happens to your "better" design if we need to store and track multiple budgets? |
Rel
2008-04-29 |
re: UNPIVOT: Normalizing data on the fly Jeff; The business rule proposed wasn't intended to make sense, as that is not a requirement for illustrating the point. You're missing the forest for the trees, here. Let's look at an example based loosely on your first respondent's situation: Students_Orig studentID EntryDate WithdrawalDate ========= --------- -------------- 1 20080101 20080201 2 20080301 20080401 Now we pivot the table: Students_Pivoted studentID DateType DateValue ========= ======== --------- 1 Entry 20080101 1 Withdraw 20080201 2 Entry 20080301 2 Withdraw 20080401 Let's say there's a business rule that the withdrawal date must be equal to or later than the entry date (this, I hope, seems reasonable to you). Again, which table would you rather implement the constraint on? Do you understand that each design could be fully normalized depending on the remainder of the business rules at play? Do you also see how the latter design is a step toward the so-called "EAV" design, which is typically to be avoided? |
Jeff S
2008-04-29 |
re: UNPIVOT: Normalizing data on the fly Rel -- I am not sure what your point is... You said my example was poor, I demonstrated how it was not, and then you said "well, ok, how about this other example?" Yes, I understand that you should not "unpivot" all tables and store everything one attribute per row! Where did I appear to be advocating an EAV design anywhere? Where did I say you need to "unpivot" all tables as much as you can because any time you have more than one attribute on a row it must be a bad design? |
Rel
2008-04-30 |
re: UNPIVOT: Normalizing data on the fly Jeff; You need to re-read my comments for comprehension. I did not say your examples were poor. I was pointing out that the transformations you're proposing are not examples of "normalizing". I then went on to point out that the designs you were calling "better" (i.e., the result of the pivot operations) were not necessarily better, nor more "normalized" than the unpivoted versions. This is a fair point to bring up given the title of your post. >> Where did I appear to be advocating an EAV design anywhere? Here's a quote from your article: >> "Notice that we have different columns for Budget, Actual and Projection, which is not really a great database design." You then go on to propose that >> "Much better would be to break this data out so that we have a single 'Amount' column and a 'TransactionType' column that specifies the type of each transaction." This is, as I stated, a step towards an EAV design. It can introduce significant problems in terms of constraint declaration and query expression. This, of course, is all dependent on the business rules at play and the projected access patterns of your user base. >> Where did I say you need to "unpivot" all tables as much as you can because any time you have more than one attribute on a row it must be a bad design? I never claimed you said anything of the sort. |
JJEugene
2008-05-01 |
re: UNPIVOT: Normalizing data on the fly Great article. The points about normalized vs non-normalized are besides the point. The point is to illustrate how UNPIVOT works and this was done excellently! It also comes at a good time for our agency. My co-worker's database has a timesheet table that looks something like this: EmployeeID, HoursDay1OfMonth, HoursDay2OfMonth, .... HoursDay31OfMonth. I need to show some of that data in my application but in UNPIVOT format where each "day" is listed in a row instead of a column. I vaguely suspected that the UNPIVOT command would work, but didn't have quite enough of a grasp to say for sure. Because of these clear examples that build on each other, I know it will work for our situation and can explain in next Monday's meeting. Thanks! |
Sunkist
2008-06-08 |
re: UNPIVOT: Normalizing data on the fly I have never heard of Unpivot before. This would have made a few things easier over the years if it was a feature in earlier versions of SQL. I will have to keep this in mind with the new stuff I'm doing in SQL 2005 Thanks! |
Chris
2009-01-27 |
re: UNPIVOT: Normalizing data on the fly Jeff, Not sure if you could offer any help but I thought I would ask. I have a table with a set of columns that need to be grouped and the rest need to be unpivoted. I am running into the problem like your example above where the first unpivot works fine but then when I add the second unpivot I get way to many rows returned. It’s a little different though both of the columns are calculated numbers so usage might be something like 32.3 and the price might be 2.01 for a given hour for a given account then for another account I would have something like 24.5 for usage and price might be 1.98. Year Month Day Hour Usage1 usage2 usage3...usage24 Price1 Price2 Price3 ......Price24 2009 1 1 1 32.3 24 43 21 2.01 2.98 3.02 2.22 2009 1 1 1 24.5 12 33 24 1.98 2.56 5.68 4.20 What I need to return is the following Year Month Day Hour Usage Price 2009 1 1 1 56.8 3.99 2009 1 1 2 etc…. This is an example of what I am working on it’s not exact because I am dealing with 48 columns which I thought might be too much for the message…. My code is something like this Select year, month, day, hour, usage,price From ( Sum (usage1) as usage1,Sum (usage2) as usage2,....Sum (usage23) as usage23,Sum (usage24) as usage24, Sum (price1) as price1,Sum (price2) as price2,....Sum (price23) as price23,Sum (price24) as price24) Group by year, month, day,hour) Unpivot(usage for total usage in ( Usage1, Usage2,…...Usage23, Usage24) )as unpvt Unpivot(price for totalprice in ( price1,price2,…...price23, price24) as unpvt2 Thanks Chris |
Bill
2009-02-20 |
re: UNPIVOT: Normalizing data on the fly Nice post, but try not to be so defensive in the comment section all the time. |
Syni
2009-08-20 |
re: UNPIVOT: Normalizing data on the fly the column prefix does not match with a table name or alias name used in the query. I was also working to solve this error Nw i found why it is comming just needs to make small changes in query http://sqlerrormessages.blogspot.com/2009/08/sql-server-error-messages-msg-107.html |
CT
2010-03-19 |
re: UNPIVOT: Normalizing data on the fly I was just given the most bizarre Excel spreadsheet and asked to put it into a database and normalize it - this article saved my day! Written very clearly and very easy to follow! Excellent! Thanks for your time and efforts!! |
Andrea
2010-09-03 |
Thank you: great article about complex UNPIVOT The main goal of this post (UNPIVOT and NOT normalization) is fully achieved! Rating 5/5 |