Introduction
A question I see very often in the
SQLTeam forums is how to return data in a summarized form by concatenating multiple values into single
CSV string columns. For example, taking data like this:
FarmerName FruitName
-------------------- ----------
Farmer Ted Apple
Farmer Ted Orange
Farmer Fred Orange
Farmer Fred Grapes
Farmer Fred Grapefruit
Farmer Jed Orange
... and returning the results from the database in this format:
Farmer FruitList
--------- -------
Farmer Ted Apple,Oranges
Farmer Fred Grapefruit,Grapes,Orange
Farmer Jed Orange
Notice that we are only returning one row per Farmer, and the "FruitList" column returned is a concatenated comma-separated list of values in the Fruit column.
Seems simple enough ... let's take a look. Here's our schema:
create table Farmers
(
FarmerID int primary key,
Name varchar(20)
)
create table Fruit
(
FruitID int primary key,
FruitName varchar(10)
)
create table FarmersFruit
(
FarmerID int,
FruitID int,
primary key (FarmerID, FruitID)
)
And here's our sample data if you'd like to play along at home:
insert into Farmers(FarmerID, Name)
select 1,'Farmer Ted' union all
select 2,'Farmer Fred' union all
select 3,'Farmer Jed'
insert into Fruit(FruitID, FruitName)
select 1,'Apple' union all
select 2,'Orange' union all
select 3,'Grapes' union all
select 4,'Grapefruit'
insert into FarmersFruit (FarmerID, FruitID)
select 1,1 union all
select 1,2 union all
select 2,2 union all
select 2,4 union all
select 2,3 union all
select 3,2
Notice that the data is stored in a nice, simple, normalized manner. We can return all of the Farmers and their Fruit with a stored procedure that consists of a simple SELECT with some JOINS like this:
create procedure FarmerSummary
as
select
F.Name as FarmerName, F.FarmerID, FR.FruitName, FR.FruitID
from
Farmers F
inner join
FarmersFruit FF on F.FarmerID = FF.FarmerID
inner join
Fruit FR on FF.FruitID = FR.FruitID
order by
F.FarmerID, FR.FruitID
Executing FarmerSummary return results with one fruit per row and each Farmer (potentially) repeated multiple times, as is standard in SQL when joining tables in many-to-many relations:
FarmerName FarmerID FruitName FruitID
-------------------- ----------- ---------- -----------
Farmer Ted 1 Apple 1
Farmer Ted 1 Orange 2
Farmer Fred 2 Orange 2
Farmer Fred 2 Grapes 3
Farmer Fred 2 Grapefruit 4
Farmer Jed 3 Orange 2
(6 row(s) affected)
Concatenating Multiple Rows in T-SQL
That is all fine and good, but how do we return only one row per Farmer with all of their fruit together in a single column? Concatenating the fruit names into a string is done typically by creating a simple User-Defined Function:
create function GetFruitList(@FarmerID int)
returns varchar(8000)
as
begin
declare @ret varchar(8000)
set @ret = ''
select @ret = @ret + ',' + F.FruitName
from Fruit F
inner join FarmersFruit FF on F.FruitID = FF.FruitID
where FF.FarmerID = @FarmerID
order by FruitName asc
return substring(@ret,2,8000)
end
The function accepts a FarmerID, and returns a CSV-string of fruit that the farmer "farms". (An exciting example, I know!) There are other ways to get the same result (SQL 2005 has some useful XML features we can use, or we could create a CLR aggregate function), but I find the UDF to be the easiest way to go.
We can use this function to get our desired results by simply calling it for each row in the Farmers table:
create procedure FarmerSummaryList
as
select
f.Name, dbo.GetFruitList(F.FarmerID) as Fruits
from
Farmers f
order by
f.FarmerID
Our fancy new FarmerSummaryList procedure returns the results just the way we'd like to display them:
Name Fruits
-------------------- --------------------------
Farmer Ted Apple,Orange
Farmer Fred Grapefruit,Grapes,Orange
Farmer Jed Orange
(3 row(s) affected)
Presentation versus Data Implications
Ok, so far we have not broken any new ground. I've used this technique before and it works quite well and very efficiently, and I'm sure that maybe you have as well. However, we need to carefully consider what is happening when we employ a function like this and return results in this manner.
Let's say that I am designing a web application, and for the "Farmer Summary" page that shows one row per farmer in a grid, I'd like to show all of the fruit that each farmer is associated with separated by commas. The FarmerSummary stored procedure works great for this, and it allows me to quickly bind the procedure results to a grid in ASP.NET and output everything without the need for anything special.
However, suppose on another page we decide to display the Fruit for each farmer separated by line breaks in the grid, instead of separated by commas? That is, instead of
Apples,Oranges
we want to show:
Apples
Oranges
In HTML terms, we might choose to delimit our list by <BR> tags. What should we do with our FarmerSummaryList stored procedure? We could call it as is, and do a replace on our Front End to replace all "," characters with "<BR>". That would work, but what if items in the list themselves contain commas? (In fact, that is an issue of any approach that builds CSV lists in general -- we must ensure that our chosen delimiter cannot exist in the data, and what should be returned if it does?) We could also create another stored procedure, FarmerSummaryListBR or something like that, that delimits by "<BR>" instead of commas and use that for this specific web page.
The issue here is that suddenly our database code is now becoming dependent on how we wish to format our results. Altering formatting now requires that we edit our SQL code, not just our ASP.NET code. And if we alter our SQL code to do formatting, other applications or clients that consume our data might suddenly not work correctly. Imagine that we have a report that uses the FarmerSummaryList stored procedure -- what happens to our report if we change it to use <BR> tags? Our report will suddenly output those <BR> tags and become a mess.
A More Flexible T-SQL Solution
One way to fix this is to make our code more generic and reusable, and create it so that formatting is not embedded and hard-coded. Instead of writing our user-defined function to always return CSV strings, I would suggest instead to write it like this:
alter function GetFruitList(@FarmerID int, @Delimiter varchar(10))
returns varchar(8000)
as
begin
declare @ret varchar(8000)
set @ret = ''
select @ret = @ret + @Delimiter + F.FruitName
from Fruit F
inner join FarmersFruit FF on F.FruitID = FF.FruitID
where FF.FarmerID = @FarmerID
order by FruitName asc
return substring(@ret,len(@Delimiter + 'x'),8000)
end
Notice that the delimiter is now a parameter; we must pass in what we wish to delimit our string by when calling the function. This allows us to use a comma or a comma plus a space or a "<BR>" tag or anything else when calling the function. We also use a little trick at the end to get the length of the delimiter by appending an 'x' to it, in case the delimiter contains trailing spaces which are not counted by the len() function.
With this new UDF, we can then alter our stored procedure to also accept a delimiter which is simply passed to the function:
alter procedure FarmerSummaryList @Delimiter varchar(10)
as
select
f.Name, dbo.GetFruitList(F.FarmerID,@Delimiter) as Fruits
from
Farmers f
order by
f.FarmerID
Now, when we can call our stored procedure we can specify the delimiter to use to get the exact output we need:
FarmerSummaryList '<br>'
Name Fruits
-------------------- -----------------------------
Farmer Ted Apple<br>Orange
Farmer Fred Grapefruit<br>Grapes<br>Orange
Farmer Jed Orange
(3 row(s) affected)
Well, it seems like we solved our little dilemma; we should be good to!
Encountering Limits and Issues with the T-SQL Approach
Suppose now that we are happy with what we've done so far, but then we realize that our web application needs to let users click on each fruit name in our "FruitList" string to jump to the fancy "Show Fruit" page, passing the corresponding FruitID as a URL parameter. This of course is very typical of a nicely hyper-linked web application. Our current stored procedure, despite the added flexibility of our @Delimiter parameter, will not help us with this. What should we do here? Should we make it much more complicated and add in format strings and other options? Should we concatenate
both the names and the FruitID values into our string somehow, and then parse and extract those values in our application? If so, what format should we use to do this? Also, what if we want a "fruit description" to show when you hover over the link via the
title attribute, or use other values from the Fruit table in a similar manner? Perhaps we need to come up with more complex strings embedding all of this data like this:
Apple,1,A delicious treat;Orange,2,Oranges from Florida;Grapes,3,A Whole Bunch Of Them
Hmmm ... Now things are getting a bit more complicated, and once again we find that our output and formatting is getting mixed up with our database code. We shouldn't need to concatenate and create string formats using our nice clean raw data only to then parse those strings at our client to extract back out that raw data! That's almost as bad as working with a poorly designed unnormalized database in the first place, with CSV columns and things like that. Isn't there an easier way to do this?
Well, actually, there is. And it's not very exciting for all of the talented T-SQL programmers out there. In fact, it's quite boring:
Just use the simple, basic FarmerSummary stored procedure. And do all formatting on the web page in ASP.NET.
That's it.
Format At Your Presentation Layer, Not Your Database Layer
Suppose that instead of writing UDFs and worrying about exactly how to return multiple concatenated values in a single string per row from the database, we simply do the following in our ASP.NET code:
- Retrieve the clean, normalized, raw data from the FarmerSummary stored procedure with a DataReader
- Create a new DataTable, with string columns for FarmerName and FruitList
- Loop through the rows in the DataReader, concatenating together a string of HTML hyperlinks for each fruit (No formatting or parsing is require to extract Fruit names or ID's)
- Each time a new farmer is encountered, finish off the current HTML string, add a row to the return DataTable, and begin a new HTML string
- When done, add the last HTML string and FamerID to the return DataTable
- Bind the DataGrid or Repeater or whatever you are using to display the data to the newly created DataTable
We are essentially doing in our loop what our user defined function is doing, only now we are doing it a the client, so incorporating specific formatting into our process makes sense. We can change our formatting any way we want -- all at the client; we can add
class or
target attributes to our hyperlink, alter the base address, and so on, all without changing any database code. And we can output those values separated by commas, or <BR> tags, or in an unordered list, as well.
Here is some sample code demonstrating this technique for our FruitSummary stored procedure. It just returns the modified, summarized DataTable as indicated above:
public static DataTable GetFarmerSummary()
{
SqlConnection cn = new SqlConnection( <your connection here> );
SqlCommand cm = new SqlCommand("FarmerSummary", cn);
cm.CommandType= CommandType.StoredProcedure;
cn.Open();
SqlDataReader dr = cm.ExecuteReader();
string list = "";
DataTable ret = new DataTable();
DataRow row = null;
ret.Columns.Add("FarmerID",typeof(int));
ret.Columns.Add("FarmerName", typeof(String));
ret.Columns.Add("FruitList", typeof(String));
while (dr.Read())
{
if ((row == null) || ((int) row["FarmerID"] != (int) dr["FarmerID"]))
{
if (row != null)
row["FruitList"] = list;
list = "";
row = ret.Rows.Add(new object[] {dr["FarmerID"], dr["FarmerName"], list});
}
list += String.Format("<a href='Fruit.aspx?ID={0}'>{1}</a><br>", dr["FruitID"],dr["FruitName"]);
}
row["FruitList"] = list;
dr.Close();
cn.Close();
return ret;
}
Notice that there really isn't much to it; most of the code is just initializing things, opening the connection, creating the return DataTable, and so on. The loop is very basic and is easily alterable to build the "FruitList" any way that you want, as shown. You could use a StringBuilder if necessary, though for very short lists it may not be worth it. We can call our FarmerSummary stored procedure over and over, many different ways, each time formatting and outputting the results differently, and all we need is one stored procedure and only our client code needs to change.
Note that is only one approach; there are hundreds of ways to do the same thing in ASP.NET. We can create custom controls and classes to do this more generically, and we could also trap events on our web page to hide or cancel the creation of specific items in repeaters or lists or grids and format things just the way we need.
Conclusion
So, remember: Avoid putting presentation code in your database layer whenever possible. You may be doing it without even realizing it, and returning CSV lists from your database is a great example. By simply returning clean and normalized data to your clients, you make it much easier for those clients to consume that data and to ultimately present and output it any way that they wish.
Finally, you may remember that
I recommend this exact same approach when pivoting data: simply write clean, normalized, basic SQL, and do your formatting/pivoting at the client. It's easier, it's faster, it's more flexible, it's the way to go.
see also: