Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server
posts - 157, comments - 2686, trackbacks - 64

My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Subscribe





Archives

Post Categories

Programming

Sports

SQL

Tuesday, December 23, 2008

The Mailbag: More on Ordering by Distinct Values ...

I promise to get back to writing articles on a more regular basis soon, but in the meantime, here's a comment from Nathan A. on using DISTINCT and ORDER BY:

Jeff,

This is actually a problem I have been puzzling over for quite a while now. I actually need to do that sort. I wonder if I may have to create another column that has the list of ordering values in it in increasing order so for my example above assuming a letter table and a number table that contains the numbers for letters, the number table would not change but I would add this second column to the letter table:

B 0
A 0, 1
C 2
B 3

But I don't like that idea as it is a denormalization and would require extra maintenance.

I can think of a way to do it with a specified number of subsequent rows to sort by. In your example you order by the minimum value of the number column. If we wanted to order by the minimum, then by the second minimum, then by the third minimum we can use nested queries to select each of those values in different columns and order by them. I have developed  query like this but there are some issues. First of all, getting the second and third lowest values requires nested queries themselves so this would require many nested queries (not sure if that is a problem). The number of nested queries increases based on how many levels down you want to sort by. The other problem with this method is that you have to specify how many levels down you want to sort by, you can't just sort by a concatenation of all numbers.

The real world example of this problem actually seems like it would be useful in many situations. Consider a task table, that has a list of tasks for people to accomplish and an assignee table that has a list of people assigned to the task. A task can have many people assigned to it. I want to get a list of tasks sorted by their assignees in alphabetical order so if the joined table looked like this:

Task Assignee
1 John
1 Mark
2 John
3 Mark

The result would be in this order:
2 John
1 John, Mark
3 Mark


Let me know what you think the best approach is.


Hi Nathan --

Well, one way to handle this is to write a User Defined Function that returns a string concatenating distinct Assignees for the Task provided as a parameter.  We can use a UDF similar to this one as an example.  UDF's such as these are the simplest and most efficient way I have seen to handle concatenation at the database layer, though there are other methods you can try.

So, let's say we have this for a schema and sample data:

create table Tasks (TaskID int primary key, TaskName varchar(10))
go

create table TaskAssignees
(   TaskID int references Tasks(TaskID), Assignee varchar(10), primary key (TaskID, Assignee))
go

insert into Tasks
select 1,'Task A' union all
select 2,'Task B' union all
select 3,'Task C'

insert into TaskAssignees
select 1, 'John' union all
select 1, 'Mark' union all
select 2, 'John' union all
select 3, 'Mark' union all
select 3, 'Ed'

We can create a UDF like this:
   
create Function GetTaskAssignees(@TaskID int)
returns varchar(100)
as
    begin
    declare @ret varchar(100)
    set @ret = ''

    select @ret= @ret + ', ' + Assignee
    from TaskAssignees
    where TaskID = @TaskID
    order by Assignee

    return substring(@ret,3,100)
    end

... and get the output you are looking for like this:

select TaskID, TaskName, dbo.GetTaskAssignees(TaskID) as Assignees
from Tasks
order by Assignees

TaskID      TaskName   Assignees
----------- ---------- --------------
3           Task C     Ed, Mark
2           Task B     John
1           Task A     John, Mark

So, this should  actually work for well, though for large sets of data performance may be an issue.  As an added bonus, this handles presentation of the names assigned to each task for you as well.

Depending on the data, however, you may need to concatenate items of a fixed length, padded by spaces, instead of simply comma-separated.   This would apply if you are sorting by numeric values, such as:

1,23,45
1,3
6,12,4

Notice that "1,23,25" sorts before  "1,3" in the example above, since it is just comparing two strings.  To solve this, you'd have to write the UDF to output like this:

1 , 3
1 ,23 ,45
6 ,12 , 4

That way, " 3" (padded to the right with a space) sorts correctly before "23".  You could also pad with leading zeroes, or padding after the value instead of before.  The trick here is to identify how much padding you need.

If you  need assistance with altering the example UDF shown to pad the output, let me know.

posted @ Tuesday, December 23, 2008 9:25 AM | Feedback (5) | Filed Under [ T-SQL Sorting User Defined Functions ]

Powered by:
Powered By Subtext Powered By ASP.NET