Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

Welcome to my weblog. My name is Jeff Smith, I am a software developer in Boston, MA, and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Top N Percent per Group

Here's a good question in the feedback from my post about using the T-SQL 2005 features to return the Top N per Group of a result set:

Sani writes:

What about Top n Percent per Group??? I would greatly appreciate an input on that as well.

That's a good question, and also easily solvable. One way to do this is to combine rank() with a count(*) partitioned aggregate function, which is also a new SQL Server 2005 feature.

Simply calculate the rank() of each row in the group, and also the count(*) of all rows in the group. Multiply the count(*) by the percentage of rows you want returned, and filter so that your rank() per group is less than or equal to that.

Here's an example. Suppose we want to return the newest 10% of all products per region, where the newest product is the one with the latest "AddedDate" value. We can write it like this:

with ProductsByRegion as
(
    select Region, Product, AddedDate,
           rank() over (partition by Region order by AddedDate desc) as AddedRank,
           count(*) over (partition by Region) as RegionProductsCount
    from Products
)
select
    Region, Product, AddedDate
from
    ProductsByRegion
where
    AddedRank <= (RegionProductsCount * .10)

That's really all there is to it.  
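
To try the query without a real Products table, here is a minimal, self-contained sketch. The table variable @Products and its sample rows (20 products in an 'East' region, 5 in 'West') are made up for illustration and are not part of the original example. It sticks to SQL Server 2005 syntax.

```sql
-- Hypothetical sample data, for illustration only.
declare @Products table
(
    Region    varchar(10),
    Product   varchar(20),
    AddedDate datetime
);

declare @i int;
set @i = 1;

while @i <= 20
begin
    -- East gets 20 products, one added per day.
    insert into @Products (Region, Product, AddedDate)
    values ('East', 'E' + cast(@i as varchar(10)), dateadd(day, @i, '20080101'));

    -- West gets only 5 products.
    if @i <= 5
        insert into @Products (Region, Product, AddedDate)
        values ('West', 'W' + cast(@i as varchar(10)), dateadd(day, @i, '20080101'));

    set @i = @i + 1;
end;

-- Same technique as above: rank within the region, compare to 10% of the region's row count.
with ProductsByRegion as
(
    select Region, Product, AddedDate,
           rank() over (partition by Region order by AddedDate desc) as AddedRank,
           count(*) over (partition by Region) as RegionProductsCount
    from @Products
)
select Region, Product, AddedDate, AddedRank, RegionProductsCount
from ProductsByRegion
where AddedRank <= (RegionProductsCount * .10)
order by Region, AddedRank;
```

With this sample data the East region returns its two newest products (E20 and E19), since 20 * .10 = 2. The West region returns nothing, because 5 * .10 = 0.5 and no rank is that low. If you would rather get at least one row from small groups, wrapping the right-hand side in ceiling() is one option.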

Update: Some great feedback from Geoff N. Hiten in the first comment to this post shows an even easier solution: just use the NTILE() ranking function. See his comment for an example. I definitely recommend that approach; it is much simpler and certainly more straightforward, since that's pretty much what NTILE is designed to do. Thanks Geoff! (So much for my theory that Jeffs who spell their name with a "J" are always smarter than those who spell it with a "G"!)
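
One caveat with NTILE(100), raised by ChrisW in the feedback below: when a group has fewer than 100 rows, NTILE(100) numbers the rows 1, 2, 3 and so on, one row per tile, so a filter of <= 10 returns the first 10 rows rather than 10% of them. The sketch below compares row counts for a single made-up region of 26 products (the @Products table variable and its data are hypothetical). NTILE(10) with a filter on the first tile is shown as one possible alternative, not as a definitive fix.

```sql
-- Hypothetical sample data: one region with 26 products.
declare @Products table
(
    Region    varchar(10),
    Product   varchar(20),
    AddedDate datetime
);

declare @i int;
set @i = 1;

while @i <= 26
begin
    insert into @Products (Region, Product, AddedDate)
    values ('East', 'E' + cast(@i as varchar(10)), dateadd(day, @i, '20080101'));

    set @i = @i + 1;
end;

with Ranked as
(
    select Product,
           ntile(100) over (partition by Region order by AddedDate desc) as Tile100,
           ntile(10)  over (partition by Region order by AddedDate desc) as Tile10,
           rank()     over (partition by Region order by AddedDate desc) as AddedRank,
           count(*)   over (partition by Region) as RegionProductsCount
    from @Products
)
select
    -- Rows kept by each "top 10%" filter
    sum(case when Tile100 <= 10 then 1 else 0 end) as RowsByNtile100,
    sum(case when Tile10 = 1 then 1 else 0 end) as RowsByNtile10,
    sum(case when AddedRank <= RegionProductsCount * .10 then 1 else 0 end) as RowsByRankAndCount
from Ranked;
```

For 26 rows this returns 10, 3 and 2 respectively: NTILE(100) keeps 10 rows (about 38%), the first tile of NTILE(10) holds 3 rows because the larger tiles come first, and the rank()/count(*) filter keeps the 2 rows with rank <= 2.6.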

These partitioned functions are really amazing, and so useful that they are hard to live without once you get the hang of them.

Posted on Thursday, February 21, 2008 1:05 PM | Filed Under [ SQL Server 2005 ]

Feedback

# re: Top N Percent per Group

Rank() is good but I prefer ntile() for this purpose

with ProductsByRegion as
(
    select Region, Product, AddedDate,
           ntile(100) over (partition by Region order by AddedDate desc) as AddedRank
    from Products
)
select
    Region, Product, AddedDate
from
    ProductsByRegion
where
    AddedRank <= 10


The nice bit is that you can adjust it for any percentage desired very easily. It also avoids the messy "ties" issue with rank() and dense_rank().
2/22/2008 2:09 PM | Geoff N. Hiten

# re: Top N Percent per Group

Geoff -- great stuff! I did not think of that -- a much better answer than the one I gave. If you don't mind, I will update the article to point out your comment.

Thanks!
2/22/2008 2:48 PM | Jeff

# re: Top N Percent per Group

Thanks.

I have been doing a lot of custom data extractions lately that use the various paging and ranking functions, so I got very familiar with them. It gets really fun when you start putting more than one function in a query and use them to find intersecting sets.
2/22/2008 3:15 PM | Geoff N. Hiten

# re: Top N Percent per Group

Thanks for posting this code. I've been trying to convert some SAS code (Proc Rank) into SQL. This code accomplishes exactly what I needed. Perfect!
7/4/2008 11:02 PM | Aaron

# re: Top N Percent per Group

Jeff

Excellent post!
However, I think I uncovered a problem/bug with Geoff's ntile solution: while testing your procedure against Geoff's using the same data, I got two different results. It happens when there are fewer than 100 rows in a partition. In my case, a particular partition had 26 rows. The "AddedRank" column returned 1-26 for that partition, so using the where clause of "AddedRank <= 10" did not give me the top 10% but rather the top 38% (10/26).

BTW - Enjoy the holidays with your new son Benjamin. Time flies by fast, and before you know it they are grown up and gone, so savor the time now!
12/24/2008 9:31 AM | ChrisW

# re: Top N Percent per Group

What about converting SQL into SAS (Proc Sql)? I've been using the ntile function in SQL. Do you know any easy way to do the ntiles in SAS, without using 10 pages of macros? Thank you all in advance.

Banu
8/16/2009 8:31 AM | banu