Here's a good question in the feedback from my post about using the T-SQL 2005 features to return the Top N per Group of a result set:
What about Top n Percent per Group??? I would greatly appreciate an input on that as well.
That's a good question, and also easily solvable. One way to do this that I thought of was by using a combination of rank() and a count(*) partitioned aggregate function
, which is also a new SQL Server 2005 feature.
Simply calculate the rank() of each row in the group, and also the count(*) of all rows in the group. Multiply the count(*) by the percentage of rows you want returned and filter so that your rank() per group is less than that.
Here's an example. Suppose we want to return the newest 10% of all products per region, where the newest product has the latest "AddedDate" column. We can write it like this:
with ProductsByRegion as
select Region, Product, AddedDate,
rank() over (partition by Region order by AddedDate desc) as AddedRank,
count(*) over (partition by Region) as RegionProductsCount
Region, Product, AddedDate
AddedRank <= (RegionProductsCount * .10)
That's really all there is to it.
Some great feedback from Geoff N. Hiten
in the first comment to this post shows an even easier solution: just use the NTILE() aggregate function. See his comment for an example. I definitely recommend that approach, it is much simpler and certainly more straightforward since that's pretty much what NTILE is designed to do. Thanks Geoff! (So much for my theory that Jeff's who spell their name with a "J" are always smarter than those who spell it with a "G" !)
These partitioned functions are really amazing and so useful, they are hard to live without once you get the hang of them.