Show me The TOP n Number of things based on a key
OK, another one that seems to be often asked, and I'm sure there's a SQL Team article out there...I'm just lazy. Anyway, this post got me interested in this again (and I always have trouble trying to remember the answer), so I decided to post some solutions, so I don't have to remember. With the Help of Pat Phelan, Rudy Limeback, and Mr. B. Lindman (aka the Blind Dude), we came up with the following.
Notice the differences in the Plans. While Pat's Subquery looks cleaner, Rudy's Join seems more effecient. Any comments on the plans would be greatly appreciated.
EDIT: Sunsande points out some other shortcomings in the SQL. With their proposed SQL you not only get the top n results, but you do get ties. However n top results would require dynamic sql for this. Still I think it looks to be the best solution so far. Thanks
DECLARE @N int, @sql nvarchar(4000)
SET @N = 3 --The number of records to return for each grouping.
SET @sql =
N' SELECT a.SiteId, a.EmpId, a.Sales '
+ ' FROM myTable99 a '
+ ' WHERE a.Sales IN (SELECT DISTINCT TOP ' + CONVERT(varchar(15),@n) + ' b.Sales '
+ ' FROM myTable99 b '
+ ' WHERE a.EmpID = b.EmpId '
+ ' ORDER BY b.sales DESC) '
+ 'ORDER BY a.EmpID, a.Sales DESC, a.SiteID '
EXECUTE sp_executesql @sql
The Question here was, Show my the TOP 3 Sales for an Employee by Store
USE Northwind
GO
SET NOCOUNT ON
CREATE TABLE myTable99(SiteId int, EmpId int, Sales money
, PRIMARY KEY(EmpId, Sales, SiteId))
GO
INSERT INTO myTable99(SiteId, EmpId, Sales)
SELECT 1, 1, 10.00 UNION ALL
SELECT 2, 1, 15.00 UNION ALL
SELECT 3, 1, 20.00 UNION ALL
SELECT 4, 1, 50.00 UNION ALL
SELECT 5, 1, 10.00 UNION ALL
SELECT 6, 1, 5.00 UNION ALL
SELECT 1, 2, 100.00 UNION ALL
SELECT 2, 2, 1500.00 UNION ALL
SELECT 3, 2, 2000.00 UNION ALL
SELECT 4, 2, 5000.00 UNION ALL
SELECT 5, 2, 1000.00 UNION ALL
SELECT 6, 2, 500.00 UNION ALL
SELECT 1, 3, 1.00 UNION ALL
SELECT 2, 3, 1.50 UNION ALL
SELECT 3, 3, 2.00 UNION ALL
SELECT 4, 3, 5.00 UNION ALL
SELECT 5, 3, 1.00 UNION ALL
SELECT 6, 3, .50
GO
SET SHOWPLAN_TEXT ON
GO
DECLARE @N int
SET @N = 3 --The number of records to return for each grouping.
SELECT a.SiteId, a.EmpId, a.Sales
FROM myTable99 a
WHERE (SELECT Count(*)
FROM myTable99 b
WHERE b.EmpId = a.EmpId
AND a.Sales <= b.Sales) <= @N
ORDER BY a.EmpID, a.Sales DESC, a.SiteID
SELECT a.EmpId, a.SiteId, a.sales
FROM myTable99 a
INNER JOIN myTable99 b
ON a.EmpId = b.EmpId
AND a.Sales <= b.Sales
GROUP BY a.EmpId, a.SiteId, a.sales
HAVING COUNT(*) <= @N
ORDER BY a.EmpId, a.sales desc
GO
SET SHOWPLAN_TEXT OFF
GO
SET NOCOUNT OFF
DROP TABLE myTable99
GO
Legacy Comments
Brett
2005-02-11 |
Interesting caviat Seems that if there's any duplicate data that is part of the results, then the query ignores that data...I had the PK on all three fields.I guess it shouldn't have it on Sales..anyway, the fix would be: SELECT a.EmpId, a.SiteId, a.sales FROM (SELECT DISTINCT EmpId, SiteId, sales FROM myTable99) a INNER JOIN (SELECT DISTINCT EmpId, SiteId, sales FROM myTable99) b ON a.EmpId = b.EmpId AND a.Sales <= b.Sales GROUP BY a.EmpId, a.SiteId, a.sales HAVING COUNT(*) <= @N ORDER BY a.EmpId, a.sales desc GO |
sunsande
2005-02-23 |
All of yours solutions are not complete ... Hi, there I also became to that problem while working on a MS Access project. It is beautiful in math or programming sense and I liked it a lot after I solved it :). Assume that the data for the EmpId=1 is : SELECT 1, 1, 50.00 UNION ALL SELECT 2, 1, 50.00 UNION ALL SELECT 3, 1, 50.00 UNION ALL SELECT 4, 1, 50.00 UNION ALL SELECT 5, 1, 10.00 UNION ALL SELECT 6, 1, 5.00 UNION ALL Update yours databases and rerun the queries. Surprise! You don't show any information about EmpID=1 ? Why? Because you assumed implicitly in your queries that an Employee can't make more than N (3 in our case) sales of the same type. The condition HAVING COUNT(*) <= @N won't work with the example above just because there are 4 top sales and 4 is never <= 3. Here is my solution: I assumed to show TOP 3 sales values. In the common case it doesn't mean TOP 3 records but the TOP 3 groups of records. For the example above it will return : 1, 1, 50.00 2, 1, 50.00 3, 1, 50.00 4, 1, 50.00 5, 1, 10.00 6, 1, 5.00 These are the 3 sales groups 50, 10, and 5. Here is the query: SELECT a.SiteId, a.EmpId, a.Sales FROM myTable99 a WHERE a.Sales IN (SELECT DISTINCT TOP 3 b.Sales FROM myTable99 b WHERE a.EmpID = b.EmpId ORDER BY b.sales DESC) ORDER BY a.EmpID, a.Sales DESC, a.SiteID And as a final note: if you want exactly 3 rows per EmpID you can run again TOP N over the groups in the query I suggested. But why shoud we prefer: 1, 1, 50.00 2, 1, 50.00 3, 1, 50.00 instead of 2, 1, 50.00 3, 1, 50.00 4, 1, 50.00 for example. Well this is another topic :). Thank you for BLOGing :)! sunsande (sunsande(aD)gmail<Tot>com) |
sloan
2005-08-19 |
re: Show me The TOP n Number of things based on a key I think the queries are on the same stage as: http://support.microsoft.com/kb/q153747/ HOWEVER, these queries do NOT scale well. If you follow the Microsft Examples (NorthWind database), and then add 1,000,000 Products, the perforamce is horrible (Over 1 Hr). Here is a sql server 2000 script to add 1,000,000 Products. Run it, and then try the KB, and you'll see what I mean. The below codes is NOT production code, its only a hacky way to get 1,000,000 records in the database, with some randomness to them. declare @counter int select @counter = 0 declare @previousMaxID int declare @currentID varchar(12) declare @catID int declare @randomUnitsInStock int while @counter < 994922 begin select @previousMaxID = MAX(ProductID) from Products select @currentID = convert(varchar(12), @previousMaxID + 1) select @catID = @previousMaxID % 8 Select @catID = @catID + 1 Select @randomUnitsInStock = @previousMaxID % 10000 INSERT INTO Products ( ProductName,SupplierID,CategoryID,QuantityPerUnit,UnitPrice,UnitsInStock,UnitsOnOrder,ReorderLevel,Discontinued ) VALUES ( 'Product_' + @currentID , @catID , @catID , @currentID + 'QuanityPerUnit' , 100.00, @randomUnitsInStock, 100, 100, 0) select @counter = @counter + 1 end |