I got an email from Mr Celko and he correctly stated that my previous solution was not truly working with sets, so he posted me some solutions and proper sample data.
With this information at my hand, I started to investigate what really is needed to get this Relational Division to work properly with two sets; Dividend and Divisor.
Some of you know me well, and know I am not satisfied with just solving the problem. There have to be some tweaks, and I did that too with this solution. Not only is it only touching the Dividend table once and Divisor table once, you can also set if you want a division with no remainder (which means all records in Divisor should match and not a single record more), or allow a division with remainder (which means all the records should match and maybe more records).
Great? Just set 1 for "No remainder" and 0 for "Allow remainder".
Simple as that. So why does it work? Remember your old algebra? "Divide is the same thing as multiply with the inverse number..."
Now for the sample data (courtesy of Mr Celko)
CREATE TABLE dbo.Dividend
(
group_id INTEGER NOT NULL,
item_name VARCHAR(10) NOT NULL,
PRIMARY KEY (
group_id,
item_name
)
)
INSERT INTO dbo.Dividend
(
group_id,
item_name
)
VALUES (1, 'one'),
(1, 'two'),
(1, 'three'),
(1, 'four'),
(2, 'one'),
(2, 'two'),
(2, 'three'),
(3, 'one'),
(3, 'two')
CREATE TABLE dbo.Divisor
(
item_name VARCHAR(10) NOT NULL PRIMARY KEY
)
INSERT INTO dbo.Divisor
(
item_name
)
VALUES ('one'),
('two'),
('three')
Now for the 4 solutions posted by Mr Celko
-- Celko 1
SELECT D1.group_id
FROM Dividend AS D1
WHERE D1.item_name IN (SELECT item_name FROM Divisor)
GROUP BY D1.group_id
HAVING COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)
AND COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Dividend AS D2 WHERE D2.group_id = D1.group_id)
-- Celko 2
SELECT D1.group_id
FROM Dividend AS D1
WHERE D1.item_name IN (SELECT item_name FROM Divisor)
AND NOT EXISTS (
SELECT *
FROM Dividend AS D2
WHERE D2.group_id = D1.group_id
AND D2.item_name NOT IN (SELECT item_name FROM Divisor)
)
GROUP BY D1.group_id
HAVING COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)
-- Celko 3
SELECT D1.group_id
FROM (
SELECT group_id,
item_name,
COUNT(*) OVER (PARTITION BY group_id) AS cnt
FROM Dividend
) AS D1
WHERE D1.item_name IN (SELECT item_name FROM Divisor)
AND cnt = (SELECT COUNT(*) FROM Divisor)
GROUP BY D1.group_id
HAVING COUNT(D1.cnt) = (SELECT COUNT(*) FROM Divisor)
--Celko 4
;WITH Divisor2
AS (
SELECT group_id,
MIN(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS single,
SUM(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS full_basket
FROM Dividend
)
SELECT D.group_id
FROM Dividend AS D,
Divisor2
WHERE D.group_id = Divisor2.group_id
AND Divisor2.single = 1
AND Divisor2.full_basket = (SELECT COUNT(*) FROM Divisor)
GROUP BY D.group_id
You can copy and paste the code to a query window and run them. Investigate the execution plan and compare the 4 of them.
And now to my solution.
-- Peso 1
SELECT group_id
FROM (
SELECT t.group_id,
SUM(CASE WHEN t.item_name = n.item_name THEN 1 ELSE 0 END) AS cnt,
COUNT(*) AS Items
FROM dbo.Dividend AS t
CROSS JOIN dbo.Divisor AS n
GROUP BY t.group_id,
t.item_name
) AS d
GROUP BY group_id
HAVING SUM(cnt) = MIN(Items)
AND MIN(cnt) >= 1 -- 1 means no remainder, 0 means remainder
After some challenging with MVP Adam Machanic, here is another version
-- Peso v2
SELECT t.group_id
FROM (
SELECT group_id,
COUNT(*) AS cnt
FROM dbo.Dividend
GROUP BY group_id
) AS kc
INNER JOIN (
SELECT COUNT(*) AS cnt
FROM dbo.Divisor
) AS nc ON nc.cnt = kc.cnt
INNER JOIN dbo.Dividend AS t ON t.group_id = kc.group_id
INNER JOIN dbo.Divisor AS n ON n.item_name = t.item_name
GROUP BY t.group_id
HAVING COUNT(*) = MIN(nc.cnt)
Here is an algorithm (exact division) which is really fast, but not 100% accurate due to the implementation of CHECKSUM (see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832).
-- Peso 3
SELECT group_id
FROM (
SELECT group_id,
CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
FROM dbo.Dividend
GROUP BY group_id
) AS t
INNER JOIN (
SELECT CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
FROM dbo.Divisor
) AS n ON n.ca1 = t.ca1
AND n.ca2 = t.ca2
Now copy my solutions and compare them to the other 4.
-- Celko 1
Table 'Dividend'. Scan count 3, logical reads 6.
Table 'Divisor'. Scan count 2, logical reads 20.
-- Celko 2
Table 'Divisor'. Scan count 3, logical reads 32.
Table 'Dividend'. Scan count 4, logical reads 8.
-- Celko 3
Table 'Divisor'. Scan count 3, logical reads 10.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Dividend'. Scan count 1, logical reads 2.
-- Celko 4
Table 'Dividend'. Scan count 2, logical reads 9.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Divisor'. Scan count 1, logical reads 38.
-- Peso 1
Table 'Divisor'. Scan count 1, logical reads 19.
Table 'Dividend'. Scan count 1, logical reads 2.
-- Peso 2
Table 'Divisor'. Scan count 1, logical reads 8.
Table 'Dividend'. Scan count 2, logical reads 4.
-- Peso 3
Table 'Divisor'. Scan count 1, logical reads 2.
Table 'Dividend'. Scan count 1, logical reads 2.
So it seems my solution is cleaner and faster than the previous existing. But the best thing is yet hidden. My solution cares for multi-column division (just expand the CASE and GROUP BY clauses) whereas the previous 4 do not. Well, not easily anyway.
It will involve some replacing for IN with EXISTS, and some string concatenation for the DISTINCT clauses.
//Peso