Thinking outside the box

Patron Saint of Lost Yaks

Proper Relational Division With Sets

I got an email from Mr Celko and he correctly stated that my previous solution was not truly working with sets, so he posted me some solutions and proper sample data.
With this information at my hand, I started to investigate what really is needed to get this Relational Division to work properly with two sets; Dividend and Divisor.

Some of you know me well, and know I am not satisfied with just solving the problem. There have to be some tweaks, and I did that too with this solution. Not only is it only touching the Dividend table once and Divisor table once, you can also set if you want a division with no remainder (which means all records in Divisor should match and not a single record more), or allow a division with remainder (which means all the records should match and maybe more records).
Great? Just set 1 for "No remainder" and 0 for "Allow remainder".

Simple as that. So why does it work? Remember your old algebra? "Divide is the same thing as multiply with the inverse number..."

Now for the sample data (courtesy of Mr Celko)

CREATE TABLE    dbo.Dividend
(
group_id INTEGER NOT NULL,
item_name VARCHAR(10) NOT NULL,
PRIMARY KEY (
group_id,
item_name
)
)

INSERT INTO     dbo.Dividend
(
group_id,
item_name
)
VALUES          (1, 'one'),
(1, 'two'),
(1, 'three'),
(1, 'four'),
(2, 'one'),
(2, 'two'),
(2, 'three'),
(3, 'one'),
(3, 'two')

CREATE TABLE    dbo.Divisor
(
item_name VARCHAR(10) NOT NULL PRIMARY KEY
)

INSERT INTO     dbo.Divisor
(
item_name
)
VALUES          ('one'),
('two'),
('three')

Now for the 4 solutions posted by Mr Celko

-- Celko 1
SELECT      D1.group_id
FROM        Dividend AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
GROUP BY    D1.group_id
HAVING      COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)
AND COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Dividend AS D2 WHERE D2.group_id = D1.group_id)

-- Celko 2
SELECT      D1.group_id
FROM        Dividend AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
AND NOT EXISTS  (
SELECT  *
FROM    Dividend AS D2
WHERE   D2.group_id = D1.group_id
AND D2.item_name NOT IN (SELECT item_name FROM Divisor)
)
GROUP BY    D1.group_id
HAVING      COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)

-- Celko 3
SELECT      D1.group_id
FROM        (
SELECT  group_id,
item_name,
COUNT(*) OVER (PARTITION BY group_id) AS cnt
FROM    Dividend
) AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
AND cnt = (SELECT COUNT(*) FROM Divisor)
GROUP BY    D1.group_id
HAVING      COUNT(D1.cnt) = (SELECT COUNT(*) FROM Divisor)

--Celko 4
;WITH Divisor2
AS (
SELECT  group_id,
MIN(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS single,
SUM(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS full_basket
FROM    Dividend
)
SELECT      D.group_id
FROM        Dividend AS D,
Divisor2
WHERE       D.group_id = Divisor2.group_id
AND Divisor2.single = 1
AND Divisor2.full_basket = (SELECT COUNT(*) FROM Divisor)
GROUP BY    D.group_id

You can copy and paste the code to a query window and run them. Investigate the execution plan and compare the 4 of them.
And now to my solution.

-- Peso 1
SELECT      group_id
FROM        (
SELECT      t.group_id,
SUM(CASE WHEN t.item_name = n.item_name THEN 1 ELSE 0 END) AS cnt,
COUNT(*) AS Items
FROM        dbo.Dividend AS t
CROSS JOIN  dbo.Divisor AS n
GROUP BY    t.group_id,
t.item_name
) AS d
GROUP BY    group_id
HAVING      SUM(cnt) = MIN(Items)
AND MIN(cnt) >= 1    -- 1 means no remainder, 0 means remainder

After some challenging with MVP Adam Machanic, here is another version

-- Peso v2
SELECT      t.group_id
FROM        (
SELECT      group_id,
COUNT(*) AS cnt
FROM        dbo.Dividend
GROUP BY    group_id
) AS kc
INNER JOIN  (
SELECT  COUNT(*) AS cnt
FROM    dbo.Divisor
) AS nc ON nc.cnt = kc.cnt
INNER JOIN  dbo.Dividend AS t ON t.group_id = kc.group_id
INNER JOIN  dbo.Divisor AS n ON n.item_name = t.item_name
GROUP BY    t.group_id
HAVING      COUNT(*) = MIN(nc.cnt)

Here is an algorithm (exact division) which is really fast, but not 100% accurate due to the implementation of CHECKSUM (see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832).

-- Peso 3
SELECT
group_id
FROM        (
SELECT      group_id,
CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
FROM        dbo.Dividend
GROUP BY    group_id
) AS t
INNER JOIN  (
SELECT  CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
FROM    dbo.Divisor
) AS n ON n.ca1 = t.ca1
AND n.ca2 = t.ca2

Now copy my solutions and compare them to the other 4.

-- Celko 1
Table 'Dividend'. Scan count 3, logical reads 6.
Table 'Divisor'. Scan count 2, logical reads 20.

-- Celko 2
Table 'Divisor'. Scan count 3, logical reads 32.
Table 'Dividend'. Scan count 4, logical reads 8.

-- Celko 3
Table 'Divisor'. Scan count 3, logical reads 10.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Dividend'. Scan count 1, logical reads 2.

-- Celko 4
Table 'Dividend'. Scan count 2, logical reads 9.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Divisor'. Scan count 1, logical reads 38.

-- Peso 1
Table 'Divisor'. Scan count 1, logical reads 19.
Table 'Dividend'. Scan count 1, logical reads 2.

-- Peso 2
Table 'Divisor'. Scan count 1, logical reads 8.
Table 'Dividend'. Scan count 2, logical reads 4.

-- Peso 3

Table 'Divisor'. Scan count 1, logical reads 2.
Table 'Dividend'. Scan count 1, logical reads 2.

So it seems my solution is cleaner and faster than the previous existing. But the best thing is yet hidden. My solution cares for multi-column division (just expand the CASE and GROUP BY clauses) whereas the previous 4 do not. Well, not easily anyway.
It will involve some replacing for IN with EXISTS, and some string concatenation for the DISTINCT clauses.

//Peso

Print | posted on Friday, July 02, 2010 12:48 AM | Filed Under [ Optimization SQL Server 2008 Algorithms SQL Server 2005 SQL Server 2000 ]

#re: Proper Relational Division With Sets

Hmm, might want to check your solution with a few more rows before you start the celebration...

-- Prepare sample data
CREATE TABLE Sample
(
ParentID INT NOT NULL,
Keyword VARCHAR(25) NOT NULL,
PRIMARY KEY (Keyword, ParentId)
)

-- Populate sample data
INSERT Sample
(
ParentID,
Keyword
)
select
a.number,
convert(varchar, b.number)
from master..spt_values a
inner join master..spt_values b on
a.number >= b.number
where
a.type = 'p'
and b.type = 'p'

CREATE TABLE Sample2
(
KeyID INT NOT NULL,
Keyword VARCHAR(25) NOT NULL,
PRIMARY KEY (Keyword, KeyId)
)

INSERT Sample2
select
a.number,
convert(varchar, b.number)
from master..spt_values a
inner join master..spt_values b on
a.number >= b.number
where
a.type = 'p'
and b.type = 'p'

SELECT ParentId, keyid
FROM (
SELECT t.ParentId,
n.KeyId,
SUM(CASE WHEN t.keyword = n.keyword THEN 1 ELSE 0 END) AS cnt,
COUNT(*) AS Items
FROM Sample AS t
CROSS JOIN Sample2 AS n
GROUP BY t.ParentId, n.keyid,
t.keyword
) AS d
GROUP BY parentid, keyid
HAVING SUM(cnt) = MIN(Items)
AND MIN(cnt) >= 1
7/2/2010 5:41 PM | Adam Machanic

#re: Proper Relational Division With Sets

Here's a solution that returns the results in a decent amount of time given the above sample data (~56 seconds on my end, compared with killing yours after over 20 minutes):

SELECT DISTINCT
ParentId,
KeyId
FROM
(
SELECT
s.ParentId,
s2.KeyId,
sum(case when s2.keyid is null then 1 else 0 end) over (partition by s.parentid) as ksum,
sum(case when s.parentid is null then 1 else 0 end) over (partition by s2.keyid) as psum
FROM
(
SELECT
parentId,
keyword,
count(*) over (partition by parentId) as parentCount
FROM sample
) as s
full outer join
(
SELECT
keyId,
keyword,
count(*) over (partition by keyid) as keyCount
FROM Sample2
) as s2 on
s2.keyword = s.keyword
and s2.keyCount = s.parentCount
) as x
WHERE
ksum = 0
AND psum = 0
7/2/2010 8:35 PM | Adam Machanic

#re: Proper Relational Division With Sets

What function does KeyID have?
7/2/2010 8:48 PM |

#re: Proper Relational Division With Sets

Oh, the KeyID is meant as a "SetID" for multiple sets of divisors?
Ok, this code below runs in 10 seconds on my laptop, versus 105 seconds for your suggestion above with same sample data.

SELECT t.ParentID,
d.KeyID
FROM (
SELECT ParentID,
COUNT(*) AS cnt
FROM dbo.Sample
GROUP BY ParentID
) AS kc
INNER JOIN (
SELECT KeyID,
COUNT(*) AS cnt
FROM dbo.Sample2
GROUP BY KeyID
) AS nc ON nc.cnt = kc.cnt
INNER JOIN dbo.Sample AS t ON t.ParentID = kc.ParentID
INNER JOIN dbo.Sample2 AS d ON d.Keyword = t.Keyword
AND d.KeyID = nc.KeyID
GROUP BY t.ParentID,
d.KeyID
HAVING COUNT(*) = MIN(nc.cnt)

7/2/2010 9:42 PM |

#re: Proper Relational Division With Sets

Looks good to me -- faster on this end, too.
7/2/2010 10:33 PM | Adam Machanic

#re: Proper Relational Division With Sets

Great.
If you want to allow divisions with remainder, just change "AS nc ON nc.cnt = kc.cnt" to "AS nc ON nc.cnt <= kc.cnt". This new version also allows for multi-column division.

Thank you for pointing out the flaw in ny first solution when using megasize sample data.
Now I think my version 2 deals acceptable with all sizes of sample data.
7/2/2010 11:12 PM |

#re: Exact Division

Try.

--ms65g 1
SELECT *
FROM (SELECT DISTINCT group_id
FROM Dividend) D
WHERE NOT EXISTS
(SELECT *
FROM Divisor D1
FULL OUTER JOIN
(SELECT item_name
FROM Dividend
WHERE group_id = D.group_id) D2
ON D1.item_name = D2.item_name
WHERE D1.item_name IS NULL
OR D2.item_name IS NULL) ;

--ms65g 2
SELECT *
FROM (SELECT DISTINCT group_id
FROM Dividend) D
WHERE NOT EXISTS
(
SELECT * FROM
(
SELECT item_name
FROM Divisor
UNION ALL
SELECT DISTINCT item_name
FROM Dividend
WHERE group_id = D.group_id
)d
GROUP BY d.item_name
HAVING COUNT(item_name) = 1
) ;

--ms65g 3
SELECT *
FROM (SELECT DISTINCT group_id
FROM Dividend) D
WHERE NOT EXISTS
(
SELECT item_name
FROM Divisor
EXCEPT
SELECT item_name
FROM Dividend
WHERE group_id = D.group_id
)
AND NOT EXISTS
(
SELECT item_name
FROM Dividend
WHERE group_id = D.group_id
EXCEPT
SELECT item_name
FROM Divisor
) ;

7/4/2010 2:00 AM | ms65g

#re: Proper Relational Division With Sets

New ideas are always welcome!

Peso-2 (15% of batch)
Table 'Dividend'. Scan count 2, logical reads 4.
Table 'Divisor'. Scan count 1, logical reads 8.

Msg65-1 (29% of batch)
Table 'Dividend'. Scan count 4, logical reads 8.
Table 'Divisor'. Scan count 1, logical reads 7.

Msg65-2 (36% of batch)
Table 'Dividend'. Scan count 4, logical reads 8.
Table 'Divisor'. Scan count 1, logical reads 7.

Msg65-3 (20% of batch)
Table 'Dividend'. Scan count 3, logical reads 22.
Table 'Divisor'. Scan count 2, logical reads 15.
7/4/2010 2:14 AM |

#re: Proper Relational Division With Sets

-- Peso 1 (4% of batch)
Table 'Divisor'. Scan count 1, logical reads 19.
Table 'Dividend'. Scan count 1, logical reads 2.

-- Peso 2 (7% of batch)
Table 'Divisor'. Scan count 1, logical reads 8.
Table 'Dividend'. Scan count 2, logical reads 4.

-- Celko 1 (7% of batch)
Table 'Dividend'. Scan count 3, logical reads 6.
Table 'Divisor'. Scan count 2, logical reads 20.

-- Celko 2 (9% of batch)
Table 'Divisor'. Scan count 3, logical reads 53.
Table 'Dividend'. Scan count 9, logical reads 18.

-- Celko 3 (7% of batch)
Table 'Divisor'. Scan count 3, logical reads 10.
Table 'Worktable'. Scan count 3, logical reads 34.
Table 'Dividend'. Scan count 1, logical reads 2.

-- Celko 4 (10% of batch)
Table 'Dividend'. Scan count 2, logical reads 9.
Table 'Worktable'. Scan count 3, logical reads 34.
Table 'Divisor'. Scan count 1, logical reads 38.

-- Msg65 1 (14% of batch)
Table 'Dividend'. Scan count 4, logical reads 8.
Table 'Divisor'. Scan count 1, logical reads 7.

-- Msg65 2 (16% of batch)
Table 'Dividend'. Scan count 4, logical reads 8.
Table 'Divisor'. Scan count 1, logical reads 7.

-- Msg65 3 (9% of batch)
Table 'Divisor'. Scan count 2, logical reads 15.
Table 'Dividend'. Scan count 3, logical reads 22.

Table 'Worktable'. Scan count 9, logical reads 73.
Table 'Dividend'. Scan count 1, logical reads 2.
Table 'Divisor'. Scan count 1, logical reads 2.
7/4/2010 2:57 AM |

#re: Proper Relational Division With Sets

Try it also.

--Ms65g 4
SELECT group_id
FROM
(
SELECT group_id, COUNT(*) AS cnt
FROM
(
SELECT group_id, item_name
FROM Dividend
UNION ALL
(
SELECT group_id, item_name
FROM (SELECT DISTINCT group_id
FROM Dividend) D
CROSS JOIN
Divisor
)
) D
GROUP BY group_id, item_name
) D
GROUP BY group_id
HAVING MAX(CASE WHEN cnt = 1 THEN 1 ELSE 0 END) = 0 ;
7/4/2010 10:36 AM | ms65g

#re: Proper Relational Division With Sets

--=== Ms65g 4 ====--
Table 'Divisor'. Scan count 1, logical reads 7
Table 'Dividend'. Scan count 2, logical reads 4

--=== Peso 2 ====--
Table 'Divisor'. Scan count 1, logical reads 8
Table 'Dividend'. Scan count 2, logical reads 4
7/4/2010 12:11 PM | ms65g

#re: Proper Relational Division With Sets

Also we can improve the "Celko 2" solution with using noncorrelated version (relational algebra) like this:

=== Ms65g 5 ===
SELECT D1.group_id
FROM Dividend AS D1
WHERE D1.item_name IN (SELECT item_name FROM Divisor)
GROUP BY D1.group_id
HAVING COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)

EXCEPT

SELECT group_id
FROM Dividend
WHERE item_name NOT IN
(SELECT item_name
FROM Divisor) ;
7/4/2010 12:22 PM | ms65g

#re: Proper Relational Division With Sets

=== Ms65g 6 ===
SELECT D1.group_id
FROM Dividend D1
LEFT OUTER JOIN
Divisor D2
ON D1.item_name = D2.item_name
GROUP BY D1.group_id
HAVING COUNT(D1.item_name) = (SELECT COUNT(*) FROM Divisor)
AND MAX(CASE WHEN D2.item_name IS NULL THEN 1 ELSE 0 END) = 0 ;
=============
7/4/2010 12:44 PM | ms65g

#re: Proper Relational Division With Sets

This solution is completely equivalents with "Ms65g 5" approach by it using correlated subquery instead of EXCEPT.

==== Ms65g 5 correlated_version =========
SELECT D1.group_id
FROM Dividend AS D1
WHERE D1.item_name IN (SELECT item_name FROM Divisor)
GROUP BY D1.group_id
HAVING COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)
AND NOT EXISTS
(SELECT *
FROM Dividend
WHERE group_id = D1.group_id
AND item_name NOT IN
(SELECT item_name
FROM Divisor)) ;
=================================
7/4/2010 1:28 PM | ms65g

#re: Proper Relational Division With Sets

-- Ms65g 5
Table 'Divisor'. Scan count 3, logical reads 28.
Table 'Dividend'. Scan count 3, logical reads 6.

-- Ms65g 5 correlated
Table 'Divisor'. Scan count 3, logical reads 28.
Table 'Dividend'. Scan count 3, logical reads 6.

-- Ms65g 6
Table 'Divisor'. Scan count 2, logical reads 21.
Table 'Dividend'. Scan count 1, logical reads 2.

7/4/2010 2:32 PM |

#re: Proper Relational Division With Sets

I tested your seemingly best query performance wise against Peso 2 on the large sample set provided by MVP Adam Machanic. I deleted all records but KeyID 1234 from divisor before starting.

Here are the results

-- Ms65g 4
SQL Server Execution Times:
CPU time = 14337 ms, elapsed time = 7692 ms.

-- Peso 2
Table 'Sample'. Scan count 1238, logical reads 21802.
Table 'Sample2'. Scan count 4, logical reads 2747.
Table 'Worktable'. Scan count 0, logical reads 0.
Table 'Worktable'. Scan count 0, logical reads 0.
SQL Server Execution Times:
CPU time = 1389 ms, elapsed time = 718 ms.
7/4/2010 2:51 PM |

#re: Proper Relational Division With Sets

Congratulations Peso! You are a professional expert in creating most efficient queries.
Thanks for community. I like this topic.
7/4/2010 3:50 PM | ms65g

#re: Proper Relational Division With Sets

And for completeness, this are the timings for Celko's solutions on the large sample set for KeyID 1234 only.

-- Celko 1
Table 'Sample2'. Scan count 4, logical reads 2747.
Table 'Worktable'. Scan count 0, logical reads 0.
Table 'Sample'. Scan count 1238, logical reads 21802.
Table 'Worktable'. Scan count 0, logical reads 0.
SQL Server Execution Times:
CPU time = 1997 ms, elapsed time = 1062 ms.

-- Celko 2
Table 'Worktable'. Scan count 0, logical reads 0.
Table 'Sample'. Scan count 1236, logical reads 21588.
Table 'Sample2'. Scan count 3, logical reads 3858.
SQL Server Execution Times:
CPU time = 6053 ms, elapsed time = 7592 ms.

-- Celko 3
Table 'Sample2'. Scan count 5, logical reads 4033.
Table 'Sample'. Scan count 3, logical reads 9024.
Table 'Worktable'. Scan count 6, logical reads 4243392.
Table 'Worktable'. Scan count 0, logical reads 0.
SQL Server Execution Times:
CPU time = 14929 ms, elapsed time = 8875 ms.

-- Celko 4
Table 'Sample2'. Scan count 6, logical reads 4208.
Table 'Sample'. Scan count 6, logical reads 18084.
Table 'Worktable'. Scan count 6, logical reads 4235906.
Table 'Worktable'. Scan count 0, logical reads 0.
SQL Server Execution Times:
CPU time = 21217 ms, elapsed time = 12610 ms.

-- Peso 2
Table 'Sample2'. Scan count 4, logical reads 2747.
Table 'Worktable'. Scan count 0, logical reads 0.
Table 'Sample'. Scan count 1238, logical reads 21802.
Table 'Worktable'. Scan count 0, logical reads 0.
SQL Server Execution Times:
CPU time = 1388 ms, elapsed time = 715 ms.
7/4/2010 8:55 PM |

#re: Proper Relational Division With Sets

I am back, with a pure relational algebra approach. But I do not think this method be practical.

;WITH [Distinct] AS
(SELECT DISTINCT group_id
FROM Dividend
),
[A*B] AS
(SELECT *
FROM [Distinct]
CROSS JOIN
Divisor
)
SELECT group_id
FROM [Distinct]

EXCEPT

SELECT group_id
FROM
(
(SELECT *
FROM Dividend
EXCEPT
SELECT *
FROM [A*B])

UNION

(SELECT *
FROM [A*B]
EXCEPT
SELECT *
FROM Dividend)
) D ;
7/4/2010 10:16 PM | ms65g

#re: Proper Relational Division With Sets

Wait…!
I am back, with another pure relational algebra approach.
Previous solution calls “Ms65g 7” and current solution call “Ms65g 8”

--Ms65g 8
;WITH [Distinct] AS
(
SELECT DISTINCT group_id
FROM Dividend
)

SELECT group_id
FROM [Distinct]

EXCEPT

(
SELECT COALESCE(D1.group_id, D2.group_id)
FROM Dividend D1
FULL OUTER JOIN
(SELECT *
FROM [Distinct]
CROSS JOIN
Divisor
) D2
ON D1.group_id = D2.group_id
AND D1.item_name = D2.item_name
WHERE D1.group_id IS NULL
OR D2.group_id IS NULL
) ;
7/4/2010 10:35 PM | ms65g

#re: Proper Relational Division With Sets

Cannot run Ms65g 7 due to use of "*", Please rewrite and use proper column names and table alias prefix.
Here is Ms65g 8

-- Ms65g 8
Table 'Worktable'. Scan count 2, logical reads 15200.
Table 'Sample'. Scan count 9, logical reads 27184.
Table 'Sample2'. Scan count 2, logical reads 2572.
Table 'Worktable'. Scan count 0, logical reads 0.
SQL Server Execution Times:
CPU time = 14180 ms, elapsed time = 7803 ms.

7/4/2010 11:08 PM |

#re: Proper Relational Division With Sets

After some deduction and logical trial and errors, this is what v7 gives

-- Ms65g 7
Table 'Sample'. Scan count 15, logical reads 45336.
Table 'Worktable'. Scan count 3, logical reads 27920.
Table 'Sample2'. Scan count 3, logical reads 3858.
Table 'Worktable'. Scan count 0, logical reads 0.
SQL Server Execution Times:
CPU time = 33352 ms, elapsed time = 19222 ms.

7/4/2010 11:18 PM |

#re: Proper Relational Division With Sets

Why Peso 2 is very much faster than other queries?
IO statistics information give us scan numbers of "Worktable" for Peso 2 is 0!
Could you please run all of queries in a batch and show us the cost in percent (using MVP Adam Machanic sample data)?
And how many years needed to be a person expert in T-SQL querying like you? :)
I have one year experiences in SQL.

Thanks.

7/4/2010 11:58 PM | ms65g

#re: Proper Relational Division With Sets

What if when list is set?! No problem with dynamic SQL we can use your solution that introduced it in this post:
http://weblogs.sqlteam.com/peterl/archive/2010/06/30/Relational-division.aspx

====================================================
DECLARE @List VARCHAR(500) = '', @SQL VARCHAR(500) ;
SELECT @List += ',' + item_name
FROM Divisor ;

SELECT @list = STUFF(@list, 1, 1, '');
SELECT @List = REPLACE(@List, ',', CHAR(39)+','+CHAR(39)) ;

SELECT @SQL =
'SELECT group_id
FROM Dividend
GROUP BY group_id
HAVING MAX(CASE WHEN item_name IN ('''+@List+''') THEN 0 ELSE 1 END) = 0
AND SUM(CASE WHEN item_name IN ('''+@List+''') THEN 1 ELSE 0 END) = (SELECT COUNT(*) FROM Divisor);' ;

EXECUTE (@SQL) ;
=================================================
7/5/2010 1:02 AM | ms65g

#The Best Solution

Peso,
I think we have a winner!!
Now I introduce the best solution.

Look!

--========= ms65g 9 =================
SELECT d.group_id
FROM (SELECT group_id, MIN(item_name), MAX(item_name), COUNT(*)
FROM Dividend
GROUP BY group_id
)D(group_id, min_item, max_item, cnt)
INNER JOIN
(SELECT MIN(item_name), MAX(item_name), COUNT(*)
FROM Divisor)
dd(min_item, max_item,cnt)
ON d.min_item = dd.min_item
AND d.max_item=dd.max_item
AND d.cnt=dd.cnt;
--======================================
/*
Ms65g 9 (23%)
Table 'Dividend'. Scan count 1, logical reads 2
Table 'Divisor'. Scan count 1, logical reads 2

Peso 1 (26%)
Table 'Divisor'. Scan count 1, logical reads 19
Table 'Dividend'. Scan count 1, logical reads 2

Peso 2 (51%)
Table 'Divisor'. Scan count 1, logical reads 8
Table 'Dividend'. Scan count 2, logical reads 4
*/
7/8/2010 12:17 PM | ms65g

#re: Proper Relational Division With Sets

Replace "Three" with "Pink" in the divisor table and try again.
I will not test this solution.
7/8/2010 12:26 PM |

#re: Proper Relational Division With Sets

This will perform as good with much better accuracy. However, the CHECKSUM function is error prone (see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832) and will give false positives, and maybe positive falses, due to the implementation of CHECKSUM.

SELECT group_id
FROM (
SELECT group_id,
CHECKSUM_AGG(CHECKSUM(item_name)) ca
FROM dbo.Dividend
GROUP BY group_id
) AS t
INNER JOIN (
SELECT CHECKSUM_AGG(CHECKSUM(item_name)) AS ca
FROM dbo.Divisor
) AS n ON n.ca = t.ca

7/8/2010 12:36 PM |

#re: Proper Relational Division With Sets

-- ms65g (fixed)
SELECT d.group_id
FROM (SELECT group_id, MIN(item_name), MAX(item_name), COUNT(*)
FROM Dividend
WHERE item_name IN (SELECT item_name FROM Divisor)
GROUP BY group_id
)D(group_id, min_item, max_item, cnt)
INNER JOIN
(SELECT MIN(item_name), MAX(item_name), COUNT(*)
FROM Divisor)
dd(min_item, max_item,cnt)
ON d.min_item = dd.min_item
AND d.max_item=dd.max_item
AND d.cnt=dd.cnt;
7/8/2010 12:50 PM | ms65g

#re: Proper Relational Division With Sets

Table 'Divisor'. Scan count 2, logical reads 20.
Table 'Dividend'. Scan count 1, logical reads 2.

Not better than the other previous suggestions.

Also, this #9 is only about division with remainder. Try the suggestion with the original sample data.
Your #9 will give group_id 1 as result.

Why? You are filtering the Dividend for the items in Divisor and for group_id 1 you will get three records left. So the count will match. And beause you filtered, the MIN and MAX item_name will be the same when count is the same.
7/8/2010 12:57 PM |

#re: Proper Relational Division With Sets

Oh, I got it.
Try it:

--Ms65g 9 (exact relational division)
SELECT d.group_id
FROM (SELECT group_id, MIN(item_name), MAX(item_name), COUNT(*)
FROM Dividend
GROUP BY group_id
)D(group_id, min_item, max_item, cnt)
INNER JOIN
(SELECT MIN(item_name), MAX(item_name), COUNT(*)
FROM Divisor)
dd(min_item, max_item,cnt)
ON d.min_item = dd.min_item
AND d.max_item=dd.max_item
AND d.cnt=dd.cnt

EXCEPT

SELECT group_id
FROM Dividend
WHERE item_name NOT IN (SELECT item_name FROM Divisor) ;
7/8/2010 1:14 PM | ms65g

#re: Proper Relational Division With Sets

-- Ms65g 9 (exact division)
Table 'Divisor'. Scan count 2, logical reads 8.
Table 'Dividend'. Scan count 2, logical reads 4.
7/8/2010 1:41 PM |

#re: Proper Relational Division With Sets

Following query in completely correct?

--ms65g 10
SELECT group_id
FROM (
SELECT group_id,
SUM(CAST(HASHBYTES('md5', item_name) AS BIGINT)) AS hashing
FROM dbo.Dividend
GROUP BY group_id
) AS t
INNER JOIN (
SELECT SUM(CAST(HASHBYTES('md5', item_name) AS BIGINT)) AS hashing
FROM dbo.Divisor
) AS n ON n.hashing = t.hashing
7/8/2010 1:48 PM | ms65g

#re: Proper Relational Division With Sets

Is the previous solution (Ms65g 10) correct in any statuses and cases?

7/8/2010 2:22 PM | ms65g

#I am winner!

------Concatenating Approach-------

Table 'Dividend'. Scan count 1, logical reads 2
Table 'Divisor'. Scan count 1, logical reads 2

--ms65g #11
SELECT DD.group_id
FROM (
SELECT group_id,
MAX(CASE WHEN ranking = 1 THEN item_name ELSE '' END) +
MAX(CASE WHEN ranking = 2 THEN ',' + item_name ELSE '' END) +
MAX(CASE WHEN ranking = 3 THEN ',' + item_name ELSE '' END) +
MAX(CASE WHEN ranking = 4 THEN ',' + item_name ELSE '' END) AS concatenating
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY group_id ORDER BY item_name) AS Ranking
FROM Dividend
) AS D
GROUP BY group_id
) AS DD
JOIN
(
SELECT ',' + item_name
FROM Divisor
ORDER BY item_name ASC
FOR XML PATH('')
) AS D(concatenating)
ON DD.concatenating = STUFF(D.concatenating, 1, 1, '') ;

7/8/2010 8:03 PM | ms65g

#re: Proper Relational Division With Sets

#10 provides wrong result, as my suggestion with CHECKSUM_AGG.

Your #11 on the sample data provided by Adam (using only Keyid=3).
Table 'Sample2'. Scan count 3, logical reads 9100.
Table 'Sample'. Scan count 3, logical reads 9012.

When I extended your example for Keyid=1234, i killed the query after 6 minutes.
7/10/2010 10:04 AM |

#re: Proper Relational Division With Sets

ms65g new_1
;WITH D AS
(SELECT *, COUNT(*) OVER(PARTITION BY group_id) AS cnt,
MIN(item_name) OVER(PARTITION BY group_id) AS min_item,
MAX(item_name) OVER(PARTITION BY group_id) AS max_item
FROM Dividend)
SELECT group_id
FROM D AS D1
JOIN (SELECT item_name, COUNT(*) OVER() AS cnt,
MIN(item_name) OVER() AS min_item,
MAX(item_name) OVER() AS max_item
FROM Divisor) D2
ON D1.cnt = D2.cnt
AND D1.min_item = D2.min_item
AND D1.max_item = D2.max_item
AND D1.item_name = D2.item_name
GROUP BY group_id
HAVING COUNT(*) = MAX(D1.cnt) ;

ms65g new_2
;WITH D1(group_id, cnt, min_item, max_item) AS
(SELECT group_id, COUNT(*), MIN(item_name), MAX(item_name)
FROM Dividend
GROUP BY group_id),
D2(cnt, min_item, max_item) AS
(SELECT COUNT(*), MIN(item_name), MAX(item_name)
FROM Divisor)
SELECT group_id
FROM D1 JOIN D2
ON D1.cnt = D2.cnt
AND D1.min_item = D2.min_item
AND D1.max_item = D2.max_item
WHERE D1.cnt = (SELECT COUNT(*)
FROM Dividend D
JOIN Divisor D2
ON D.item_name = D2.item_name
WHERE D1.group_id = D.group_id)