Proper Relational Division With Sets

Fri Jul 2, 2010 by Peter Larsson in optimization, sql-server-2008, algorithms, sql-server-2005, sql-server-2000

I got an email from Mr Celko and he correctly stated that my previous solution was not truly working with sets, so he posted me some solutions and proper sample data.
With this information at my hand, I started to investigate what really is needed to get this Relational Division to work properly with two sets; Dividend and Divisor.

Some of you know me well, and know I am not satisfied with just solving the problem. There have to be some tweaks, and I did that too with this solution. Not only is it only touching the Dividend table once and Divisor table once, you can also set if you want a division with no remainder (which means all records in Divisor should match and not a single record more), or allow a division with remainder (which means all the records should match and maybe more records).
Great? Just set 1 for "No remainder" and 0 for "Allow remainder".

Simple as that. So why does it work? Remember your old algebra? "Divide is the same thing as multiply with the inverse number..."

Now for the sample data (courtesy of Mr Celko)

CREATE TABLE    dbo.Dividend
                (
                    group_id INTEGER NOT NULL,
                    item_name VARCHAR(10) NOT NULL,
                    PRIMARY KEY (
                                    group_id,
                                    item_name
                                )
                )

INSERT INTO    dbo.Dividend
                (
                    group_id,
                    item_name
                )
VALUES          (1, 'one'),
                (1, 'two'),
                (1, 'three'),
                (1, 'four'),
                (2, 'one'),
                (2, 'two'),
                (2, 'three'),
                (3, 'one'),
                (3, 'two')

CREATE TABLE    dbo.Divisor
                (
                    item_name VARCHAR(10) NOT NULL PRIMARY KEY
                )

INSERT INTO     dbo.Divisor
                (
                    item_name
                )
VALUES          ('one'),
                ('two'),
                ('three')

Now for the 4 solutions posted by Mr Celko

-- Celko 1
SELECT      D1.group_id
FROM        Dividend AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
GROUP BY    D1.group_id
HAVING      COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)
            AND COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Dividend AS D2 WHERE D2.group_id = D1.group_id)

-- Celko 2
SELECT      D1.group_id
FROM        Dividend AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
            AND NOT EXISTS (
                                SELECT *
                                FROM    Dividend AS D2
                                WHERE   D2.group_id = D1.group_id
                                        AND D2.item_name NOT IN (SELECT item_name FROM Divisor)
                            )
GROUP BY    D1.group_id
HAVING      COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)

-- Celko 3
SELECT      D1.group_id
FROM        (
                SELECT group_id,
                        item_name,
                        COUNT(*) OVER (PARTITION BY group_id) AS cnt
                FROM    Dividend
            ) AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
            AND cnt = (SELECT COUNT(*) FROM Divisor)
GROUP BY    D1.group_id
HAVING      COUNT(D1.cnt) = (SELECT COUNT(*) FROM Divisor)

--Celko 4
;WITH Divisor2
AS (
        SELECT group_id,
                MIN(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS single,
                SUM(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS full_basket
        FROM    Dividend
)
SELECT      D.group_id
FROM        Dividend AS D,
            Divisor2
WHERE       D.group_id = Divisor2.group_id
            AND Divisor2.single = 1
            AND Divisor2.full_basket = (SELECT COUNT(*) FROM Divisor)
GROUP BY    D.group_id

You can copy and paste the code to a query window and run them. Investigate the execution plan and compare the 4 of them.
And now to my solution.

-- Peso 1
SELECT      group_id
FROM        (
                SELECT     t.group_id,
                            SUM(CASE WHEN t.item_name = n.item_name THEN 1 ELSE 0 END) AS cnt,
                            COUNT(*) AS Items
                FROM        dbo.Dividend AS t
                CROSS JOIN dbo.Divisor AS n
                GROUP BY    t.group_id,
                            t.item_name
            ) AS d
GROUP BY    group_id
HAVING      SUM(cnt) = MIN(Items)
            AND MIN(cnt) >= 1    -- 1 means no remainder, 0 means remainder

After some challenging with MVP Adam Machanic, here is another version

-- Peso v2
SELECT      t.group_id
FROM        (
                SELECT      group_id,
                            COUNT(*) AS cnt
                FROM        dbo.Dividend
                GROUP BY    group_id
            ) AS kc
INNER JOIN (
                SELECT COUNT(*) AS cnt
                FROM    dbo.Divisor
            ) AS nc ON nc.cnt = kc.cnt
INNER JOIN dbo.Dividend AS t ON t.group_id = kc.group_id
INNER JOIN dbo.Divisor AS n ON n.item_name = t.item_name
GROUP BY    t.group_id
HAVING      COUNT(*) = MIN(nc.cnt)

Here is an algorithm (exact division) which is really fast, but not 100% accurate due to the implementation of CHECKSUM (see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832).

-- Peso 3
SELECT      group_id
FROM        (
                SELECT      group_id,
                            CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
                            CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
                FROM        dbo.Dividend
                GROUP BY    group_id
            ) AS t
INNER JOIN (
                SELECT CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
                        CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
                FROM    dbo.Divisor
            ) AS n ON n.ca1 = t.ca1
                AND n.ca2 = t.ca2

Now copy my solutions and compare them to the other 4.

-- Celko 1
Table 'Dividend'. Scan count 3, logical reads 6.
Table 'Divisor'. Scan count 2, logical reads 20.

-- Celko 2
Table 'Divisor'. Scan count 3, logical reads 32.
Table 'Dividend'. Scan count 4, logical reads 8.

-- Celko 3
Table 'Divisor'. Scan count 3, logical reads 10.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Dividend'. Scan count 1, logical reads 2.

-- Celko 4
Table 'Dividend'. Scan count 2, logical reads 9.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Divisor'. Scan count 1, logical reads 38.

-- Peso 1
Table 'Divisor'. Scan count 1, logical reads 19.
Table 'Dividend'. Scan count 1, logical reads 2.

-- Peso 2
Table 'Divisor'. Scan count 1, logical reads 8.
Table 'Dividend'. Scan count 2, logical reads 4.

-- Peso 3
Table 'Divisor'. Scan count 1, logical reads 2.
Table 'Dividend'. Scan count 1, logical reads 2.

So it seems my solution is cleaner and faster than the previous existing. But the best thing is yet hidden. My solution cares for multi-column division (just expand the CASE and GROUP BY clauses) whereas the previous 4 do not. Well, not easily anyway.
It will involve some replacing for IN with EXISTS, and some string concatenation for the DISTINCT clauses.

//Peso

Legacy Comments


Adam Machanic 2010-07-02	re: Proper Relational Division With Sets Hmm, might want to check your solution with a few more rows before you start the celebration... -- Prepare sample data CREATE TABLE Sample ( ParentID INT NOT NULL, Keyword VARCHAR(25) NOT NULL, PRIMARY KEY (Keyword, ParentId) ) -- Populate sample data INSERT Sample ( ParentID, Keyword ) select a.number, convert(varchar, b.number) from master..spt_values a inner join master..spt_values b on a.number >= b.number where a.type = 'p' and b.type = 'p' CREATE TABLE Sample2 ( KeyID INT NOT NULL, Keyword VARCHAR(25) NOT NULL, PRIMARY KEY (Keyword, KeyId) ) INSERT Sample2 select a.number, convert(varchar, b.number) from master..spt_values a inner join master..spt_values b on a.number >= b.number where a.type = 'p' and b.type = 'p' SELECT ParentId, keyid FROM ( SELECT t.ParentId, n.KeyId, SUM(CASE WHEN t.keyword = n.keyword THEN 1 ELSE 0 END) AS cnt, COUNT(*) AS Items FROM Sample AS t CROSS JOIN Sample2 AS n GROUP BY t.ParentId, n.keyid, t.keyword ) AS d GROUP BY parentid, keyid HAVING SUM(cnt) = MIN(Items) AND MIN(cnt) >= 1

Adam Machanic 2010-07-02	re: Proper Relational Division With Sets Here's a solution that returns the results in a decent amount of time given the above sample data (~56 seconds on my end, compared with killing yours after over 20 minutes): SELECT DISTINCT ParentId, KeyId FROM ( SELECT s.ParentId, s2.KeyId, sum(case when s2.keyid is null then 1 else 0 end) over (partition by s.parentid) as ksum, sum(case when s.parentid is null then 1 else 0 end) over (partition by s2.keyid) as psum FROM ( SELECT parentId, keyword, count() over (partition by parentId) as parentCount FROM sample ) as s full outer join ( SELECT keyId, keyword, count() over (partition by keyid) as keyCount FROM Sample2 ) as s2 on s2.keyword = s.keyword and s2.keyCount = s.parentCount ) as x WHERE ksum = 0 AND psum = 0

Peso 2010-07-02	re: Proper Relational Division With Sets What function does KeyID have?

Peso 2010-07-02	re: Proper Relational Division With Sets Oh, the KeyID is meant as a "SetID" for multiple sets of divisors? Ok, this code below runs in 10 seconds on my laptop, versus 105 seconds for your suggestion above with same sample data. SELECT t.ParentID, d.KeyID FROM ( SELECT ParentID, COUNT() AS cnt FROM dbo.Sample GROUP BY ParentID ) AS kc INNER JOIN ( SELECT KeyID, COUNT() AS cnt FROM dbo.Sample2 GROUP BY KeyID ) AS nc ON nc.cnt = kc.cnt INNER JOIN dbo.Sample AS t ON t.ParentID = kc.ParentID INNER JOIN dbo.Sample2 AS d ON d.Keyword = t.Keyword AND d.KeyID = nc.KeyID GROUP BY t.ParentID, d.KeyID HAVING COUNT(*) = MIN(nc.cnt)

Adam Machanic 2010-07-02	re: Proper Relational Division With Sets Looks good to me -- faster on this end, too.

Peso 2010-07-02	re: Proper Relational Division With Sets Great. If you want to allow divisions with remainder, just change "AS nc ON nc.cnt = kc.cnt" to "AS nc ON nc.cnt <= kc.cnt". This new version also allows for multi-column division. Thank you for pointing out the flaw in ny first solution when using megasize sample data. Now I think my version 2 deals acceptable with all sizes of sample data.

ms65g 2010-07-04	re: Exact Division Try. --ms65g 1 SELECT * FROM (SELECT DISTINCT group_id FROM Dividend) D WHERE NOT EXISTS (SELECT * FROM Divisor D1 FULL OUTER JOIN (SELECT item_name FROM Dividend WHERE group_id = D.group_id) D2 ON D1.item_name = D2.item_name WHERE D1.item_name IS NULL OR D2.item_name IS NULL) ; --ms65g 2 SELECT * FROM (SELECT DISTINCT group_id FROM Dividend) D WHERE NOT EXISTS ( SELECT * FROM ( SELECT item_name FROM Divisor UNION ALL SELECT DISTINCT item_name FROM Dividend WHERE group_id = D.group_id )d GROUP BY d.item_name HAVING COUNT(item_name) = 1 ) ; --ms65g 3 SELECT * FROM (SELECT DISTINCT group_id FROM Dividend) D WHERE NOT EXISTS ( SELECT item_name FROM Divisor EXCEPT SELECT item_name FROM Dividend WHERE group_id = D.group_id ) AND NOT EXISTS ( SELECT item_name FROM Dividend WHERE group_id = D.group_id EXCEPT SELECT item_name FROM Divisor ) ;

Peso 2010-07-04	re: Proper Relational Division With Sets New ideas are always welcome! Peso-2 (15% of batch) Table 'Dividend'. Scan count 2, logical reads 4. Table 'Divisor'. Scan count 1, logical reads 8. Msg65-1 (29% of batch) Table 'Dividend'. Scan count 4, logical reads 8. Table 'Divisor'. Scan count 1, logical reads 7. Msg65-2 (36% of batch) Table 'Dividend'. Scan count 4, logical reads 8. Table 'Divisor'. Scan count 1, logical reads 7. Msg65-3 (20% of batch) Table 'Dividend'. Scan count 3, logical reads 22. Table 'Divisor'. Scan count 2, logical reads 15.

Peso 2010-07-04	re: Proper Relational Division With Sets -- Peso 1 (4% of batch) Table 'Divisor'. Scan count 1, logical reads 19. Table 'Dividend'. Scan count 1, logical reads 2. -- Peso 2 (7% of batch) Table 'Divisor'. Scan count 1, logical reads 8. Table 'Dividend'. Scan count 2, logical reads 4. -- Celko 1 (7% of batch) Table 'Dividend'. Scan count 3, logical reads 6. Table 'Divisor'. Scan count 2, logical reads 20. -- Celko 2 (9% of batch) Table 'Divisor'. Scan count 3, logical reads 53. Table 'Dividend'. Scan count 9, logical reads 18. -- Celko 3 (7% of batch) Table 'Divisor'. Scan count 3, logical reads 10. Table 'Worktable'. Scan count 3, logical reads 34. Table 'Dividend'. Scan count 1, logical reads 2. -- Celko 4 (10% of batch) Table 'Dividend'. Scan count 2, logical reads 9. Table 'Worktable'. Scan count 3, logical reads 34. Table 'Divisor'. Scan count 1, logical reads 38. -- Msg65 1 (14% of batch) Table 'Dividend'. Scan count 4, logical reads 8. Table 'Divisor'. Scan count 1, logical reads 7. -- Msg65 2 (16% of batch) Table 'Dividend'. Scan count 4, logical reads 8. Table 'Divisor'. Scan count 1, logical reads 7. -- Msg65 3 (9% of batch) Table 'Divisor'. Scan count 2, logical reads 15. Table 'Dividend'. Scan count 3, logical reads 22. -- Adam (18% of batch) Table 'Worktable'. Scan count 9, logical reads 73. Table 'Dividend'. Scan count 1, logical reads 2. Table 'Divisor'. Scan count 1, logical reads 2.

ms65g 2010-07-04	re: Proper Relational Division With Sets Try it also. --Ms65g 4 SELECT group_id FROM ( SELECT group_id, COUNT(*) AS cnt FROM ( SELECT group_id, item_name FROM Dividend UNION ALL ( SELECT group_id, item_name FROM (SELECT DISTINCT group_id FROM Dividend) D CROSS JOIN Divisor ) ) D GROUP BY group_id, item_name ) D GROUP BY group_id HAVING MAX(CASE WHEN cnt = 1 THEN 1 ELSE 0 END) = 0 ;

ms65g 2010-07-04	re: Proper Relational Division With Sets --=== Ms65g 4 ====-- Table 'Divisor'. Scan count 1, logical reads 7 Table 'Dividend'. Scan count 2, logical reads 4 --=== Peso 2 ====-- Table 'Divisor'. Scan count 1, logical reads 8 Table 'Dividend'. Scan count 2, logical reads 4

ms65g 2010-07-04	re: Proper Relational Division With Sets Also we can improve the "Celko 2" solution with using noncorrelated version (relational algebra) like this: === Ms65g 5 === SELECT D1.group_id FROM Dividend AS D1 WHERE D1.item_name IN (SELECT item_name FROM Divisor) GROUP BY D1.group_id HAVING COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor) EXCEPT SELECT group_id FROM Dividend WHERE item_name NOT IN (SELECT item_name FROM Divisor) ;

ms65g 2010-07-04	re: Proper Relational Division With Sets === Ms65g 6 === SELECT D1.group_id FROM Dividend D1 LEFT OUTER JOIN Divisor D2 ON D1.item_name = D2.item_name GROUP BY D1.group_id HAVING COUNT(D1.item_name) = (SELECT COUNT(*) FROM Divisor) AND MAX(CASE WHEN D2.item_name IS NULL THEN 1 ELSE 0 END) = 0 ; =============

ms65g 2010-07-04	re: Proper Relational Division With Sets This solution is completely equivalents with "Ms65g 5" approach by it using correlated subquery instead of EXCEPT. ==== Ms65g 5 correlated_version ========= SELECT D1.group_id FROM Dividend AS D1 WHERE D1.item_name IN (SELECT item_name FROM Divisor) GROUP BY D1.group_id HAVING COUNT(DISTINCT D1.item_name) = (SELECT COUNT() FROM Divisor) AND NOT EXISTS (SELECT FROM Dividend WHERE group_id = D1.group_id AND item_name NOT IN (SELECT item_name FROM Divisor)) ; =================================

Peso 2010-07-04	re: Proper Relational Division With Sets -- Ms65g 5 Table 'Divisor'. Scan count 3, logical reads 28. Table 'Dividend'. Scan count 3, logical reads 6. -- Ms65g 5 correlated Table 'Divisor'. Scan count 3, logical reads 28. Table 'Dividend'. Scan count 3, logical reads 6. -- Ms65g 6 Table 'Divisor'. Scan count 2, logical reads 21. Table 'Dividend'. Scan count 1, logical reads 2.

Peso 2010-07-04	re: Proper Relational Division With Sets I tested your seemingly best query performance wise against Peso 2 on the large sample set provided by MVP Adam Machanic. I deleted all records but KeyID 1234 from divisor before starting. Here are the results -- Ms65g 4 Table 'Sample'. Scan count 6, logical reads 18120, physical reads 0. Table 'Sample2'. Scan count 2, logical reads 2572, physical reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0. Table 'Worktable'. Scan count 2, logical reads 15200, physical reads 0. SQL Server Execution Times: CPU time = 14337 ms, elapsed time = 7692 ms. -- Peso 2 Table 'Sample'. Scan count 1238, logical reads 21802. Table 'Sample2'. Scan count 4, logical reads 2747. Table 'Worktable'. Scan count 0, logical reads 0. Table 'Worktable'. Scan count 0, logical reads 0. SQL Server Execution Times: CPU time = 1389 ms, elapsed time = 718 ms.

ms65g 2010-07-04	re: Proper Relational Division With Sets Congratulations Peso! You are a professional expert in creating most efficient queries. Thanks for community. I like this topic.

Peso 2010-07-04	re: Proper Relational Division With Sets And for completeness, this are the timings for Celko's solutions on the large sample set for KeyID 1234 only. -- Celko 1 Table 'Sample2'. Scan count 4, logical reads 2747. Table 'Worktable'. Scan count 0, logical reads 0. Table 'Sample'. Scan count 1238, logical reads 21802. Table 'Worktable'. Scan count 0, logical reads 0. SQL Server Execution Times: CPU time = 1997 ms, elapsed time = 1062 ms. -- Celko 2 Table 'Worktable'. Scan count 0, logical reads 0. Table 'Sample'. Scan count 1236, logical reads 21588. Table 'Sample2'. Scan count 3, logical reads 3858. SQL Server Execution Times: CPU time = 6053 ms, elapsed time = 7592 ms. -- Celko 3 Table 'Sample2'. Scan count 5, logical reads 4033. Table 'Sample'. Scan count 3, logical reads 9024. Table 'Worktable'. Scan count 6, logical reads 4243392. Table 'Worktable'. Scan count 0, logical reads 0. SQL Server Execution Times: CPU time = 14929 ms, elapsed time = 8875 ms. -- Celko 4 Table 'Sample2'. Scan count 6, logical reads 4208. Table 'Sample'. Scan count 6, logical reads 18084. Table 'Worktable'. Scan count 6, logical reads 4235906. Table 'Worktable'. Scan count 0, logical reads 0. SQL Server Execution Times: CPU time = 21217 ms, elapsed time = 12610 ms. -- Peso 2 Table 'Sample2'. Scan count 4, logical reads 2747. Table 'Worktable'. Scan count 0, logical reads 0. Table 'Sample'. Scan count 1238, logical reads 21802. Table 'Worktable'. Scan count 0, logical reads 0. SQL Server Execution Times: CPU time = 1388 ms, elapsed time = 715 ms.

ms65g 2010-07-04	re: Proper Relational Division With Sets I am back, with a pure relational algebra approach. But I do not think this method be practical. ;WITH [Distinct] AS (SELECT DISTINCT group_id FROM Dividend ), [AB] AS (SELECT FROM [Distinct] CROSS JOIN Divisor ) SELECT group_id FROM [Distinct] EXCEPT SELECT group_id FROM ( (SELECT * FROM Dividend EXCEPT SELECT * FROM [AB]) UNION (SELECT FROM [AB] EXCEPT SELECT FROM Dividend) ) D ;

ms65g 2010-07-04	re: Proper Relational Division With Sets Wait…! I am back, with another pure relational algebra approach. Previous solution calls “Ms65g 7” and current solution call “Ms65g 8” --Ms65g 8 ;WITH [Distinct] AS ( SELECT DISTINCT group_id FROM Dividend ) SELECT group_id FROM [Distinct] EXCEPT ( SELECT COALESCE(D1.group_id, D2.group_id) FROM Dividend D1 FULL OUTER JOIN (SELECT * FROM [Distinct] CROSS JOIN Divisor ) D2 ON D1.group_id = D2.group_id AND D1.item_name = D2.item_name WHERE D1.group_id IS NULL OR D2.group_id IS NULL ) ;

Peso 2010-07-04	re: Proper Relational Division With Sets Cannot run Ms65g 7 due to use of "*", Please rewrite and use proper column names and table alias prefix. Here is Ms65g 8 -- Ms65g 8 Table 'Worktable'. Scan count 2, logical reads 15200. Table 'Sample'. Scan count 9, logical reads 27184. Table 'Sample2'. Scan count 2, logical reads 2572. Table 'Worktable'. Scan count 0, logical reads 0. SQL Server Execution Times: CPU time = 14180 ms, elapsed time = 7803 ms.

Peso 2010-07-04	re: Proper Relational Division With Sets After some deduction and logical trial and errors, this is what v7 gives -- Ms65g 7 Table 'Sample'. Scan count 15, logical reads 45336. Table 'Worktable'. Scan count 3, logical reads 27920. Table 'Sample2'. Scan count 3, logical reads 3858. Table 'Worktable'. Scan count 0, logical reads 0. SQL Server Execution Times: CPU time = 33352 ms, elapsed time = 19222 ms.

ms65g 2010-07-04	re: Proper Relational Division With Sets Why Peso 2 is very much faster than other queries? IO statistics information give us scan numbers of "Worktable" for Peso 2 is 0! Could you please run all of queries in a batch and show us the cost in percent (using MVP Adam Machanic sample data)? And how many years needed to be a person expert in T-SQL querying like you? :) I have one year experiences in SQL. Thanks.

ms65g 2010-07-05	re: Proper Relational Division With Sets What if when list is set?! No problem with dynamic SQL we can use your solution that introduced it in this post: http://weblogs.sqlteam.com/peterl/archive/2010/06/30/Relational-division.aspx ==================================================== DECLARE @List VARCHAR(500) = '', @SQL VARCHAR(500) ; SELECT @List += ',' + item_name FROM Divisor ; SELECT @list = STUFF(@list, 1, 1, ''); SELECT @List = REPLACE(@List, ',', CHAR(39)+','+CHAR(39)) ; SELECT @SQL = 'SELECT group_id FROM Dividend GROUP BY group_id HAVING MAX(CASE WHEN item_name IN ('''+@List+''') THEN 0 ELSE 1 END) = 0 AND SUM(CASE WHEN item_name IN ('''+@List+''') THEN 1 ELSE 0 END) = (SELECT COUNT(*) FROM Divisor);' ; EXECUTE (@SQL) ; =================================================

ms65g 2010-07-08	The Best Solution Peso, I think we have a winner!! Now I introduce the best solution. Look! --========= ms65g 9 ================= SELECT d.group_id FROM (SELECT group_id, MIN(item_name), MAX(item_name), COUNT() FROM Dividend GROUP BY group_id )D(group_id, min_item, max_item, cnt) INNER JOIN (SELECT MIN(item_name), MAX(item_name), COUNT() FROM Divisor) dd(min_item, max_item,cnt) ON d.min_item = dd.min_item AND d.max_item=dd.max_item AND d.cnt=dd.cnt; --====================================== /* Ms65g 9 (23%) Table 'Dividend'. Scan count 1, logical reads 2 Table 'Divisor'. Scan count 1, logical reads 2 Peso 1 (26%) Table 'Divisor'. Scan count 1, logical reads 19 Table 'Dividend'. Scan count 1, logical reads 2 Peso 2 (51%) Table 'Divisor'. Scan count 1, logical reads 8 Table 'Dividend'. Scan count 2, logical reads 4 */

Peso 2010-07-08	re: Proper Relational Division With Sets Replace "Three" with "Pink" in the divisor table and try again. I will not test this solution.

Peso 2010-07-08	re: Proper Relational Division With Sets This will perform as good with much better accuracy. However, the CHECKSUM function is error prone (see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832) and will give false positives, and maybe positive falses, due to the implementation of CHECKSUM. SELECT group_id FROM ( SELECT group_id, CHECKSUM_AGG(CHECKSUM(item_name)) ca FROM dbo.Dividend GROUP BY group_id ) AS t INNER JOIN ( SELECT CHECKSUM_AGG(CHECKSUM(item_name)) AS ca FROM dbo.Divisor ) AS n ON n.ca = t.ca

ms65g 2010-07-08	re: Proper Relational Division With Sets Sorry, I missed the WHERE clause. How about this? -- ms65g (fixed) SELECT d.group_id FROM (SELECT group_id, MIN(item_name), MAX(item_name), COUNT() FROM Dividend WHERE item_name IN (SELECT item_name FROM Divisor) GROUP BY group_id )D(group_id, min_item, max_item, cnt) INNER JOIN (SELECT MIN(item_name), MAX(item_name), COUNT() FROM Divisor) dd(min_item, max_item,cnt) ON d.min_item = dd.min_item AND d.max_item=dd.max_item AND d.cnt=dd.cnt;

Peso 2010-07-08	re: Proper Relational Division With Sets Table 'Divisor'. Scan count 2, logical reads 20. Table 'Dividend'. Scan count 1, logical reads 2. Not better than the other previous suggestions. Also, this #9 is only about division with remainder. Try the suggestion with the original sample data. Your #9 will give group_id 1 as result. Why? You are filtering the Dividend for the items in Divisor and for group_id 1 you will get three records left. So the count will match. And beause you filtered, the MIN and MAX item_name will be the same when count is the same.

ms65g 2010-07-08	re: Proper Relational Division With Sets Oh, I got it. Try it: --Ms65g 9 (exact relational division) SELECT d.group_id FROM (SELECT group_id, MIN(item_name), MAX(item_name), COUNT() FROM Dividend GROUP BY group_id )D(group_id, min_item, max_item, cnt) INNER JOIN (SELECT MIN(item_name), MAX(item_name), COUNT() FROM Divisor) dd(min_item, max_item,cnt) ON d.min_item = dd.min_item AND d.max_item=dd.max_item AND d.cnt=dd.cnt EXCEPT SELECT group_id FROM Dividend WHERE item_name NOT IN (SELECT item_name FROM Divisor) ;

Peso 2010-07-08	re: Proper Relational Division With Sets -- Ms65g 9 (exact division) Table 'Divisor'. Scan count 2, logical reads 8. Table 'Dividend'. Scan count 2, logical reads 4.

ms65g 2010-07-08	re: Proper Relational Division With Sets Following query in completely correct? --ms65g 10 SELECT group_id FROM ( SELECT group_id, SUM(CAST(HASHBYTES('md5', item_name) AS BIGINT)) AS hashing FROM dbo.Dividend GROUP BY group_id ) AS t INNER JOIN ( SELECT SUM(CAST(HASHBYTES('md5', item_name) AS BIGINT)) AS hashing FROM dbo.Divisor ) AS n ON n.hashing = t.hashing

ms65g 2010-07-08	re: Proper Relational Division With Sets Is the previous solution (Ms65g 10) correct in any statuses and cases?

ms65g 2010-07-08	I am winner! ------Concatenating Approach------- Table 'Dividend'. Scan count 1, logical reads 2 Table 'Divisor'. Scan count 1, logical reads 2 --ms65g #11 SELECT DD.group_id FROM ( SELECT group_id, MAX(CASE WHEN ranking = 1 THEN item_name ELSE '' END) + MAX(CASE WHEN ranking = 2 THEN ',' + item_name ELSE '' END) + MAX(CASE WHEN ranking = 3 THEN ',' + item_name ELSE '' END) + MAX(CASE WHEN ranking = 4 THEN ',' + item_name ELSE '' END) AS concatenating FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY group_id ORDER BY item_name) AS Ranking FROM Dividend ) AS D GROUP BY group_id ) AS DD JOIN ( SELECT ',' + item_name FROM Divisor ORDER BY item_name ASC FOR XML PATH('') ) AS D(concatenating) ON DD.concatenating = STUFF(D.concatenating, 1, 1, '') ;

Peso 2010-07-10	re: Proper Relational Division With Sets #10 provides wrong result, as my suggestion with CHECKSUM_AGG. Your #11 on the sample data provided by Adam (using only Keyid=3). Table 'Sample2'. Scan count 3, logical reads 9100. Table 'Sample'. Scan count 3, logical reads 9012. When I extended your example for Keyid=1234, i killed the query after 6 minutes.

Mohammad 2010-07-12	re: Proper Relational Division With Sets ms65g new_1 ;WITH D AS (SELECT , COUNT() OVER(PARTITION BY group_id) AS cnt, MIN(item_name) OVER(PARTITION BY group_id) AS min_item, MAX(item_name) OVER(PARTITION BY group_id) AS max_item FROM Dividend) SELECT group_id FROM D AS D1 JOIN (SELECT item_name, COUNT() OVER() AS cnt, MIN(item_name) OVER() AS min_item, MAX(item_name) OVER() AS max_item FROM Divisor) D2 ON D1.cnt = D2.cnt AND D1.min_item = D2.min_item AND D1.max_item = D2.max_item AND D1.item_name = D2.item_name GROUP BY group_id HAVING COUNT() = MAX(D1.cnt) ; ms65g new_2 ;WITH D1(group_id, cnt, min_item, max_item) AS (SELECT group_id, COUNT(), MIN(item_name), MAX(item_name) FROM Dividend GROUP BY group_id), D2(cnt, min_item, max_item) AS (SELECT COUNT(), MIN(item_name), MAX(item_name) FROM Divisor) SELECT group_id FROM D1 JOIN D2 ON D1.cnt = D2.cnt AND D1.min_item = D2.min_item AND D1.max_item = D2.max_item WHERE D1.cnt = (SELECT COUNT(*) FROM Dividend D JOIN Divisor D2 ON D.item_name = D2.item_name WHERE D1.group_id = D.group_id)