Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

Comparing sets of rows from two tables

I thought I'd post a blog showing a technique I haven't seen elsewhere.  This is from a recent post of mine at the SqlTeam forums.

The problem: you have two tables, each containing mutiple rows of data per “SetID”.  You need to compare the two tables to see which sets have complete matches.

First, the DDL and the data:

create table #t1
 (SetID int not null,
 Attribute varchar(10) not null,
 Value int not null,
constraint t1_pk primary key (SetID, Attribute))

create table #t2
 (SetID int not null,
 Attribute varchar(10) not null,
 Value int not null,
constraint t2_pk primary key (SetID, Attribute))

GO

insert into #t1
select 1,'a',1 union
select 1,'b',2 union
select 1,'c',3 union


select 2,'a',4 union
select 2,'c',5 union
select 2,'d',6 union

select 3,'a',9 union
select 3,'b',10 union

select 4,'z',2 union

select 5,'a',4 union
select 5,'b',2

insert into #t2
select 100,'a',1 union
select 100,'b',2 union
select 100,'c',3 union

select 200,'a',4 union
select 200,'c',5 union
select 200,'d',6 union

select 300,'a',1 union
select 300,'b',2 union

select 400,'c',5 union
select 400,'d',6

So, we want to return find out which sets of rows from table1 match a set of rows in table2.  There must be no missing or extra rows in either, and of course all attribute/value pairs must match.

With this data, note that SetID 1 from #t1 should match SetID 100 from #t2, and setID 2 from #t1 should match setID 200 from #t2.

To begin, we will start with a cross join of all the data in #t1, with each possibile matching SetID from #t2:

select
  t1.SetId as t1ID, t2.SetId as t2ID, t1.attribute, t1.value
from
 #t1 t1
cross join 
 (select distinct SetId from #t2) t2


We will UNION that with all data in #t2, with each possible matching SetID from #t1 (basically, the opposite of the first one)

union all

select
  t1.SetId as t1ID, t2.SetId as t2ID, t2.attribute, t2.value
from
 #t2 t2
cross join 
 (select distinct SetId from #t1) t1

Look at those results.  Doesn't look like much, but now we can GROUP BY t1ID, t2ID, attribute and Value and return the count of each:

select
  t1ID, t2ID, attribute, value, count(*) as Matches
from
  (above SQL with the union all) a
group by
  t1ID, t2ID, attribute, value
 

Look at those results as well.  Note that because of the GROUP BY, we know if "Matches" = 2 we have a match between #t1 and #t2 for that attribute/value pair.

So, we need to add one more layer to the above query, and ensure that we return only t1ID and t2ID's where ALL matches = 2.  By definition, if the MIN(Matches) for each t1ID and t2ID = 2, then there must be none less than 2:

select
  t1ID, t2ID
from
  (the previous SQL) a
group by
 
t1ID, t2ID
having
 
min(Matches) =2

and voila !  that's our result.  Only EXACT matches for sets of data between #t1 and #t2.

Any feedback ? is there a better / easier way to do this?  let me know!  (should this have been an article? is it too long for a blog?)

- Jeff

Legacy Comments


DavidM
2003-10-06
re: Comparing sets of rows from two tables
Jeff,

Isn't this just relational division?

Jeff
2003-10-07
re: Comparing sets of rows from two tables
Sure ... so is there any easier way to return the results I'm returning? I checked Celko's article out and all he does is return 1 side of the equation (i.e., with my data, he would return "SetID's 1 and 2 from #t1 have complete matches".

Mine also returns WHO they match. and if they match more than one it tells you that as well. and it is easily adaptable to return only rows that DON'T match per each combination of sets, etc.

Definitely hook me up with other methods if you have some! (and try out my method, if you haven't already)
Thanks
Jeff


DavidM
2003-10-08
re: Comparing sets of rows from two tables
Celko's method can be modified...
<p>
Select x.setid,y.setid
from #t1 x
INNER JOIN #t2 y on x.attribute = y.attribute and x.value = y.value
group by x.setid, y.setid
having count(*) = (Select COUNT(*) from #t1 where setid = x.setid)

Jeff
2003-11-17
re: Comparing sets of rows from two tables
oooohh... good point as usual, David. When I wrote this technique I was thinking of returning "pct matches" and ensuring all combinations from both tables are returned, and things of that nature, which may be why I took a sort of "round about" approach.

Vyshak
2006-03-01
re: Comparing sets of rows from two tables
I wan a procedure by which the matching colums in each row listed out with their count...

Lino Barreca
2006-07-21
Shorter version
WITH CrossJoin AS
(
SELECT t2.ID Cod1, t1.ID Cod0, t1.attr, t1.value
FROM #t1 t1 CROSS JOIN #t2 t2
UNION ALL
SELECT t1.ID Cod0, t2.ID Cod1, t2.attr, t2.value
FROM #t1 t1 CROSS JOIN #t2 t2
)

SELECT Cod1, Cod0, attr, value
FROM CrossJoin
GROUP BY Cod1, Cod0, attr, value
HAVING COUNT(*)>1

Lino Barreca
2006-07-21
Here's a shorter version
WITH CrossJoin AS
(
SELECT t2.ID Cod1, t1.ID Cod0, t1.attr, t1.value
FROM #t1 t1 CROSS JOIN #t2 t2
UNION ALL
SELECT t1.ID Cod0, t2.ID Cod1, t2.attr, t2.value
FROM #t1 t1 CROSS JOIN #t2 t2
)

SELECT Cod1, Cod0, attr, value
FROM CrossJoin
GROUP BY Cod1, Cod0, attr, value
HAVING COUNT(*)>1

Lino Barreca
2006-07-21
re: Comparing sets of rows from two tables
Why can't I see comments here?

Radoslav Atanasov
2006-07-31
re: Comparing sets of rows from two tables
Cool! At least works :) I still can't figure out a better/easier way :) In fact I even couldn't figure out this one, until I read this :-D.
IMHO it's a clever thing!

Radoslav Atanasov
2006-07-31
re: Comparing sets of rows from two tables
I found this to be an easier solution, in terms of simplicity:

select t1.SetId, t2.SetId
from t1 cross join t2
where t1.attribute = t2.attribute
and t1.value = t2.value
group by t1.setid, t2.setid
having count(*) = (select count(*) from t1 as t where setid = t1.setid)
and count(*) = (select count(*) from t2 as t where setid = t2.setid)

But it's worse in terms of performance, as it makes cross join on the whole tables, and not just the setId-s...

asit
2007-06-05
re: Comparing sets of rows from two tables
There are two tables A and B and I am comparing the two tables. I want to Insert the records in Table B which are available in Table A but not in B and want to delete the records from table B which are not available in Table A. How do I proceed? I am clueless here.

I will apreciate your help.

yogesh
2008-08-21
re: Comparing sets of rows from two tables
Insert into Table B (Select * from Table A where Col_A not in (Select Col_A from Table B))


This will Help U. if Both Table have Same No of Columns and Column Names.


eRic
2008-10-16
re: Comparing sets of rows from two tables
Ok, so what if you have two tables: #T1 and #T2, with #T2 being an update of #T1. Both tables have columns 'ID' (PK), 'VAL' and 'TYPE'. How can you create a query that will count the unique instances of 'VAL' for each table, grouped by 'TYPE'.

In other words:
TYPE - T1_CNT - T2_CNT
A - 1 - 2
B - 3 - 2
C - 1 - 1

Peso
2010-07-02
re: Comparing sets of rows from two tables
Also see http://weblogs.sqlteam.com/peterl/archive/2010/07/02/Proper-Relational-Division-With-Sets.aspx