## Comparing sets of rows from two tables

I thought I'd post a blog showing a technique I haven't seen elsewhere. This is from a recent post of mine at the SqlTeam forums.

The problem: you have two tables, each containing mutiple rows of data per “SetID”. You need to compare the two tables to see which sets have complete matches.

First, the DDL and the data:

create table #t1

(SetID int not null,

Attribute varchar(10) not null,

Value int not null,

constraint t1_pk primary key (SetID, Attribute))

create table #t2

(SetID int not null,

Attribute varchar(10) not null,

Value int not null,

constraint t2_pk primary key (SetID, Attribute))

GO

insert into #t1

select 1,'a',1 union

select 1,'b',2 union

select 1,'c',3 union

select 2,'a',4 union

select 2,'c',5 union

select 2,'d',6 union

select 3,'a',9 union

select 3,'b',10 union

select 4,'z',2 union

select 5,'a',4 union

select 5,'b',2

insert into #t2

select 100,'a',1 union

select 100,'b',2 union

select 100,'c',3 union

select 200,'a',4 union

select 200,'c',5 union

select 200,'d',6 union

select 300,'a',1 union

select 300,'b',2 union

select 400,'c',5 union

select 400,'d',6

So, we want to return find out which sets of rows from table1 match a set of rows in table2. There must be no missing or extra rows in either, and of course all attribute/value pairs must match.

With this data, note that SetID 1 from #t1 should match SetID 100 from #t2, and setID 2 from #t1 should match setID 200 from #t2.

To begin, we will start with a cross join of all the data in #t1, with each possibile matching SetID from #t2:

t1.SetId as t1ID, t2.SetId as t2ID, t1.attribute, t1.value

from

#t1 t1

cross join

(select distinct SetId from #t2) t2

We will UNION that with all data in #t2, with each possible matching SetID from #t1 (basically, the opposite of the first one)

union all

select

t1.SetId as t1ID, t2.SetId as t2ID, t2.attribute, t2.value

from

#t2 t2

cross join

(select distinct SetId from #t1) t1

Look at those results. Doesn't look like much, but now we can GROUP BY t1ID, t2ID, attribute and Value and return the count of each:

select

t1ID, t2ID, attribute, value, count(*) as Matches

from

(above SQL with the union all) a

group by

t1ID, t2ID, attribute, value

Look at those results as well. Note that because of the GROUP BY, we know if "Matches" = 2 we have a match between #t1 and #t2 for that attribute/value pair.

So, we need to add one more layer to the above query, and ensure that we return only t1ID and t2ID's where ALL matches = 2. By definition, if the MIN(Matches) for each t1ID and t2ID = 2, then there must be none less than 2:

select

t1ID, t2ID

from

(the previous SQL) a

group by

t1ID, t2ID

having

min(Matches) =2

and voila ! that's our result. Only EXACT matches for sets of data between #t1 and #t2.

Any feedback ? is there a better / easier way to do this? let me know! (should this have been an article? is it too long for a blog?)

- Jeff

Print | posted on Friday, October 03, 2003 1:59 PM | Filed Under [ T-SQL Joins/Relations ]

## Feedback

## # re: Comparing sets of rows from two tables

Jeff,Isn't this just relational division?

## # re: Comparing sets of rows from two tables

Sure ... so is there any easier way to return the results I'm returning? I checked Celko's article out and all he does is return 1 side of the equation (i.e., with my data, he would return "SetID's 1 and 2 from #t1 have complete matches".Mine also returns WHO they match. and if they match more than one it tells you that as well. and it is easily adaptable to return only rows that DON'T match per each combination of sets, etc.

Definitely hook me up with other methods if you have some! (and try out my method, if you haven't already)

Thanks

Jeff

## # re: Comparing sets of rows from two tables

Celko's method can be modified...<p>

Select x.setid,y.setid

from #t1 x

INNER JOIN #t2 y on x.attribute = y.attribute and x.value = y.value

group by x.setid, y.setid

having count(*) = (Select COUNT(*) from #t1 where setid = x.setid)

## # re: Comparing sets of rows from two tables

oooohh... good point as usual, David. When I wrote this technique I was thinking of returning "pct matches" and ensuring all combinations from both tables are returned, and things of that nature, which may be why I took a sort of "round about" approach.## # re: Comparing sets of rows from two tables

I wan a procedure by which the matching colums in each row listed out with their count...## # Shorter version

WITH CrossJoin AS(

SELECT t2.ID Cod1, t1.ID Cod0, t1.attr, t1.value

FROM #t1 t1 CROSS JOIN #t2 t2

UNION ALL

SELECT t1.ID Cod0, t2.ID Cod1, t2.attr, t2.value

FROM #t1 t1 CROSS JOIN #t2 t2

)

SELECT Cod1, Cod0, attr, value

FROM CrossJoin

GROUP BY Cod1, Cod0, attr, value

HAVING COUNT(*)>1

## # Here's a shorter version

WITH CrossJoin AS(

SELECT t2.ID Cod1, t1.ID Cod0, t1.attr, t1.value

FROM #t1 t1 CROSS JOIN #t2 t2

UNION ALL

SELECT t1.ID Cod0, t2.ID Cod1, t2.attr, t2.value

FROM #t1 t1 CROSS JOIN #t2 t2

)

SELECT Cod1, Cod0, attr, value

FROM CrossJoin

GROUP BY Cod1, Cod0, attr, value

HAVING COUNT(*)>1

## # re: Comparing sets of rows from two tables

Why can't I see comments here?## # re: Comparing sets of rows from two tables

Cool! At least works :) I still can't figure out a better/easier way :) In fact I even couldn't figure out this one, until I read this :-D.IMHO it's a clever thing!

## # re: Comparing sets of rows from two tables

I found this to be an easier solution, in terms of simplicity:select t1.SetId, t2.SetId

from t1 cross join t2

where t1.attribute = t2.attribute

and t1.value = t2.value

group by t1.setid, t2.setid

having count(*) = (select count(*) from t1 as t where setid = t1.setid)

and count(*) = (select count(*) from t2 as t where setid = t2.setid)

But it's worse in terms of performance, as it makes cross join on the whole tables, and not just the setId-s...

## # re: Comparing sets of rows from two tables

There are two tables A and B and I am comparing the two tables. I want to Insert the records in Table B which are available in Table A but not in B and want to delete the records from table B which are not available in Table A. How do I proceed? I am clueless here.I will apreciate your help.

## # re: Comparing sets of rows from two tables

Insert into Table B (Select * from Table A where Col_A not in (Select Col_A from Table B))This will Help U. if Both Table have Same No of Columns and Column Names.

## # re: Comparing sets of rows from two tables

Ok, so what if you have two tables: #T1 and #T2, with #T2 being an update of #T1. Both tables have columns 'ID' (PK), 'VAL' and 'TYPE'. How can you create a query that will count the unique instances of 'VAL' for each table, grouped by 'TYPE'.In other words:

TYPE - T1_CNT - T2_CNT

A - 1 - 2

B - 3 - 2

C - 1 - 1

## # re: Comparing sets of rows from two tables

Also see http://weblogs.sqlteam.com/peterl/archive/2010/07/02/Proper-Relational-Division-With-Sets.aspx## # re: Comparing sets of rows from two tables

You need to compare the two tables to see which sets have complete matches.