Comparing sets of rows from two tables
I thought I'd post a blog showing a technique I haven't seen elsewhere. This is from a recent post of mine at the SqlTeam forums.
The problem: you have two tables, each containing mutiple rows of data per “SetID”. You need to compare the two tables to see which sets have complete matches.
First, the DDL and the data:
create table #t1
(SetID int not null,
Attribute varchar(10) not null,
Value int not null,
constraint t1_pk primary key (SetID, Attribute))
create table #t2
(SetID int not null,
Attribute varchar(10) not null,
Value int not null,
constraint t2_pk primary key (SetID, Attribute))
GO
insert into #t1
select 1,'a',1 union
select 1,'b',2 union
select 1,'c',3 union
select 2,'a',4 union
select 2,'c',5 union
select 2,'d',6 union
select 3,'a',9 union
select 3,'b',10 union
select 4,'z',2 union
select 5,'a',4 union
select 5,'b',2
insert into #t2
select 100,'a',1 union
select 100,'b',2 union
select 100,'c',3 union
select 200,'a',4 union
select 200,'c',5 union
select 200,'d',6 union
select 300,'a',1 union
select 300,'b',2 union
select 400,'c',5 union
select 400,'d',6
So, we want to return find out which sets of rows from table1 match a set of rows in table2. There must be no missing or extra rows in either, and of course all attribute/value pairs must match.
With this data, note that SetID 1 from #t1 should match SetID 100 from #t2, and setID 2 from #t1 should match setID 200 from #t2.
To begin, we will start with a cross join of all the data in #t1, with each possibile matching SetID from #t2:
t1.SetId as t1ID, t2.SetId as t2ID, t1.attribute, t1.value
from
#t1 t1
cross join
(select distinct SetId from #t2) t2
We will UNION that with all data in #t2, with each possible matching SetID from #t1 (basically, the opposite of the first one)
union all
select
t1.SetId as t1ID, t2.SetId as t2ID, t2.attribute, t2.value
from
#t2 t2
cross join
(select distinct SetId from #t1) t1
Look at those results. Doesn't look like much, but now we can GROUP BY t1ID, t2ID, attribute and Value and return the count of each:
select
t1ID, t2ID, attribute, value, count(*) as Matches
from
(above SQL with the union all) a
group by
t1ID, t2ID, attribute, value
Look at those results as well. Note that because of the GROUP BY, we know if "Matches" = 2 we have a match between #t1 and #t2 for that attribute/value pair.
So, we need to add one more layer to the above query, and ensure that we return only t1ID and t2ID's where ALL matches = 2. By definition, if the MIN(Matches) for each t1ID and t2ID = 2, then there must be none less than 2:
select
t1ID, t2ID
from
(the previous SQL) a
group by
t1ID, t2ID
having
min(Matches) =2
and voila ! that's our result. Only EXACT matches for sets of data between #t1 and #t2.
Any feedback ? is there a better / easier way to do this? let me know! (should this have been an article? is it too long for a blog?)
- Jeff
Legacy Comments
DavidM
2003-10-06 |
re: Comparing sets of rows from two tables Jeff, Isn't this just relational division? |
Jeff
2003-10-07 |
re: Comparing sets of rows from two tables Sure ... so is there any easier way to return the results I'm returning? I checked Celko's article out and all he does is return 1 side of the equation (i.e., with my data, he would return "SetID's 1 and 2 from #t1 have complete matches". Mine also returns WHO they match. and if they match more than one it tells you that as well. and it is easily adaptable to return only rows that DON'T match per each combination of sets, etc. Definitely hook me up with other methods if you have some! (and try out my method, if you haven't already) Thanks Jeff |
DavidM
2003-10-08 |
re: Comparing sets of rows from two tables Celko's method can be modified... <p> Select x.setid,y.setid from #t1 x INNER JOIN #t2 y on x.attribute = y.attribute and x.value = y.value group by x.setid, y.setid having count(*) = (Select COUNT(*) from #t1 where setid = x.setid) |
Jeff
2003-11-17 |
re: Comparing sets of rows from two tables oooohh... good point as usual, David. When I wrote this technique I was thinking of returning "pct matches" and ensuring all combinations from both tables are returned, and things of that nature, which may be why I took a sort of "round about" approach. |
Vyshak
2006-03-01 |
re: Comparing sets of rows from two tables I wan a procedure by which the matching colums in each row listed out with their count... |
Lino Barreca
2006-07-21 |
Shorter version WITH CrossJoin AS ( SELECT t2.ID Cod1, t1.ID Cod0, t1.attr, t1.value FROM #t1 t1 CROSS JOIN #t2 t2 UNION ALL SELECT t1.ID Cod0, t2.ID Cod1, t2.attr, t2.value FROM #t1 t1 CROSS JOIN #t2 t2 ) SELECT Cod1, Cod0, attr, value FROM CrossJoin GROUP BY Cod1, Cod0, attr, value HAVING COUNT(*)>1 |
Lino Barreca
2006-07-21 |
Here's a shorter version WITH CrossJoin AS ( SELECT t2.ID Cod1, t1.ID Cod0, t1.attr, t1.value FROM #t1 t1 CROSS JOIN #t2 t2 UNION ALL SELECT t1.ID Cod0, t2.ID Cod1, t2.attr, t2.value FROM #t1 t1 CROSS JOIN #t2 t2 ) SELECT Cod1, Cod0, attr, value FROM CrossJoin GROUP BY Cod1, Cod0, attr, value HAVING COUNT(*)>1 |
Lino Barreca
2006-07-21 |
re: Comparing sets of rows from two tables Why can't I see comments here? |
Radoslav Atanasov
2006-07-31 |
re: Comparing sets of rows from two tables Cool! At least works :) I still can't figure out a better/easier way :) In fact I even couldn't figure out this one, until I read this :-D. IMHO it's a clever thing! |
Radoslav Atanasov
2006-07-31 |
re: Comparing sets of rows from two tables I found this to be an easier solution, in terms of simplicity: select t1.SetId, t2.SetId from t1 cross join t2 where t1.attribute = t2.attribute and t1.value = t2.value group by t1.setid, t2.setid having count(*) = (select count(*) from t1 as t where setid = t1.setid) and count(*) = (select count(*) from t2 as t where setid = t2.setid) But it's worse in terms of performance, as it makes cross join on the whole tables, and not just the setId-s... |
asit
2007-06-05 |
re: Comparing sets of rows from two tables There are two tables A and B and I am comparing the two tables. I want to Insert the records in Table B which are available in Table A but not in B and want to delete the records from table B which are not available in Table A. How do I proceed? I am clueless here. I will apreciate your help. |
yogesh
2008-08-21 |
re: Comparing sets of rows from two tables Insert into Table B (Select * from Table A where Col_A not in (Select Col_A from Table B)) This will Help U. if Both Table have Same No of Columns and Column Names. |
eRic
2008-10-16 |
re: Comparing sets of rows from two tables Ok, so what if you have two tables: #T1 and #T2, with #T2 being an update of #T1. Both tables have columns 'ID' (PK), 'VAL' and 'TYPE'. How can you create a query that will count the unique instances of 'VAL' for each table, grouped by 'TYPE'. In other words: TYPE - T1_CNT - T2_CNT A - 1 - 2 B - 3 - 2 C - 1 - 1 |
Peso
2010-07-02 |
re: Comparing sets of rows from two tables Also see http://weblogs.sqlteam.com/peterl/archive/2010/07/02/Proper-Relational-Division-With-Sets.aspx |
snow boots for women
2010-10-06 |
re: Comparing sets of rows from two tables You need to compare the two tables to see which sets have complete matches. |