I thought I'd post a blog showing a technique I haven't seen elsewhere. This is from a recent post of mine at the SqlTeam forums.
The problem: you have two tables, each containing mutiple rows of data per “SetID”. You need to compare the two tables to see which sets have complete matches.
First, the DDL and the data:
create table #t1
(SetID int not null,
Attribute varchar(10) not null,
Value int not null,
constraint t1_pk primary key (SetID, Attribute))
create table #t2
(SetID int not null,
Attribute varchar(10) not null,
Value int not null,
constraint t2_pk primary key (SetID, Attribute))
GO
insert into #t1
select 1,'a',1 union
select 1,'b',2 union
select 1,'c',3 union
select 2,'a',4 union
select 2,'c',5 union
select 2,'d',6 union
select 3,'a',9 union
select 3,'b',10 union
select 4,'z',2 union
select 5,'a',4 union
select 5,'b',2
insert into #t2
select 100,'a',1 union
select 100,'b',2 union
select 100,'c',3 union
select 200,'a',4 union
select 200,'c',5 union
select 200,'d',6 union
select 300,'a',1 union
select 300,'b',2 union
select 400,'c',5 union
select 400,'d',6
So, we want to return find out which sets of rows from table1 match a set of rows in table2. There must be no missing or extra rows in either, and of course all attribute/value pairs must match.
With this data, note that SetID 1 from #t1 should match SetID 100 from #t2, and setID 2 from #t1 should match setID 200 from #t2.
To begin, we will start with a cross join of all the data in #t1, with each possibile matching SetID from #t2:
select
t1.SetId as t1ID, t2.SetId as t2ID, t1.attribute, t1.value
from
#t1 t1
cross join
(select distinct SetId from #t2) t2
We will UNION that with all data in #t2, with each possible matching SetID from #t1 (basically, the opposite of the first one)
union all
select
t1.SetId as t1ID, t2.SetId as t2ID, t2.attribute, t2.value
from
#t2 t2
cross join
(select distinct SetId from #t1) t1
Look at those results. Doesn't look like much, but now we can GROUP BY t1ID, t2ID, attribute and Value and return the count of each:
select
t1ID, t2ID, attribute, value, count(*) as Matches
from
(above SQL with the union all) a
group by
t1ID, t2ID, attribute, value
Look at those results as well. Note that because of the GROUP BY, we know if "Matches" = 2 we have a match between #t1 and #t2 for that attribute/value pair.
So, we need to add one more layer to the above query, and ensure that we return only t1ID and t2ID's where ALL matches = 2. By definition, if the MIN(Matches) for each t1ID and t2ID = 2, then there must be none less than 2:
select
t1ID, t2ID
from
(the previous SQL) a
group by
t1ID, t2ID
having
min(Matches) =2
and voila ! that's our result. Only EXACT matches for sets of data between #t1 and #t2.
Any feedback ? is there a better / easier way to do this? let me know! (should this have been an article? is it too long for a blog?)
- Jeff