Mladen Prajdić Blog

Blog about stuff and things and stuff. Mostly about SQL server and .Net

The simplest way to delete duplicates and compare two result sets in SQL Server

Comparing result sets

There are times, although not often that you have to compare 2 result sets. This usually happens when you're analyzing data for whatever reason or unit testing a database.

Usual methods include

- using the UNION of both queries because it filters duplicate data and checking if the row counts are the same

- using the EXCEPT if you have SQL Server 2005+

- using NOT EXISTS which is very ugly and not worth it

- using a FULL OUTER JOIN and looking at nulls or some other method I haven't mentioned

- returning result sets to client and comparing them there

- some other method I haven't mentioned

 

But for a while now I've been using this method using BINARY_CHECKSUM, CHECKSUM and CHECKSUM_AGG functions, since they are available in all versions of SQL Server. I prefer BINARY_CHECKSUM over CHECKSUM because BINARY_CHECKSUM is case sensitive. In a case-insensitive database CHECKSUM returns the same value for 'abcd' and 'ABCD' while BINARY_CHECKSUM does not.

Here's the code I use for result set comparison:

 

 

Deleting duplicates

Deleting duplicates becomes an easy task with this method both in SQL Server 2000 and 2005+:

It's fast, simple, clean, understandable and works like a charm. If you have a better way do let me know.

 

kick it on DotNetKicks.com
 

Legacy Comments


Peso
2009-01-05
re: The simplest way to delete duplicates and compare two result sets in SQL Server
Add this sample data

INSERT INTO #t1(ID, VALUE) VALUES (1,'bQ')

an try again.

Then read this post http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832
and the prelude why checksum is NOT reliable http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70328

Peso
2009-01-05
re: The simplest way to delete duplicates and compare two result sets in SQL Server
DELETE f
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, ... ORDER BY Col1) AS recID
) AS f
WHERE recID > 2

Mladen
2009-01-05
re: The simplest way to delete duplicates and compare two result sets in SQL Server
awsome!
i did not know that.
thanx Peter.

Peso
2009-01-05
re: The simplest way to delete duplicates and compare two result sets in SQL Server
The Microsoft CHECKSUM is a very, VERY, weak algorithm as proved in the links above.
It just a 32-bit value XOR'ed with previous value shifted 4 bits.

I strongly, but kindly, vote against this method of using CHECKSUM for comparison.

Zehra Nasif
2009-01-05
re: The simplest way to delete duplicates
This will delete all records from the table MyTable which have the same value for the field dupField, leaving that record which has the lowest value in uniqueField.

delete from T1 from MyTable T1, MyTable T2
where T1.dupField = T2.dupField
and
T1.uniqueField > T2.uniqueField


Toni
2009-01-15
re: The simplest way to delete duplicates and compare two result sets in SQL Server
In doing any of the comparisons (if that is the way you must go e.g pre-SQL 2005) look out for the case where a field might contain Null. In that case, your duplicate_field_1 = duplicate_field_2 would not find the record.

DBA
2009-01-23
re: The simplest way to delete duplicates and compare two result sets in SQL Server
Hello.
There is another generic way to compare resultsets. I will try to show an example


select a1, a2, a3, a4, sum(v)
from (
select a1, a2, a3, a4, v = 1
from table1
union
select a1, a2, a3, a4, v = 2
from table2
) t
group by a1, a2, a3, a4,
having sum(v) in (1,2)