The simplest way to delete duplicates and compare two result sets in SQL Server

Mon Jan 5, 2009 by Mladen Prajdić in sql-server

Comparing result sets

There are times, although not often that you have to compare 2 result sets. This usually happens when you're analyzing data for whatever reason or unit testing a database.

Usual methods include

- using the UNION of both queries because it filters duplicate data and checking if the row counts are the same

- using the EXCEPT if you have SQL Server 2005+

- using NOT EXISTS which is very ugly and not worth it

- using a FULL OUTER JOIN and looking at nulls or some other method I haven't mentioned

- returning result sets to client and comparing them there

- some other method I haven't mentioned

But for a while now I've been using this method using BINARY_CHECKSUM, CHECKSUM and CHECKSUM_AGG functions, since they are available in all versions of SQL Server. I prefer BINARY_CHECKSUM over CHECKSUM because BINARY_CHECKSUM is case sensitive. In a case-insensitive database CHECKSUM returns the same value for 'abcd' and 'ABCD' while BINARY_CHECKSUM does not.

Here's the code I use for result set comparison:

Deleting duplicates

Deleting duplicates becomes an easy task with this method both in SQL Server 2000 and 2005+:

It's fast, simple, clean, understandable and works like a charm. If you have a better way do let me know.

Legacy Comments


Peso 2009-01-05	re: The simplest way to delete duplicates and compare two result sets in SQL Server Add this sample data INSERT INTO #t1(ID, VALUE) VALUES (1,'bQ') an try again. Then read this post http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832 and the prelude why checksum is NOT reliable http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70328

Peso 2009-01-05	re: The simplest way to delete duplicates and compare two result sets in SQL Server DELETE f FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, ... ORDER BY Col1) AS recID ) AS f WHERE recID > 2

Mladen 2009-01-05	re: The simplest way to delete duplicates and compare two result sets in SQL Server awsome! i did not know that. thanx Peter.

Peso 2009-01-05	re: The simplest way to delete duplicates and compare two result sets in SQL Server The Microsoft CHECKSUM is a very, VERY, weak algorithm as proved in the links above. It just a 32-bit value XOR'ed with previous value shifted 4 bits. I strongly, but kindly, vote against this method of using CHECKSUM for comparison.

Zehra Nasif 2009-01-05	re: The simplest way to delete duplicates This will delete all records from the table MyTable which have the same value for the field dupField, leaving that record which has the lowest value in uniqueField. delete from T1 from MyTable T1, MyTable T2 where T1.dupField = T2.dupField and T1.uniqueField > T2.uniqueField

Toni 2009-01-15	re: The simplest way to delete duplicates and compare two result sets in SQL Server In doing any of the comparisons (if that is the way you must go e.g pre-SQL 2005) look out for the case where a field might contain Null. In that case, your duplicate_field_1 = duplicate_field_2 would not find the record.

DBA 2009-01-23	re: The simplest way to delete duplicates and compare two result sets in SQL Server Hello. There is another generic way to compare resultsets. I will try to show an example select a1, a2, a3, a4, sum(v) from ( select a1, a2, a3, a4, v = 1 from table1 union select a1, a2, a3, a4, v = 2 from table2 ) t group by a1, a2, a3, a4, having sum(v) in (1,2)