DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
This INTERSECT method has no equivalent in TSQL
INTERSECT is simply all rows that are in the First table and the Second table
We can take a coding short cut and simple use our JOIN method.
There is only the 1 signature for this method.
In summary the code works as follows:
Get a reference to all columns
Join on all columns
Return table
I have included the DISTINCT method as a helper method to assist us in the DIVIDE method that will follow in a later post.
But in retrospect, it should have been one of the first methods I wrote for this class.
If you have been paying attention, you will notice that the PROJECT and UNION methods will return duplicates.
In my opinion the most serious was the UNION flaw as by default even SQL Server does not produce duplicates from this method.
ie:
In SQL Server A UNION A = A but our methods produces A UNION A = A+A
Even though I warned you of duplicates I still feel bad about it! Sorry!
The PROJECT method was equivalent to the SQL Server implementation but perhaps it shouldn't have been.
Obviously it is trivial to change these methods to confirm now that the DISTINCT methods exist and I leave it up to you to do that.
My current version of these methods now has an overload that takes a boolean indicating wether to retain duplicates.
By default they do not! Redemption!
The DISTINCT method uses another method (RowEqual) for row equality. Even though only the one method uses this function, to incorporate it directly into the Distinct method would be a slight performance hit.
There are 5 signature for the DISTINCT method.
In summary the code works as follows:
Create new table
Add Distinct columns and prepare sort expression
Select all sorted rows
Loop over rows and check against previous row
Add only unique rows
Return table
If anybody can come up with a faster solution for Distinct it would be most appreciated.
public static DataTable Intersect(DataTable First, DataTable Second)
{
//Get reference to Columns in First
DataColumn[] firstcolumns = new DataColumn[First.Columns.Count];
for(int i = 0; i < firstcolumns.Length; i++)
{
firstcolumns[i] = First.Columns[i];
}
//Get reference to Columns in Second
DataColumn[] secondcolumns = new DataColumn[Second.Columns.Count];
for(int i = 0; i < secondcolumns.Length; i++)
{
secondcolumns[i] = Second.Columns[i];
}
//JOIN ON all columns
DataTable table = SQLOps.Join(First, Second, firstcolumns, secondcolumns);
table.TableName = "Intersect";
return table;
}
private static bool RowEqual(object[] Values, object[] OtherValues)
{
if(Values == null)
return false;
for(int i = 0; i < Values.Length; i++)
{
if(!Values[i].Equals(OtherValues[i]))
return false;
}
return true;
}
public static DataTable Distinct(DataTable Table, DataColumn[] Columns)
{
//Empty table
DataTable table = new DataTable("Distinct");
//Sort variable
string sort = string.Empty;
//Add Columns & Build Sort expression
for(int i = 0; i < Columns.Length; i++)
{
table.Columns.Add(Columns[i].ColumnName,Columns[i].DataType);
sort += Columns[i].ColumnName + ",";
}
//Select all rows and sort
DataRow[] sortedrows = Table.Select(string.Empty,sort.Substring(0,sort.Length-1));
object[] currentrow = null;
object[] previousrow = null;
table.BeginLoadData();
foreach(DataRow row in sortedrows)
{
//Current row
currentrow = new object[Columns.Length];
for(int i = 0; i < Columns.Length; i++)
{
currentrow[i] = row[Columns[i].ColumnName];
}
//Match Current row to previous row
if(!SQLOps.RowEqual(previousrow, currentrow))
table.LoadDataRow(currentrow,true);
//Previous row
previousrow = new object[Columns.Length];
for(int i = 0; i < Columns.Length; i++)
{
previousrow[i] = row[Columns[i].ColumnName];
}
}
table.EndLoadData();
return table;
}
public static DataTable Distinct(DataTable Table, DataColumn Column)
{
return Distinct(Table, new DataColumn[]{Column});
}
public static DataTable Distinct(DataTable Table, string Column)
{
return Distinct(Table, Table.Columns[Column]);
}
public static DataTable Distinct(DataTable Table, params string[] Columns)
{
DataColumn[] columns = new DataColumn[Columns.Length];
for(int i = 0; i < Columns.Length; i++)
{
columns[i] = Table.Columns[Columns[i]];
}
return Distinct(Table, columns);
}
public static DataTable Distinct(DataTable Table)
{
DataColumn[] columns = new DataColumn[Table.Columns.Count];
for(int i = 0; i < Table.Columns.Count; i++)
{
columns[i] = Table.Columns[i];
}
return Distinct(Table, columns);
}
Legacy Comments
Marcus Tucker
2004-01-24 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology From a "best programming practice" standpoint, might I be so bold as to suggest that you really shouldn't be using two variables called "Table" and "table"?! At the very least call them "Source" and "Destination" or "Input" and "Output"...! ;) |
DavidM
2004-01-24 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology You can be as bold as you like Marcus. The first method I coded was UNION and the variable "table" did not clash with the input arguments. "table" simply became the defacto standard for the return DataTable across the class. If that is your only concern, I am delighted. |
Marcus Tucker
2004-01-25 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology I haven't yet found the time to start playing with .Net (and even if I do I think I'll wait until Whidbey comes out), but have been reading a few .Net articles here and there in preparation. The table/Table naming just struck me as surprising, that's all! :) |
Johnny
2004-01-30 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology isn't intersect same as JOIN in SQL? |
DavidM
2004-01-30 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology Similar but not the same. INTERSECT requires that both tables have the same heading. As you can see I use the JOIN method to achieve the result. |
marco alves
2004-06-24 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology Private Function RowEqual(ByVal Values As Object(), ByVal OtherValues As Object()) As Boolean If (Values Is Nothing) Then Return False End If For i As Integer = 0 To Values.Length - 1 If (Not Values(i).Equals(OtherValues(i))) Then Return False End If Next Return True End Function Private Function Distinct(ByVal Tabela As DataTable, ByVal Columns As DataColumn()) As DataTable 'Empty table Dim table As DataTable = New DataTable("Distinct") 'Sort variable Dim sort As String = String.Empty 'Add Columns & Build Sort expression For i As Integer = 0 To Columns.Length - 1 table.Columns.Add(Columns(i).ColumnName, Columns(i).DataType) sort += Columns(i).ColumnName + "," Next 'Select all rows and sort Dim sortedrows As DataRow() = table.Select(String.Empty, sort.Substring(0, sort.Length - 1)) Dim currentrow As Object() = Nothing Dim previousrow As Object() = Nothing table.BeginLoadData() For Each row As DataRow In sortedrows ReDim currentrow(Columns.Length) For i As Integer = 0 To Columns.Length - 1 currentrow(i) = row(Columns(i).ColumnName) Next 'Match Current row to previous row If (Not RowEqual(previousrow, currentrow)) Then table.LoadDataRow(currentrow, True) End If 'Previous row ReDim previousrow(Columns.Length) For i As Integer = 0 To Columns.Length - 1 previousrow(i) = row(Columns(i).ColumnName) Next Next table.EndLoadData() Return table End Function Private Function Distinct(ByVal Table As DataTable, ByVal Column As DataColumn) As DataTable Return Distinct(Table, New DataColumn() {Column}) End Function Private Function Distinct(ByVal Table As DataTable, ByVal Column As String) As DataTable Return Distinct(Table, Table.Columns(Column)) End Function Private Function Distinct(ByVal Table As DataTable, ByVal ParamArray Columns As String()) As DataTable Dim ccolumns As DataColumn() ReDim ccolumns(Columns.Length) For i As Integer = 0 To Columns.Length - 1 ccolumns(i) = Table.Columns(Columns(i)) Next Return Distinct(Table, ccolumns) End Function Private Function Distinct(ByVal Table As DataTable) As DataTable Dim ccolumns As DataColumn() ReDim ccolumns(Table.Columns.Count) For i As Integer = 0 To Table.Columns.Count - 1 ccolumns(i) = Table.Columns(i) Next Return Distinct(Table, ccolumns) End Function |
marco alves
2004-06-24 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology translated do vb.net. please, modify from private to static. |
Danilo Piazzalunga
2004-11-11 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology You could try this to make the Distinct method a little faster: 1. Use Array.Copy to copy an array instead of doing it by hand (it surely isn't any slower). 2. The array needs to be copied only once, not twice. 3. Move the row comparison at the very beginning of the loop, avoiding to needlessly copying the row when it is not going to be inserted. object[] currentrow = null; object[] previousrow = null; table.BeginLoadData(); foreach(DataRow row in sortedrows) { //Match Current row to previous row if(SQLOps.RowEqual(previousrow, row.ItemArray)) continue; //Skip row if identical //Insert a copy of current row currentrow = new object[Columns.Length]; Array.Copy(row.ItemArray, currentrow, Columns.Length); table.LoadDataRow(currentrow,true); //Previous row previousrow = currentrow; } table.EndLoadData(); return table; |
Danilo Piazzalunga
2004-11-11 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology Er... I must have been drunk or something when I wrote that code, it was completely bogus. The only good thing is that, if tested in some corner case, it appears to work 8-) One point still stands (or at least I think so): you can avoid allocating a new array for 'previousrow' and copying it from 'row' a second time, simply stating 'previousrow = currentrow' should be enough. And, most importantly, I forgot to congratulate with you for this series. It really helps a lot! :-) |
iret
2005-11-20 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology Why not use hashcode to make it fater? 1000 cols * 1000 rows use the hashcode is 10 times faster try my code: System.Collections.Hashtable ht = new System.Collections.Hashtable(); table.BeginLoadData(); foreach(DataRow row in Table.Rows) { int hash = string.Concat(row.ItemArray).GetHashCode(); if (ht.Contains(hash)) continue; ht.Add(hash,hash); //Insert a copy of current row currentrow = new object[Columns.Length]; Array.Copy(row.ItemArray, currentrow, Columns.Length); table.LoadDataRow(currentrow,true); } |
Oleg Deribas
2006-07-01 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology // For .NET 2.0: public static DataTable Distinct(DataTable table, string[] columns) { DataView dw = new DataView(table); return dw.ToTable(true, columns); } // :-) |
Josh Ross
2007-03-10 |
Union, UnionAll, Merge Why not use dataTableA.Merge(dataTableB) instead of Union? If you want to keep the Union Method I would rename it to UnionAll. |
www.mofun.cc
2009-03-11 |
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology good! but the code isn't full |