byrmol Blog

Garbage

DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology

This INTERSECT method has no equivalent in TSQL

INTERSECT is simply all rows that are in the First table and the Second table
We can take a coding short cut and simple use our JOIN method.

There is only the 1 signature for this method.
In summary the code works as follows:

  Get a reference to all columns
  Join on all columns
  Return table

I have included the DISTINCT method as a helper method to assist us in the DIVIDE method that will follow in a later post.
But in retrospect, it should have been one of the first methods I wrote for this class.

If you have been paying attention, you will notice that the PROJECT and UNION methods will return duplicates.
In my opinion the most serious was the UNION flaw as by default even SQL Server does not produce duplicates from this method.
ie:
  In SQL Server A UNION A = A but our methods produces A UNION A = A+A

Even though I warned you of duplicates I still feel bad about it! Sorry!

The PROJECT method was equivalent to the SQL Server implementation but perhaps it shouldn't have been.

Obviously it is trivial to change these methods to confirm now that the DISTINCT methods exist and I leave it up to you to do that.

My current version of these methods now has an overload that takes a boolean indicating wether to retain duplicates.
By default they do not! Redemption!

The DISTINCT method uses another method (RowEqual) for row equality. Even though only the one method uses this function, to incorporate it directly into the Distinct method would be a slight performance hit.

There are 5 signature for the DISTINCT method.
In summary the code works as follows:

  Create new table
  Add Distinct columns and prepare sort expression
  Select all sorted rows
  Loop over rows and check against previous row
  Add only unique rows
  Return table

If anybody can come up with a faster solution for Distinct it would be most appreciated.

public static DataTable Intersect(DataTable First, DataTable Second)

{

      //Get reference to Columns in First

      DataColumn[] firstcolumns  = new DataColumn[First.Columns.Count];

      for(int i = 0; i < firstcolumns.Length; i++)

      {

            firstcolumns[i] = First.Columns[i];

      }

      //Get reference to Columns in Second

      DataColumn[] secondcolumns  = new DataColumn[Second.Columns.Count];

      for(int i = 0; i < secondcolumns.Length; i++)

      {

            secondcolumns[i] = Second.Columns[i];

      }

      //JOIN ON all columns

      DataTable table = SQLOps.Join(First, Second, firstcolumns, secondcolumns);

      table.TableName = "Intersect";

      return table;

}

 

private static bool RowEqual(object[] Values, object[] OtherValues)

{

      if(Values == null)

            return false;

 

      for(int i = 0; i < Values.Length; i++)

      {

            if(!Values[i].Equals(OtherValues[i]))

                  return false;

      }                      

      return true;

} 

 

 

public static DataTable Distinct(DataTable Table, DataColumn[] Columns)

{

      //Empty table

      DataTable table = new DataTable("Distinct");

      //Sort variable

      string sort = string.Empty;

 

      //Add Columns & Build Sort expression

      for(int i = 0; i < Columns.Length; i++)

      {

            table.Columns.Add(Columns[i].ColumnName,Columns[i].DataType);

            sort += Columns[i].ColumnName + ",";

      }

      //Select all rows and sort

      DataRow[] sortedrows = Table.Select(string.Empty,sort.Substring(0,sort.Length-1));

     

      object[] currentrow = null;

      object[] previousrow = null;

     

      table.BeginLoadData();

      foreach(DataRow row in sortedrows)

      {

            //Current row

            currentrow = new object[Columns.Length];

            for(int i = 0; i < Columns.Length; i++)

            {

                  currentrow[i] = row[Columns[i].ColumnName];

            }

 

            //Match Current row to previous row

            if(!SQLOps.RowEqual(previousrow, currentrow))

                  table.LoadDataRow(currentrow,true);

 

            //Previous row

            previousrow = new object[Columns.Length];

            for(int i = 0; i < Columns.Length; i++)

            {

                  previousrow[i] = row[Columns[i].ColumnName];

            }

 

      }

      table.EndLoadData();

      return table;

 

}

 

public static DataTable Distinct(DataTable Table, DataColumn Column)

{

      return Distinct(Table, new DataColumn[]{Column});

}

public static DataTable Distinct(DataTable Table, string Column)

{

      return Distinct(Table, Table.Columns[Column]);

}

public static DataTable Distinct(DataTable Table, params string[] Columns)

{

      DataColumn[] columns = new DataColumn[Columns.Length];

      for(int i = 0; i < Columns.Length; i++)

      {

            columns[i] = Table.Columns[Columns[i]];

     

      }

      return Distinct(Table, columns);

}

public static DataTable Distinct(DataTable Table)

{

      DataColumn[] columns = new DataColumn[Table.Columns.Count];

      for(int i = 0; i < Table.Columns.Count; i++)

      {

            columns[i] = Table.Columns[i];

     

      }

      return Distinct(Table, columns);

}

Legacy Comments


Marcus Tucker
2004-01-24
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
From a "best programming practice" standpoint, might I be so bold as to suggest that you really shouldn't be using two variables called "Table" and "table"?! At the very least call them "Source" and "Destination" or "Input" and "Output"...! ;)

DavidM
2004-01-24
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
You can be as bold as you like Marcus.

The first method I coded was UNION and the variable "table" did not clash with the input arguments. "table" simply became the defacto standard for the return DataTable across the class.

If that is your only concern, I am delighted.

Marcus Tucker
2004-01-25
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
I haven't yet found the time to start playing with .Net (and even if I do I think I'll wait until Whidbey comes out), but have been reading a few .Net articles here and there in preparation.

The table/Table naming just struck me as surprising, that's all! :)

Johnny
2004-01-30
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
isn't intersect same as JOIN in SQL?

DavidM
2004-01-30
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
Similar but not the same.

INTERSECT requires that both tables have the same heading. As you can see I use the JOIN method to achieve the result.

marco alves
2004-06-24
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
Private Function RowEqual(ByVal Values As Object(), ByVal OtherValues As Object()) As Boolean
If (Values Is Nothing) Then
Return False
End If

For i As Integer = 0 To Values.Length - 1
If (Not Values(i).Equals(OtherValues(i))) Then
Return False
End If
Next

Return True
End Function

Private Function Distinct(ByVal Tabela As DataTable, ByVal Columns As DataColumn()) As DataTable
'Empty table
Dim table As DataTable = New DataTable("Distinct")

'Sort variable
Dim sort As String = String.Empty

'Add Columns & Build Sort expression
For i As Integer = 0 To Columns.Length - 1
table.Columns.Add(Columns(i).ColumnName, Columns(i).DataType)
sort += Columns(i).ColumnName + ","
Next

'Select all rows and sort
Dim sortedrows As DataRow() = table.Select(String.Empty, sort.Substring(0, sort.Length - 1))

Dim currentrow As Object() = Nothing
Dim previousrow As Object() = Nothing

table.BeginLoadData()

For Each row As DataRow In sortedrows
ReDim currentrow(Columns.Length)

For i As Integer = 0 To Columns.Length - 1
currentrow(i) = row(Columns(i).ColumnName)
Next

'Match Current row to previous row
If (Not RowEqual(previousrow, currentrow)) Then
table.LoadDataRow(currentrow, True)
End If

'Previous row
ReDim previousrow(Columns.Length)

For i As Integer = 0 To Columns.Length - 1
previousrow(i) = row(Columns(i).ColumnName)
Next

Next
table.EndLoadData()
Return table
End Function

Private Function Distinct(ByVal Table As DataTable, ByVal Column As DataColumn) As DataTable
Return Distinct(Table, New DataColumn() {Column})
End Function

Private Function Distinct(ByVal Table As DataTable, ByVal Column As String) As DataTable
Return Distinct(Table, Table.Columns(Column))
End Function

Private Function Distinct(ByVal Table As DataTable, ByVal ParamArray Columns As String()) As DataTable
Dim ccolumns As DataColumn()

ReDim ccolumns(Columns.Length)

For i As Integer = 0 To Columns.Length - 1
ccolumns(i) = Table.Columns(Columns(i))
Next

Return Distinct(Table, ccolumns)
End Function

Private Function Distinct(ByVal Table As DataTable) As DataTable
Dim ccolumns As DataColumn()

ReDim ccolumns(Table.Columns.Count)

For i As Integer = 0 To Table.Columns.Count - 1
ccolumns(i) = Table.Columns(i)
Next

Return Distinct(Table, ccolumns)
End Function

marco alves
2004-06-24
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
translated do vb.net. please, modify from private to static.

Danilo Piazzalunga
2004-11-11
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
You could try this to make the Distinct method a little faster:
1. Use Array.Copy to copy an array instead of doing it by hand (it surely isn't any slower).
2. The array needs to be copied only once, not twice.
3. Move the row comparison at the very beginning of the loop, avoiding to needlessly copying the row when it is not going to be inserted.



object[] currentrow = null;
object[] previousrow = null;

table.BeginLoadData();
foreach(DataRow row in sortedrows)
{
//Match Current row to previous row
if(SQLOps.RowEqual(previousrow, row.ItemArray))
continue; //Skip row if identical

//Insert a copy of current row
currentrow = new object[Columns.Length];
Array.Copy(row.ItemArray, currentrow, Columns.Length);
table.LoadDataRow(currentrow,true);

//Previous row
previousrow = currentrow;
}
table.EndLoadData();
return table;

Danilo Piazzalunga
2004-11-11
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
Er... I must have been drunk or something when I wrote that code, it was completely bogus. The only good thing is that, if tested in some corner case, it appears to work 8-)

One point still stands (or at least I think so): you can avoid allocating a new array for 'previousrow' and copying it from 'row' a second time, simply stating 'previousrow = currentrow' should be enough.

And, most importantly, I forgot to congratulate with you for this series. It really helps a lot! :-)

iret
2005-11-20
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
Why not use hashcode to make it fater?
1000 cols * 1000 rows
use the hashcode is 10 times faster
try my code:

System.Collections.Hashtable ht = new System.Collections.Hashtable();
table.BeginLoadData();

foreach(DataRow row in Table.Rows)
{

int hash = string.Concat(row.ItemArray).GetHashCode();
if (ht.Contains(hash))
continue;
ht.Add(hash,hash);
//Insert a copy of current row
currentrow = new object[Columns.Length];
Array.Copy(row.ItemArray, currentrow, Columns.Length);
table.LoadDataRow(currentrow,true);

}

Oleg Deribas
2006-07-01
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
// For .NET 2.0:

public static DataTable Distinct(DataTable table, string[] columns)
{
DataView dw = new DataView(table);
return dw.ToTable(true, columns);
}

// :-)

Josh Ross
2007-03-10
Union, UnionAll, Merge
Why not use dataTableA.Merge(dataTableB) instead of Union? If you want to keep the Union Method I would rename it to UnionAll.

www.mofun.cc
2009-03-11
re: DataTable Relational Operators in C# - INTERSECT Method, DISTINCT helper method & an apology
good!
but the code isn't full