Posts
83
Comments
600
Trackbacks
40
April 2004 Entries
Deltas

It's all about the update...

EDIT: 9/9/2004:  Well that wasn't much of an explanation.  What the hell are deltas.....is it a place in Mississippi where they have great music and food?  Well, yes....but deltas here are meant to describe a process, where by a database table has a set of data WITH a primary key, and a file is supplied daily with “updates“ (and I use that term loosely) that need to be applied to the table.  Both the file and the table do not have to have the same structure, number of columns, datatypes and/or sizes...but the MUST have a key that is unique (on a flat file?...well yeah....).

 

1st.  Load the file to a staging table.  (make sure the PK is defined to make sure you're good to go)

2nd.  You check to see what data in the staging table is not in the table.  These are INSERTS

3rd.  You need to see what's in that staging table that's missing in the table.  These are DELETES.

4th.  Lastly you need to see what is the same and what changes need to be applied.  These are the UPDATES

Below is a method that I believe is the most effecient to do this.

Guess this could apply to any platform.  Defenitley influenced by SQT Team...but I think I first saw this from the Dr., Jeff Smith.

http://www.sqlteam.com/forums/pop_profile.asp?mode=display&id=6859

Deletes and inserts would be index seek operation, but also benefit from the reduction in the number of look ups based on the delta derived table.  The update should see huge performace gains.

USE Northwind
GO

SET NOCOUNT ON
GO

SELECT * FROM Employees

-- Create some test data
SELECT EmployeeID
,LastName
,FirstName
,Title
,TitleOfCourtesy
,BirthDate
,HireDate
,Address
,City
,Region
,PostalCode
,Country
,HomePhone
,Extension
,ReportsTo
,PhotoPath
 INTO EMP1 FROM Employees  -- Pretend this is the New File

SELECT EmployeeID
,LastName
,FirstName
,Title
,TitleOfCourtesy
,BirthDate
,HireDate
,Address
,City
,Region
,PostalCode
,Country
,HomePhone
,Extension
,ReportsTo
,PhotoPath
 INTO EMP2 FROM Employees  -- Pretend this is the database table that needs to be acted upon
GO

-- Create some changes
DELETE FROM EMP1 WHERE EmployeeID = 1
INSERT INTO EMP1 (LastName, FirstName) SELECT 'Kaiser','Brett'
UPDATE EMP1 SET LastName = 'White' WHERE EmployeeId = 4
GO

--Check For Deltas
SELECT * INTO DELTAS
  FROM (
  SELECT * FROM EMP1 
      UNION ALL
  SELECT * FROM EMP2
 ) AS XXX
GROUP BY
   EmployeeID
 , LastName
 , FirstName
 , Title
 , TitleOfCourtesy
 , BirthDate
 , HireDate
 , Address
 , City
 , Region
 , PostalCode
 , Country
 , HomePhone
 , Extension
 , ReportsTo
 , PhotoPath
HAVING COUNT(*) = 1

-- Add New recocrds
INSERT INTO EMP2 (LastName, FirstName)
     SELECT l.LastName, l.FirstName
       FROM EMP1 l
  LEFT JOIN EMP2 r
  ON l.EmployeeId = r.EmployeeId
      WHERE r.EmployeeID Is NULL
        AND EXISTS (SELECT * FROM DELTAS d WHERE l.EmployeeId = d.EmployeeId)

-- Delete missing
DELETE FROM EMP2
WHERE EmployeeID IN (
    SELECT r.EmployeeID 
      FROM EMP1 l
RIGHT JOIN EMP2 r
 ON l.EmployeeId = r.EmployeeId
     WHERE l.EmployeeID Is NULL
       AND EXISTS (SELECT * FROM DELTAS d WHERE r.EmployeeId = d.EmployeeId))

-- Update the changes ....update all columns for simplicity
UPDATE r
  SET r.LastName  = l.Lastname
    , r.FirstName = l.Firstname
      FROM EMP1 l
INNER JOIN EMP2 r
 ON l.EmployeeId = r.EmployeeId
INNER JOIN ( SELECT EmployeeId FROM DELTAS GROUP BY EmployeeId HAVING COUNT(*) =2) AS d
 ON r.EmployeeId = d.EmployeeId
GO

-- Take a peek -- files should now match
   SELECT 'NEW FILE' AS SOURCE, * FROM EMP1
UNION ALL
   SELECT 'TARGET FILE' AS SOURCE, * FROM EMP2
 ORDER BY EmployeeID
GO

-- Clean up this mess
SET NOCOUNT OFF
GO

DROP TABLE DELTAS
DROP TABLE EMP1
DROP TABLE EMP2
GO

 

posted @ Friday, April 23, 2004 11:49 AM | Feedback (0)
Reasons to not use SELECT * (and when to use it)

I'd like to expand this list with anyones comments.  Thanks!  And if anyone knows the M$ link that discusses how they optimized SELECT * for existance, I'd appreciate a linke.  Thamks again!

Why you shouldn't

1. Effeciency: Only the data you need
 Reduces the amount of I/O that has to occur

2. Better use of indexes

May use Index intersection or make use of covered indexes

3. Isolate code from Table object changes
For example, the following would through an error
 INSERT INTO myTable99(Col1, Col2, Col3,ect)
 SELECT * FROM myTable00 WHERE

4. I get a chuckle when Access developers say DELETE * FROM mytable99
 'nough said

5. Increases network traffic, requires more buffers and processing
Similar to Item #1

6. Constantly accessing the system tables to figure out what these columns are.

7. Because Miracles happen

http://www.dbforums.com/t996606.html

8. When a VIEW is Created a SELECT * , and the Table is altered or dropped and recreated, the view will refernce the old table structure as ehorn points out

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=35975

 


When to Use SELECT *

In some instances, it is more effecient to use SELECT *. 

1. In a coorelated query, SELECT * is actually optimized
 SELECT Col1, Col2, Col3 FROM myTable99 a
 WHERE EXISTS (SELECT * FROM myTable00 b WHERE a.id = b.id)

2. In an existence check as well
 IF EXISTS(SELECT * FROM myTable99)

3. EDIT: Actually, Jeff Smith (aka Dr. Cross Join) makes a very valid point in this link.  Let's say that you are making a derived table, especially when there may be many derived tables that are nested, what then would be the harm in passing up the columns using SELECT * ?  My only downside is that developers not used to correct coding methods may adopt this style in ways that are not good. Also this is not to say that SELECT * is appropriate for this intial derived table, which would again be selecting way more than you need.

 

posted @ Thursday, April 22, 2004 11:00 AM | Feedback (6)