Thinking outside the box

Patron Saint of Lost Yaks
posts - 203, comments - 734, trackbacks - 4

My Links

Advertisement

News

Archives

Post Categories

How to efficiently reuse gaps in identity column

Today I am going to talk about how to efficiently reuse identity values in a column, even if this is something that normally not should be bothered. The reason for this solution was a request for help from a member here on SQLTeam, who was near run out of identity values.

I did some reasearch first to see which was the most common method to deal with this situation and not surprisingly the method of iterating all records from start to end was used.
That method is not efficient. What if you have 1 million records and there is only 1 gap at position 15? Do you really have to shuffle all 1 million records to be sure to get rid of the gap?

The answer is of course NO. I developed an algorithm which I call a folding algorithm.

The algorithm works in the way that it looks for how many missing ID's there are, and pick the same number of records from the end of list. For the one million table example above, since there is only one ID missing, I only have to shuffle one record from the end of table to the missing position. Efficient and clean because only one record has to be touched and not 999 986 records!

Since an IDENTITY column most often also is used as a primary key, I have included one auxiliary table which references the base table to fix. In a production environment, this can be several tables, but I only include one for demonstration purposes.

Begin with creating some sample data like this.

SET NOCOUNT ON
 
/*******************************************************************************
 Set up sample data
*******************************************************************************/
 
CREATE TABLE #Sample
            (
                    ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                    data UNIQUEIDENTIFIER DEFAULT NEWID()
            )
GO
 
INSERT      #Sample
DEFAULT     VALUES
GO 10
 
DELETE
FROM        #Sample
WHERE       ID IN(1, 3, 4, 7, 10)
 
SELECT      *
FROM        #Sample
 
CREATE TABLE #Aux
            (
                    AuxID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                    SampleID INT NOT NULL REFERENCES #Sample(ID)
            )
 
INSERT      #Aux
SELECT      9 UNION ALL
SELECT      9 UNION ALL
SELECT      8
 
SELECT      *
FROM        #Aux

Now we created a sample table with 10 records and then on purpose created some gaps by removing some records manually.
We also created an auxiliary table referencing the base table. Since foreign key contraints are not enforced on temporary tables, this is for demonstration only.

The first step in the algorithm is to create the translation table. This table will hold all the missing ID's and which ID to shuffle. 

/*******************************************************************************
 Set up ID translation with Peso's "folding" algorithm
*******************************************************************************/
 
CREATE TABLE #Trans
            (
                        idNew INT NOT NULL,
                        idOld INT NOT NULL
            )
 
INSERT      #Trans
            (
                        idNew,
                        idOld
            )
SELECT      d.rn,
            w.ID
FROM        (
                    SELECT  d.rn,
                            ROW_NUMBER() OVER (ORDER BY d.rn) AS recID
                    FROM    (
                                SELECT  ROW_NUMBER() OVER (ORDER BY ID) AS rn
                                FROM    #Sample
                            ) AS d
                    LEFT JOIN   #Sample AS s ON s.ID = d.rn
                    WHERE       s.ID IS NULL
            ) AS d
INNER JOIN (
                 SELECT   ID,
                          ROW_NUMBER() OVER (ORDER BY ID DESC) AS recID
                 FROM     #Sample
            ) AS w ON w.recID = d.recID

With this in place we can now begin to create duplicate entries of the records to shuffle. This is riskfree since new inserts are based on the current identity value so this new inserted records will not collide.
What the algorithm does, is to calculate the missing ID's and pick the appropriate number of records from end of table to shuffle and fill the gaps.

Now when we have the #Trans control table, let's begin to shuffle the records and update auxiliary table(s).

/*******************************************************************************
 Prepare source table for identity inserts
*******************************************************************************/
 
BEGIN TRAN
 
SET IDENTITY_INSERT #Sample ON
 
INSERT      #Sample
            (
                        ID,
                        data
            )
SELECT      t.idNew,
            s.data
FROM        #Trans AS t
INNER JOIN #Sample AS s ON s.ID = t.idOld
 
IF @@ERROR <> 0
      BEGIN
               ROLLBACK TRAN
               RAISERROR('Insert old records with new ID failed.', 18, 1)
               RETURN
      END
 
SET IDENTITY_INSERT #Sample OFF
 
UPDATE      w
SET         w.SampleID = t.idNew
FROM        #Aux AS w
INNER JOIN #Trans AS t ON t.idOld = w.SampleID
 
IF @@ERROR <> 0
     BEGIN
               ROLLBACK TRAN
               RAISERROR('Update #Aux table failed.', 18, 1)
               RETURN
     END
 
DELETE     s
FROM       #Sample AS s
INNER JOIN #Trans AS t ON t.idOld = s.ID
 
IF @@ERROR <> 0
     BEGIN
               ROLLBACK TRAN
               RAISERROR('Delete old records failed.', 18, 1)
               RETURN
     END
 
COMMIT TRAN

And now we're set, right? Well, not quite right. Since the IDENTITY value for the base table is stored in the meta data for the table, we need to reset the current IDENTITY value so that we do not produce another gap at the end of IDENTITY sequence. You can do that like this

/*******************************************************************************
 Cleanup
*******************************************************************************/
 
DECLARE     @ID INT
 
SELECT      @ID = MAX(ID)
FROM        #Sample
 
DBCC CHECKIDENT(#Sample, RESEED, @ID)
 
SELECT      *
FROM        #Sample
 
SELECT      *
FROM        #Aux
 
DROP TABLE #Sample,
            #Aux,
            #Trans


I have tested this technique in a production system with several users online, and for a reasonable amount of gaps you can run this piece of code during normal operation. But remember the base table will be locked for a small amount of time and will slow down other processes accessing the base table. The best practice is to run this algorithm off-hours.

Good luck!

The complete demonstration code looks like this

SET NOCOUNT ON
 
/*******************************************************************************
 Set up sample data
*******************************************************************************/
 
CREATE TABLE #Sample
            (
                         ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                         data UNIQUEIDENTIFIER DEFAULT NEWID()
            )
GO
 
INSERT      #Sample
DEFAULT     VALUES
GO 10
 
DELETE
FROM        #Sample
WHERE       ID IN(1, 3, 4, 7, 10)
 
SELECT      *
FROM        #Sample
 
CREATE TABLE #Aux
            (
                         AuxID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                         SampleID INT NOT NULL REFERENCES #Sample(ID)
            )
 
INSERT      #Aux
SELECT      9 UNION ALL
SELECT      9 UNION ALL
SELECT      8
 
SELECT      *
FROM        #Aux
 
/*******************************************************************************
 Set up ID translation with Peso's "folding" algorithm
*******************************************************************************/
 
CREATE TABLE #Trans
            (
                         idNew INT NOT NULL,
                         idOld INT NOT NULL
            )
 
INSERT      #Trans
            (
                        idNew,
                        idOld
            )
SELECT      d.rn,
            w.ID
FROM        (
                        SELECT      d.rn,
                                     ROW_NUMBER() OVER (ORDER BY d.rn) AS recID
                        FROM        (
                                                 SELECT      ROW_NUMBER() OVER (ORDER BY ID) AS rn
                                                 FROM        #Sample
                                     ) AS d
                         LEFT JOIN   #Sample AS s ON s.ID = d.rn
                         WHERE       s.ID IS NULL
            ) AS d
INNER JOIN (
                        SELECT      ID,
                                     ROW_NUMBER() OVER (ORDER BY ID DESC) AS recID
                        FROM        #Sample
            ) AS w ON w.recID = d.recID
 
/*******************************************************************************
 Prepare source table for identity inserts
*******************************************************************************/
 
BEGIN TRAN
 
SET IDENTITY_INSERT #Sample ON
 
INSERT      #Sample
            (
                        ID,
                        data
            )
SELECT      t.idNew,
            s.data
FROM        #Trans AS t
INNER JOIN #Sample AS s ON s.ID = t.idOld
 
IF @@ERROR <> 0
            BEGIN
                         ROLLBACK TRAN
                         RAISERROR('Insert old records with new ID failed.', 18, 1)
                         RETURN
            END
 
SET IDENTITY_INSERT #Sample OFF
 
UPDATE      w
SET         w.SampleID = t.idNew
FROM        #Aux AS w
INNER JOIN #Trans AS t ON t.idOld = w.SampleID
 
IF @@ERROR <> 0
            BEGIN
                         ROLLBACK TRAN
                         RAISERROR('Update #Aux table failed.', 18, 1)
                         RETURN
            END
 
DELETE      s
FROM        #Sample AS s
INNER JOIN #Trans AS t ON t.idOld = s.ID
 
IF @@ERROR <> 0
            BEGIN
                         ROLLBACK TRAN
                         RAISERROR('Delete old records failed.', 18, 1)
                         RETURN
            END
 
COMMIT TRAN
 
/*******************************************************************************
 Cleanup
*******************************************************************************/
 
DECLARE     @ID INT
 
SELECT      @ID = MAX(ID)
FROM        #Sample
 
DBCC CHECKIDENT(#Sample, RESEED, @ID)
 
SELECT      *
FROM        #Sample
 
SELECT      *
FROM        #Aux
 
DROP TABLE #Sample,
            #Aux,
            #Trans


Print | posted on Tuesday, March 10, 2009 9:06 AM | Filed Under [ Optimization SQL Server 2008 Algorithms SQL Server 2005 ]

Feedback

Gravatar

# re: How to efficiently reuse gaps in identity column

The most interesting part of your article is
GO 10 :)
I did not about this simplest way to iterate batches and have always been writing annoying while cycles.
Thank you, Vadym.
3/10/2009 10:48 AM | remote DBA
Gravatar

# re: How to efficiently reuse gaps in identity column

Books Online is your friend!
3/10/2009 12:30 PM | Peso
Gravatar

# re: How to efficiently reuse gaps in identity column

Very clever, I liked the bit with ROW_NUMBER() to find reverse-matching ID's

Let's hope I don't need to use technique for a long time
3/10/2009 7:54 PM | Jerry Hung
Gravatar

# re: How to efficiently reuse gaps in identity column

Thanks for your article. I agree with Vadym in that I had never heard of the GO 10 command and wow! is it useful. I do not have a problem with gaps but find the thought process very valuable. Thanks again!
3/13/2009 7:15 PM | Stephen S Sibert
Gravatar

# re: How to efficiently reuse gaps in identity column

I have to say….thanks….this article is much more clear and efficient…
3/16/2009 10:10 AM | DLL
Gravatar

# re: How to efficiently reuse gaps in identity column

I have searched the net and I should say I have not come across an article like this which is so easy to understand and learn the concepts.
cheers!!
3/16/2009 11:16 AM | game boy
Gravatar

# re: How to efficiently reuse gaps in identity column

But, what if, I'd like to iterat all records from start to end ??? How do I do this?
11/9/2010 6:04 PM | Anderson Porto
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET