How to efficiently reuse gaps in identity column

Tue Mar 10, 2009 by Peter Larsson in optimization, sql-server-2008, algorithms, sql-server-2005

Today I am going to talk about how to efficiently reuse identity values in a column, even if this is something that normally not should be bothered. The reason for this solution was a request for help from a member here on SQLTeam, who was near run out of identity values.

I did some reasearch first to see which was the most common method to deal with this situation and not surprisingly the method of iterating all records from start to end was used.
That method is not efficient. What if you have 1 million records and there is only 1 gap at position 15? Do you really have to shuffle all 1 million records to be sure to get rid of the gap?

The answer is of course NO. I developed an algorithm which I call a folding algorithm.

The algorithm works in the way that it looks for how many missing ID's there are, and pick the same number of records from the end of list. For the one million table example above, since there is only one ID missing, I only have to shuffle one record from the end of table to the missing position. Efficient and clean because only one record has to be touched and not 999 986 records!

Since an IDENTITY column most often also is used as a primary key, I have included one auxiliary table which references the base table to fix. In a production environment, this can be several tables, but I only include one for demonstration purposes.

Begin with creating some sample data like this.

SET NOCOUNT ON

/*******************************************************************************

Set up sample data

*******************************************************************************/

CREATE TABLE #Sample

(

ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

data UNIQUEIDENTIFIER DEFAULT NEWID()

)

INSERT #Sample

DEFAULT VALUES

GO 10

DELETE

FROM #Sample

WHERE ID IN(1, 3, 4, 7, 10)

SELECT *

FROM #Sample

CREATE TABLE #Aux

(

AuxID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

SampleID INT NOT NULL REFERENCES #Sample(ID)

)

INSERT #Aux

SELECT 9 UNION ALL

SELECT 8

SELECT *

FROM #Aux

Now we created a sample table with 10 records and then on purpose created some gaps by removing some records manually.
We also created an auxiliary table referencing the base table. Since foreign key contraints are not enforced on temporary tables, this is for demonstration only.

The first step in the algorithm is to create the translation table. This table will hold all the missing ID's and which ID to shuffle.

/*******************************************************************************

Set up ID translation with Peso's "folding" algorithm

*******************************************************************************/

CREATE TABLE #Trans

(

idNew INT NOT NULL,

idOld INT NOT NULL

)

INSERT #Trans

(

idNew,

idOld

)

SELECT d.rn,

w.ID

FROM (

SELECT d.rn,

ROW_NUMBER() OVER (ORDER BY d.rn) AS recID

FROM (

SELECT ROW_NUMBER() OVER (ORDER BY ID) AS rn

FROM #Sample

) AS d

LEFT JOIN #Sample AS s ON s.ID = d.rn

WHERE s.ID IS NULL

) AS d

INNER JOIN (

SELECT ID,

ROW_NUMBER() OVER (ORDER BY ID DESC) AS recID

FROM #Sample

) AS w ON w.recID = d.recID

With this in place we can now begin to create duplicate entries of the records to shuffle. This is riskfree since new inserts are based on the current identity value so this new inserted records will not collide.
What the algorithm does, is to calculate the missing ID's and pick the appropriate number of records from end of table to shuffle and fill the gaps.

Now when we have the #Trans control table, let's begin to shuffle the records and update auxiliary table(s).

/*******************************************************************************

Prepare source table for identity inserts

*******************************************************************************/

BEGIN TRAN

SET IDENTITY_INSERT #Sample ON

INSERT #Sample

(

ID,

data

)

SELECT t.idNew,

s.data

FROM #Trans AS t

INNER JOIN #Sample AS s ON s.ID = t.idOld

IF @@ERROR <> 0

BEGIN

ROLLBACK TRAN

RAISERROR('Insert old records with new ID failed.', 18, 1)

RETURN

END

SET IDENTITY_INSERT #Sample OFF

UPDATE w

SET w.SampleID = t.idNew

FROM #Aux AS w

INNER JOIN #Trans AS t ON t.idOld = w.SampleID

IF @@ERROR <> 0

BEGIN

ROLLBACK TRAN

RAISERROR('Update #Aux table failed.', 18, 1)

RETURN

END

DELETE s

FROM #Sample AS s

INNER JOIN #Trans AS t ON t.idOld = s.ID

IF @@ERROR <> 0

BEGIN

ROLLBACK TRAN

RAISERROR('Delete old records failed.', 18, 1)

RETURN

END

COMMIT TRAN

And now we're set, right? Well, not quite right. Since the IDENTITY value for the base table is stored in the meta data for the table, we need to reset the current IDENTITY value so that we do not produce another gap at the end of IDENTITY sequence. You can do that like this

/*******************************************************************************

Cleanup

*******************************************************************************/

DECLARE @ID INT

SELECT @ID = MAX(ID)

FROM #Sample

DBCC CHECKIDENT(#Sample, RESEED, @ID)

SELECT *

FROM #Sample

SELECT *

FROM #Aux

DROP TABLE #Sample,

#Aux,

#Trans

I have tested this technique in a production system with several users online, and for a reasonable amount of gaps you can run this piece of code during normal operation. But remember the base table will be locked for a small amount of time and will slow down other processes accessing the base table. The best practice is to run this algorithm off-hours.

Good luck!

The complete demonstration code looks like this

SET NOCOUNT ON

/*******************************************************************************

Set up sample data

*******************************************************************************/

CREATE TABLE #Sample

(

ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

data UNIQUEIDENTIFIER DEFAULT NEWID()

)

INSERT #Sample

DEFAULT VALUES

GO 10

DELETE

FROM #Sample

WHERE ID IN(1, 3, 4, 7, 10)

SELECT *

FROM #Sample

CREATE TABLE #Aux

(

AuxID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

SampleID INT NOT NULL REFERENCES #Sample(ID)

)

INSERT #Aux

SELECT 9 UNION ALL

SELECT 8

SELECT *

FROM #Aux

/*******************************************************************************

Set up ID translation with Peso's "folding" algorithm

*******************************************************************************/

CREATE TABLE #Trans

(

idNew INT NOT NULL,

idOld INT NOT NULL

)

INSERT #Trans

(

idNew,

idOld

)

SELECT d.rn,

w.ID

FROM (

SELECT d.rn,

ROW_NUMBER() OVER (ORDER BY d.rn) AS recID

FROM (

SELECT ROW_NUMBER() OVER (ORDER BY ID) AS rn

FROM #Sample

) AS d

LEFT JOIN #Sample AS s ON s.ID = d.rn

WHERE s.ID IS NULL

) AS d

INNER JOIN (

SELECT ID,

ROW_NUMBER() OVER (ORDER BY ID DESC) AS recID

FROM #Sample

) AS w ON w.recID = d.recID

/*******************************************************************************

Prepare source table for identity inserts

*******************************************************************************/

BEGIN TRAN

SET IDENTITY_INSERT #Sample ON

INSERT #Sample

(

ID,

data

)

SELECT t.idNew,

s.data

FROM #Trans AS t

INNER JOIN #Sample AS s ON s.ID = t.idOld

IF @@ERROR <> 0

BEGIN

ROLLBACK TRAN

RAISERROR('Insert old records with new ID failed.', 18, 1)

RETURN

END

SET IDENTITY_INSERT #Sample OFF

UPDATE w

SET w.SampleID = t.idNew

FROM #Aux AS w

INNER JOIN #Trans AS t ON t.idOld = w.SampleID

IF @@ERROR <> 0

BEGIN

ROLLBACK TRAN

RAISERROR('Update #Aux table failed.', 18, 1)

RETURN

END

DELETE s

FROM #Sample AS s

INNER JOIN #Trans AS t ON t.idOld = s.ID

IF @@ERROR <> 0

BEGIN

ROLLBACK TRAN

RAISERROR('Delete old records failed.', 18, 1)

RETURN

END

COMMIT TRAN

/*******************************************************************************

Cleanup

*******************************************************************************/

DECLARE @ID INT

SELECT @ID = MAX(ID)

FROM #Sample

DBCC CHECKIDENT(#Sample, RESEED, @ID)

SELECT *

FROM #Sample

SELECT *

FROM #Aux

DROP TABLE #Sample,

#Aux,

#Trans

Legacy Comments


remote DBA 2009-03-10	re: How to efficiently reuse gaps in identity column The most interesting part of your article is GO 10 :) I did not about this simplest way to iterate batches and have always been writing annoying while cycles. Thank you, Vadym.

Peso 2009-03-10	re: How to efficiently reuse gaps in identity column Books Online is your friend!

Jerry Hung 2009-03-10	re: How to efficiently reuse gaps in identity column Very clever, I liked the bit with ROW_NUMBER() to find reverse-matching ID's Let's hope I don't need to use technique for a long time

Stephen S Sibert 2009-03-13	re: How to efficiently reuse gaps in identity column Thanks for your article. I agree with Vadym in that I had never heard of the GO 10 command and wow! is it useful. I do not have a problem with gaps but find the thought process very valuable. Thanks again!

DLL 2009-03-16	re: How to efficiently reuse gaps in identity column I have to say….thanks….this article is much more clear and efficient…

game boy 2009-03-16	re: How to efficiently reuse gaps in identity column I have searched the net and I should say I have not come across an article like this which is so easy to understand and learn the concepts. cheers!!

Anderson Porto 2010-11-09	re: How to efficiently reuse gaps in identity column But, what if, I'd like to iterat all records from start to end ??? How do I do this?