How to efficiently reuse gaps in identity column
Today I am going to talk about how to efficiently reuse identity values in a column, even if this is something that normally not should be bothered. The reason for this solution was a request for help from a member here on SQLTeam, who was near run out of identity values.
I did some reasearch first to see which was the most common method to deal with this situation and not surprisingly the method of iterating all records from start to end was used.
That method is not efficient. What if you have 1 million records and there is only 1 gap at position 15? Do you really have to shuffle all 1 million records to be sure to get rid of the gap?
The answer is of course NO. I developed an algorithm which I call a folding algorithm.
The algorithm works in the way that it looks for how many missing ID's there are, and pick the same number of records from the end of list. For the one million table example above, since there is only one ID missing, I only have to shuffle one record from the end of table to the missing position. Efficient and clean because only one record has to be touched and not 999 986 records!
Since an IDENTITY column most often also is used as a primary key, I have included one auxiliary table which references the base table to fix. In a production environment, this can be several tables, but I only include one for demonstration purposes.
Begin with creating some sample data like this.
SET NOCOUNT ON
/*******************************************************************************
Set up sample data
*******************************************************************************/
CREATE TABLE #Sample
(
ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
data UNIQUEIDENTIFIER DEFAULT NEWID()
)
GO
INSERT #Sample
DEFAULT VALUES
GO 10
DELETE
FROM #Sample
WHERE ID IN(1, 3, 4, 7, 10)
SELECT *
FROM #Sample
CREATE TABLE #Aux
(
AuxID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
SampleID INT NOT NULL REFERENCES #Sample(ID)
)
INSERT #Aux
SELECT 9 UNION ALL
SELECT 9 UNION ALL
SELECT 8
SELECT *
FROM #Aux
|
Now we created a sample table with 10 records and then on purpose created some gaps by removing some records manually.
We also created an auxiliary table referencing the base table. Since foreign key contraints are not enforced on temporary tables, this is for demonstration only.
The first step in the algorithm is to create the translation table. This table will hold all the missing ID's and which ID to shuffle.
/*******************************************************************************
Set up ID translation with Peso's "folding" algorithm
*******************************************************************************/
CREATE TABLE #Trans
(
idNew INT NOT NULL,
idOld INT NOT NULL
)
INSERT #Trans
(
idNew,
idOld
)
SELECT d.rn,
w.ID
FROM (
SELECT d.rn,
ROW_NUMBER() OVER (ORDER BY d.rn) AS recID
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS rn
FROM #Sample
) AS d
LEFT JOIN #Sample AS s ON s.ID = d.rn
WHERE s.ID IS NULL
) AS d
INNER JOIN (
SELECT ID,
ROW_NUMBER() OVER (ORDER BY ID DESC) AS recID
FROM #Sample
) AS w ON w.recID = d.recID
|
With this in place we can now begin to create duplicate entries of the records to shuffle. This is riskfree since new inserts are based on the current identity value so this new inserted records will not collide.
What the algorithm does, is to calculate the missing ID's and pick the appropriate number of records from end of table to shuffle and fill the gaps.
Now when we have the #Trans control table, let's begin to shuffle the records and update auxiliary table(s).
/*******************************************************************************
Prepare source table for identity inserts
*******************************************************************************/
BEGIN TRAN
SET IDENTITY_INSERT #Sample ON
INSERT #Sample
(
ID,
data
)
SELECT t.idNew,
s.data
FROM #Trans AS t
INNER JOIN #Sample AS s ON s.ID = t.idOld
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR('Insert old records with new ID failed.', 18, 1)
RETURN
END
SET IDENTITY_INSERT #Sample OFF
UPDATE w
SET w.SampleID = t.idNew
FROM #Aux AS w
INNER JOIN #Trans AS t ON t.idOld = w.SampleID
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR('Update #Aux table failed.', 18, 1)
RETURN
END
DELETE s
FROM #Sample AS s
INNER JOIN #Trans AS t ON t.idOld = s.ID
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR('Delete old records failed.', 18, 1)
RETURN
END
COMMIT TRAN
|
And now we're set, right? Well, not quite right. Since the IDENTITY value for the base table is stored in the meta data for the table, we need to reset the current IDENTITY value so that we do not produce another gap at the end of IDENTITY sequence. You can do that like this
/*******************************************************************************
Cleanup
*******************************************************************************/
DECLARE @ID INT
SELECT @ID = MAX(ID)
FROM #Sample
DBCC CHECKIDENT(#Sample, RESEED, @ID)
SELECT *
FROM #Sample
SELECT *
FROM #Aux
DROP TABLE #Sample,
#Aux,
#Trans
|
I have tested this technique in a production system with several users online, and for a reasonable amount of gaps you can run this piece of code during normal operation. But remember the base table will be locked for a small amount of time and will slow down other processes accessing the base table. The best practice is to run this algorithm off-hours.
Good luck!
The complete demonstration code looks like this
Legacy Comments
remote DBA
2009-03-10 |
re: How to efficiently reuse gaps in identity column The most interesting part of your article is GO 10 :) I did not about this simplest way to iterate batches and have always been writing annoying while cycles. Thank you, Vadym. |
Peso
2009-03-10 |
re: How to efficiently reuse gaps in identity column Books Online is your friend! |
Jerry Hung
2009-03-10 |
re: How to efficiently reuse gaps in identity column Very clever, I liked the bit with ROW_NUMBER() to find reverse-matching ID's Let's hope I don't need to use technique for a long time |
Stephen S Sibert
2009-03-13 |
re: How to efficiently reuse gaps in identity column Thanks for your article. I agree with Vadym in that I had never heard of the GO 10 command and wow! is it useful. I do not have a problem with gaps but find the thought process very valuable. Thanks again! |
DLL
2009-03-16 |
re: How to efficiently reuse gaps in identity column I have to say….thanks….this article is much more clear and efficient… |
game boy
2009-03-16 |
re: How to efficiently reuse gaps in identity column I have searched the net and I should say I have not come across an article like this which is so easy to understand and learn the concepts. cheers!! |
Anderson Porto
2010-11-09 |
re: How to efficiently reuse gaps in identity column But, what if, I'd like to iterat all records from start to end ??? How do I do this? |