Secrets of Foreign Key Index Binding

You might be surprised to learn that foreign keys bind to physical indexes when they are created. Furthermore, a foreign key does not necessarily bind to the primary key index of the referenced table; SQL Server allows a foreign key to refer to any column(s) that are guaranteed to be unique as enforced by a primary key constraint, unique constraint or unique index.

In this post, I’ll discuss the undocumented rules SQL Server uses to bind foreign key constraints to referenced table indexes so that you can achieve performance goals and protect yourself against unexpected errors in DDL modification scripts.

Background

Typically, one references the primary key in foreign key relationships. I’ve seen a foreign key (deliberately) reference columns other than the primary key only a couple of times in my career. The foreign key referenced an alternate key with a unique constraint in those cases. Why one would create such a relationship is an exercise for the reader. I’ll focus on the primary key here, although the same considerations apply to foreign keys referencing alternate keys.

As I mentioned earlier, SQL Server binds a foreign key to a physical unique index. This binding performance implications because it determines the index SQL Server uses to enforce referential integrity as child table rows are inserted or updated. Also, SQL Server will not allow the index bound to a foreign key to be dropped since that could allow duplicate rows in the parent table and thus break the unique side of the relationship. This must be considered when developing scripts that drop unique indexes (including primary key and unique constraints) that may be bound to foreign keys.

A foreign key referencing the primary key will always be bound to the primary key index when that is the only unique index on the foreign key column(s). However, you might have additional unique indexes on the primary key column(s) for performance reasons. For example, consider the case of a clustered primary key. Performance of a frequently executed query may be improved with a covering non-clustered index:





--create parent table

CREATE TABLE dbo.ParentTable(

       ParentTableID int NOT NULL IDENTITY

              CONSTRAINT PK_ParentTable PRIMARY KEY CLUSTERED

       ,Column1 int NOT NULL

       ,Column2 varchar(100) NOT NULL

       );

GO

 

--create a non-clustered covering index

CREATE UNIQUE NONCLUSTERED INDEX idx_ParentTable_ParentTableID

       ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);

GO

 

INSERT INTO dbo.ParentTable VALUES(1, 'some data');

INSERT INTO dbo.ParentTable VALUES(2, 'some data');

INSERT INTO dbo.ParentTable VALUES(3, 'some data');

GO

 

--create child table

CREATE TABLE dbo.ChildTable(

       ChildTableID int NOT NULL IDENTITY

            CONSTRAINT PK_ChildTable PRIMARY KEY CLUSTERED

       ,ParentTableID int NOT NULL

              CONSTRAINT FK_ChildTable_ParentTable

                     FOREIGN KEY REFERENCES dbo.ParentTable(ParentTableID)

       );

GO

 

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(1);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(2);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

INSERT INTO dbo.ChildTable VALUES(3);

GO

 

UPDATE STATISTICS dbo.ParentTable;

UPDATE STATISTICS dbo.ChildTable;

GO

 

--show the foreign key index binding

SELECT

    fki.name

FROM sys.foreign_keys AS f

JOIN sys.indexes AS fki ON

      fki.object_id = f.referenced_object_id     

      AND fki.index_id = f.key_index_id

WHERE

      f.object_id = OBJECT_ID(N'dbo.FK_ChildTable_ParentTable');

GO

 

--this query uses the covering index instead of clustered PK index

SELECT p.ParentTableID, p.Column1

FROM dbo.ParentTable AS p

WHERE p.ParentTableID IN(1,2,3);

GO

 

 

The SELECT query in the above script uses the covering idx_ParentTable_ParentTableID index. While this is good for performance, it introduces ambiguity regarding index binding to the foreign key. Again, any primary key constraint, unique constraint or index on the referenced column(s) may be referenced by a foreign key. With two candidate unique indexes (PK_ParentTable and idx_ParentTable_ParentTableID), you have little control which index is bound to the foreign key. 

SQL Server chooses the index binding based on rules that vary by version so you will get different binding depending on your version of SQLServer. SQL Server 2005 chooses the clustered index when possible and, if no suitable clustered index exists, the first (lowest index_id) unique non-clustered index on the referenced column(s) is used. The sample script above binds the foreign key to the PK_WideTable index under SQL Server 2005 because it is the clustered index, not because it is the primary key.

In later versions (SQL 2008, SQL 2008R2 and SQL 2012), the foreign key is bound to the unique non-clustered index on the referenced column(s) with the lowest index_id when possible. Only when no suitable unique non-clustered index exists is the unique clustered index chosen. So the foreign key in the above script is bound to idx_ParentTable_ParentTableID in SQL 2008 and later versions instead of the primary key index as one might expect.

Why Foreign Key Index Binding is Important

There are two reasons why it is important to control the index bound to a foreign key. One is performance. As I mentioned earlier, the index bound to the foreign key constraint is used at execution time to enforce the constraint as child table rows are inserted or the foreign key column(s) updated. If the parent table is large and not queried often but rows are inserted into the child table heavily, a unique non-clustered index that “covers” the referential integrity check may be more desirable than the clustered index. This can improve buffer efficiency and page life expectancy compared to using a clustered index (e.g. primary key). My assumption is that this is why SQL Server 2008 and later versions prefer the unique non-clustered index over the clustered index for constraint enforcement.

Another reason one should control the index bound to the foreign key is to facilitate index changes. If you try to drop an index bound to a foreign key, you’ll get an error like “An explicit DROP INDEX is not allowed on index 'dbo.ParentTable.idx_ParentTable_ParentTableID '. It is being used for FOREIGN KEY constraint enforcement.” You’ll need to drop the foreign key first and recreate after dropping the index.

Since one can’t specify the bound foreign key index declaratively, the only guaranteed way to control the binding is to create the foreign key when only the desired unique index exists and create additional indexes afterward. This isn’t to say you can’t rely on the rules described earlier but you need to be aware that such rules vary depending on the SQL Server version and could change in the future.