Lance Harra Blog

The need for a unique clustered index on almost every table.

I assume everyone understands the need for a unique clustered index. on almost every table.  Without a clustered index the table is just heap.  You have no easy way to cleanup fragementation without a clustered index. If you do not make the clustered index unique SQL Server creates a uniqueifier for you to make the index unique.  The old school of thought was to cluster for distribution to ease hot spotting on inserts.  Since row level locking has been implemented the need to distribute inserts has become secondary to minimizing index and extent fragmentation.  If  you use an identity column as your unique clustered index you can greatly reduce the amount of fragementation because all of your inserts will occur at the end of the table.  You need to experiment to find a good fill factor for you particular situatiuon to minimize page splits from updates.  In the sessions on indexing I attended it was suggested that you fill varchar fields to some average default length on insert to help reserve some space and miniminize page splits.  I know I am going to try implementing these suggestions in the next systems that I design and implement.  Many of these suggestions raise the old issues of natural or surrogate key for primary keys.  After the session I am now more convinced that I need to consider how sql server works more and how can I usew that to build high performance systems. 

Legacy Comments

re: The need for a unique clustered index on almost every table.
The clustered index should be the field(s) that you use to select ranges/groups of rows most often with. For example, if you do most searchs for records between dates 01 Feb 2003 and 01 Mar 2003, the date field should probably be your clustered index.

Adding a clustered index on an identity field may make inserts work ok, but any selects will lose all benefits. If you are inserting records 80% of the time, this may work. If you are doing a mix, this probably is not the best idea.

Also, finding a single record via a nonclustered index on a unique column (such as an identity column) is about as fast as on a clustered index. So don't fall into the trap of "I will be selected individual records by their ID, so I have to put the clustered index on that column."

These are all vague generalizations. It helps to read Kalen Delaney's articles on SQLMag and a book on SQL Server internals, at least an overview of how queries work in general. And finally, test to see which works faster regardless of what the book says!

Cancri (
re: The need for a unique clustered index on almost every table.
There's nothing worse than identity column. Read any Joe Celko's book.

re: The need for a unique clustered index on almost every table.
Identity columns that don't get copied are bad during destro. are evil! I feel I get better control of SQL creating IDs in the app.. Using newID() makes for easy PK inserts on disconnected systems (web).