Just to stir stuff up a bit

Wed Dec 7, 2005 by Chris Miller

Evaluate and comment on the following general statement:

In a well designed database, the clustered index should never be on the primary key.

The point being that the normal use case isn't "I know the primary key and need to find the rest of the columns", but instead "I know data in 3 columns, show me all the records that match so I can find related data using the PK elsewhere in the database".

I think that this is the single biggest performance blunder done by programmers and n00b dba's. It's also incredibly inconvenient to fix, requiring a lot of locking and basically 2 complete table reorgs (someday someone will explain why dropping a clustered index is such a hard operation, but please not in this post).

The only case I can think of where it's always false is a table where you have a surrogate key (usually an identity column of some kind) providing the PK, where the surrogate key is provided to customers (i.e. an invoice number or confirmation number).

I'll question the validity of your business requiring customers to remember invoice numbers and the impact on your perceived customer service later.

rs.

Legacy Comments


Brian 2005-12-07	re: Just to stir stuff up a bit I'll agree. I always love it when those n00b dba's click the "cluster index" checkbox everytime they create an index. HA!

Xaprb 2005-12-07	And your post's URL is...? Your post's URL is .../blah/.../8500.aspx. I question the validity of you requiring me to remember this URL ;-)

Scott 2005-12-07	re: Just to stir stuff up a bit Here we go agian, using the word "Never". It all depends on the model and its purpose. I will agree that surragate keys should to be exposed to the outside world is a good idea in general. That is why we have alternate keys. The problem is, that in a highly normalized db it is not uncommon to have a number of tables involved in a single query. I am currently working on an database in which one query uses seven joins to produce a result set. All but one of the tables in this query have clustered indexes on the primary key. The remaining table (approx. 20 million rows) has a clustered index on three columns to satisfy a covered query. Performance on this query is measured in milliseconds. You need to consider what you are trying to accomplish and build accordingly.

DavidM 2005-12-08	re: Just to stir stuff up a bit I take it that most of your PK's are identities then.. It's the only excuse for this terrible generalisation.

Brett 2005-12-10	re: Just to stir stuff up a bit Hmmmm...IDENTITY...CLUSTERED INDEX....HOT SPOTS... You go girl

tim 2005-12-18	re: Just to stir stuff up a bit "Your post's URL is .../blah/.../8500.aspx. I question the validity of you requiring me to remember this URL ;-) " Oh that is priceless. Touche.

Adam Machanic 2005-12-18	re: Just to stir stuff up a bit Clustering on the PK solves BOTH types of queries. You can efficiently look up attributes by PK, and if you create non-clustered indexes on those other columns you can efficiently find PKs based on other attributes. As for 'n00bs', the main problem is web sites and books that tell them absolute rules to follow, causing them to generalize too much instead of learning to carefully research before making important decisions.

penis enlargement pills 2006-02-07	re: Just to stir stuff up a bit gg

max 2006-02-15	re: Just to stir stuff up a bit I think increasing the RAM would help anyway

mp3 blog 2006-03-27	re: Just to stir stuff up a bit nice to hear you again dude.

xnxx 2006-07-07	re: Just to stir stuff up a bit I also think,that increasing the RAM would help anyway.

accupril 2006-10-16	re: Just to stir stuff up a bit Hello!! Read