Just to stir stuff up a bit

Evaluate and comment on the following general statement:

In a well designed database, the clustered index should never be on the primary key.

The point being that the normal use case isn't "I know the primary key and need to find the rest of the columns", but instead "I know data in 3 columns, show me all the records that match so I can find related data using the PK elsewhere in the database".

I think that this is the single biggest performance blunder done by programmers and n00b dba's. It's also incredibly inconvenient to fix, requiring a lot of locking and basically 2 complete table reorgs (someday someone will explain why dropping a clustered index is such a hard operation, but please not in this post).

The only case I can think of where it's always false is a table where you have a surrogate key (usually an identity column of some kind) providing the PK, where the surrogate key is provided to customers (i.e. an invoice number or confirmation number).

I'll question the validity of your business requiring customers to remember invoice numbers and the impact on your perceived customer service later.

rs.

Print | posted on Wednesday, December 07, 2005 9:17 AM

Feedback


# re: Just to stir stuff up a bit 12/7/2005 11:01 AM Brian

I'll agree. I always love it when those n00b dba's click the "cluster index" checkbox everytime they create an index. HA!


# And your post's URL is...? 12/7/2005 1:59 PM Xaprb

Your post's URL is .../blah/.../8500.aspx. I question the validity of you requiring me to remember this URL ;-)


# re: Just to stir stuff up a bit 12/7/2005 5:15 PM Scott

Here we go agian, using the word "Never". It all depends on the model and its purpose. I will agree that surragate keys should to be exposed to the outside world is a good idea in general. That is why we have alternate keys. The problem is, that in a highly normalized db it is not uncommon to have a number of tables involved in a single query. I am currently working on an database in which one query uses seven joins to produce a result set. All but one of the tables in this query have clustered indexes on the primary key. The remaining table (approx. 20 million rows) has a clustered index on three columns to satisfy a covered query. Performance on this query is measured in milliseconds. You need to consider what you are trying to accomplish and build accordingly.


# re: Just to stir stuff up a bit 12/8/2005 4:24 PM DavidM

I take it that most of your PK's are identities then.. It's the only excuse for this terrible generalisation.


# re: Just to stir stuff up a bit 12/10/2005 3:41 PM Brett

Hmmmm...IDENTITY...CLUSTERED INDEX....HOT SPOTS...

You go girl


# re: Just to stir stuff up a bit 12/18/2005 3:19 PM tim

"Your post's URL is .../blah/.../8500.aspx. I question the validity of you requiring me to remember this URL ;-) "

Oh that is priceless. Touche.


# re: Just to stir stuff up a bit 12/18/2005 9:39 PM Adam Machanic

Clustering on the PK solves BOTH types of queries. You can efficiently look up attributes by PK, and if you create non-clustered indexes on those other columns you can efficiently find PKs based on other attributes.

As for 'n00bs', the main problem is web sites and books that tell them absolute rules to follow, causing them to generalize too much instead of learning to carefully research before making important decisions.


# re: Just to stir stuff up a bit 2/7/2006 2:33 PM penis enlargement pills

gg


# re: Just to stir stuff up a bit 2/15/2006 9:21 AM max

I think increasing the RAM would help anyway


# re: Just to stir stuff up a bit 3/27/2006 6:35 AM mp3 blog

nice to hear you again dude.


# re: Just to stir stuff up a bit 7/7/2006 5:28 PM xnxx

I also think,that increasing the RAM would help anyway.


# re: Just to stir stuff up a bit 10/16/2006 8:57 AM accupril

Hello!! Read

Comments have been closed on this topic.