Chris Miller Blog

RocketScientist's Miscellaneous Ramblings

Clustered Indexes and You

So, Dave and I were having a conversation about clustered indexes over lunch yesterday.

Yeah, we're this boring.

Anyway, I have an issue at work where I have a table that has a horrible clustered index.  It's not very selective (the first key can return upwards of 80K rows) and the first key changes frequently when the row is updated (leading to fragmentation).  It's great for one semi-frequent query in our system, but really isn't useful for most others.  The problem is that SQL Server (talking SQL Server 2000 here) uses the index A LOT in the form of clustered index scans.  To the point of ignoring every other index on the table.  If I go in and search anything but the PK, it will ignore every other index in the table and use the clustered index.   Period.  No discussion.  I've gone to using index hints to force it to use the correct index for the query, which actually improves the search by a factor of 10 to 100 most times.

The problem is that SQL Server assumes that a Bookmark Lookup step has a near-infinite cost.  So it will do anything, including the equivalent of a full table scan, to avoid that bookmark lookup.  Which is just insane in most cases.

The table also has this nasty trigger on it that basically serializes any rowset insert, updates, or deletes.  There are good reasons for this, but it still sucks.  There are no set-based operations on any data modifications in this table.  As a result, data modifications take a very long itme.

So then the discussion took a turn.

"Why do you have a clustered index".

Well, because you have to.

"No, you don't"

hmm.  No clustered index.  No bookmark lookups.  No completely ludicrous query plans to avoid the bookmark lookup.  No tablespace fragmentation.  Horrible insert performance, but it's not gonna get any worse than it already is with that damn trigger on there anyway.  Every index on the table stand on it's own merits, equally likely to be chosen based on its selectivity, not on some artificial and horrible decision made by the query processor to avoid a bookmark lookup that would likely cost nothing anyway.

Hmm.

If this was true, we'd see behavior where if you have a table with a clustered index, pretty much any non-clustered index that isn't a fully covering index is next to useless.  Which is generally the observed behavior.

So the choices are: Clustered index and every other index is covering for a particular query, or no clustered index and let the other indexes fight it out for selectivity for each query.

The nonclustered index approach looks to be more flexible in this case.  The cost, of course, is insert performance. Which is what I'll be testing today.






Legacy Comments


Ryan Hunt
2006-10-23
re: Clustered Indexes and You
Maybe I'm misreading your here, but your comment, "hmm. No clustered index. No bookmark lookups." doesn't quite work out. Heaps servicing queries with non-covering NC indexes are still going to perform Bookmark Loopups on the Heap. In fact, you might be more likely to us BML's since you have lost your Clustered Index Seeks (you cannot seek a heap). Also, think hard about Foward-Lookup pointers (which are heaps' own special little form of fragmentation) and how expensive they can get.

I went through this thought process a few months back - the idea of shying away from Clustering just because - and found that omitting them only seemed really beneficial in a few instances.

For example, If you're building an app that will primarly:
- NOT utilize range queries
- the DB will NOT use many variable data types (updates aren't likely to move the record)
- MOST queries can actually be pre-defined and serviced with a covering index.
-most selects are singleton, using covering, etc.

...then there was some benefit to losing the clustered index. Otherwise, I still pretty much like clustered indexes. RH

Dave Markle
2006-10-24
re: Clustered Indexes and You
Well, SQL Server 2000 can be confusing with the term "Bookmark Lookup". This can mean two different things -- either the row was "looked up" via the clustering key (which is what it will do in your case) or it was "looked up" using the RID on the nonclustered indexes. (BTW, in SQL Server 2005, your plans are more specific than this -- it tells you which type of lookup is going on.)

As with most things, it's not an issue of whether or not clustered indexes are a "bad" thing. How big is your clustering key? Is it unique? You're right, though. It definitely is a problem if you've got a frequently changing component of the clustered index, and it will lead to pain.

You don't necessarily need the table to be without a clustered index, though. You may just want to change it to a small set of columns which don't change much. USUALLY, this is your primary key, but it may not be. I don't know what your table design is, so I can't say. Another thing to look at is using SQL Server 2005 instead, and using their "included columns" feature of SQL 2005 indexes. This will allow you to basically create a covering index without having to reorganize the included columns when modifications are made.

Salman
2006-10-25
re: Clustered Indexes and You
Try adding hints to your query?