We have a number of de facto standards in our shop, such as always placing data and indexes on separate filegroups. I asked why we bothered with the separate filegroups since the underlying files were often on the same physical disks anyway. No one really knew why. The practice predates the current DBAs and has simply continued ever since.
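For readers unfamiliar with the convention, it looks roughly like the sketch below. All filegroup, file, and object names here are hypothetical, purely to illustrate the pattern:

```sql
-- Illustrative only: the "separate data and index filegroups" convention.
-- Filegroup, file, and table names are hypothetical.
ALTER DATABASE Sales ADD FILEGROUP DATA;
ALTER DATABASE Sales ADD FILEGROUP INDEXES;

ALTER DATABASE Sales ADD FILE
    (NAME = Sales_Data1, FILENAME = 'D:\Data\Sales_Data1.ndf', SIZE = 10GB)
    TO FILEGROUP DATA;
ALTER DATABASE Sales ADD FILE
    (NAME = Sales_Idx1, FILENAME = 'E:\Index\Sales_Idx1.ndf', SIZE = 5GB)
    TO FILEGROUP INDEXES;

-- The table (and its clustered index) go on DATA...
CREATE TABLE dbo.Orders (
    OrderID int NOT NULL CONSTRAINT PK_Orders PRIMARY KEY,
    CustomerID int NOT NULL,
    OrderDate datetime NOT NULL
) ON DATA;

-- ...while nonclustered indexes go on INDEXES
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID) ON INDEXES;
```

Nothing about this DDL is wrong per se; the question, as we'll see, is whether the separation still buys anything on our current storage.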
This reminded me of a parable my boss told me several years ago. Five monkeys are placed in a cage with a step ladder leading to a bunch of bananas suspended from the top. The first monkey rushes up the ladder eagerly only to get forcibly knocked off by a stream of water from a fire hose when he reaches the top. The other monkeys are also drenched with the ice-cold water. After recovering from the shock, another monkey climbs the ladder with the same result. Eventually, each monkey tries to reach the bananas and learns the futility and unpleasantness of the effort. Any monkey that even tries to climb the ladder is promptly tackled by his peers.
Now attrition begins. One of the original monkeys is removed and replaced with a new monkey. He looks at the bananas and back at the other monkeys, wondering why they ignore such a tasty treat. He starts up the ladder and is immediately pulled off and beaten by the other monkeys. After a couple of attempts, the new monkey learns to avoid the ladder. Another original monkey is replaced with a new monkey. When he goes for the bananas, he also gets tackled by the other monkeys, including the one that’s never been soaked. The same process is repeated; original monkeys are replaced and new monkeys quickly learn to avoid the ladder despite the prize at the top. The last original monkey is eventually replaced.
We now have a group of monkeys who’ve never been soaked by the fire hose, won’t climb the ladder and won’t allow others to climb it either. It doesn’t matter if the fire hose is removed or how many generations of monkeys follow. They don’t know why they do what they do.
Like the monkeys, we follow inherited practices mindlessly once the institutional memory behind them is lost, and we become slow to adapt to changes in our environment. Going back to my data/index filegroup question, there may have once been a good reason for the separate filegroups. Perhaps the separate groups improved performance of a specific application workload by isolating random and sequential I/O on local disk storage. Our current environment is quite different, however. Physical disks on the current SAN storage are often shared with other hosts/LUNs, so placing data and indexes on different filegroups no longer isolates the I/O. Furthermore, the separate filegroup approach has since been generalized to apply to all databases, regardless of whether it makes sense for a given application workload. We continue the data/index filegroup practice because the reasoning behind the separate filegroups is not known or understood.
It may be easier to robotically continue with the status quo, but I think we need to understand exactly why a given filegroup approach might be advantageous so that we can choose what is best for the situation at hand. For example, I/O performance of our high-transaction OLTP databases is best achieved by balancing the heavy random I/O workload over as many disks as possible because this maximizes the disk transfers/sec rate. I don't think segregating data and index files is appropriate for our OLTP applications because it artificially caps the maximum I/O rate and can actually introduce an unnecessary I/O bottleneck.
In contrast, a single filegroup for both data and indexes distributes I/O proportionally according to the size of the underlying files with less complexity and administration than the separate filegroup approach. In my experience, this single user-defined filegroup strategy works well for both OLTP and mixed workloads where sequential I/O cannot be accurately predicted. Note that we need to work closely with the SAN storage team to meet performance objectives regardless of the filegroup strategy we choose since the physical implementation is outside our control.
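A sketch of that single user-defined filegroup approach follows. Again, the database, file, and path names are hypothetical; the point is multiple files in one filegroup, with that filegroup made the default so objects land there automatically:

```sql
-- Sketch: one user-defined filegroup holding both data and indexes.
-- SQL Server fills multiple files proportionally to their free space,
-- spreading I/O across the underlying LUNs. Names are hypothetical.
CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_Sys,   FILENAME = 'D:\Data\Sales_Sys.mdf',   SIZE = 100MB),
FILEGROUP AppData
    (NAME = Sales_Data1, FILENAME = 'D:\Data\Sales_Data1.ndf', SIZE = 10GB),
    (NAME = Sales_Data2, FILENAME = 'E:\Data\Sales_Data2.ndf', SIZE = 10GB)
LOG ON
    (NAME = Sales_Log,   FILENAME = 'L:\Log\Sales_Log.ldf',    SIZE = 2GB);

-- Make the user filegroup the default so new tables and indexes
-- are created there unless an ON clause says otherwise.
ALTER DATABASE Sales MODIFY FILEGROUP AppData DEFAULT;
```

This keeps system objects alone in PRIMARY and leaves the proportional-fill algorithm, rather than a manual data/index split, to balance the I/O.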
We also have databases containing large amounts of historical data. There, we can use filegroups to place infrequently accessed older data on read-only filegroups, with the files on relatively slow, inexpensive storage. Often-queried recent data can then be stored on the more expensive fast storage devices via a read-write filegroup. The key here is that the additional maintenance and administration complexity is justified by the hardware savings. Similarly, our large reporting databases can benefit from a customized filegroup strategy to facilitate backup/recovery or improve performance by isolating random and sequential I/O for specialized (and predictable) workloads.
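One way to implement the historical-data scenario is table partitioning across filegroups on different storage tiers. The sketch below uses hypothetical names and a hypothetical cutoff date; the essentials are two filegroups on different storage, a partition scheme mapping old and new ranges to them, and freezing the archive as read-only:

```sql
-- Sketch: old data on slow storage, recent data on fast storage.
-- Database, filegroup, file, and date values are hypothetical.
ALTER DATABASE SalesHistory ADD FILEGROUP Archive;
ALTER DATABASE SalesHistory ADD FILE
    (NAME = Hist_Archive1, FILENAME = 'S:\Slow\Hist_Archive1.ndf', SIZE = 50GB)
    TO FILEGROUP Archive;

ALTER DATABASE SalesHistory ADD FILEGROUP Recent;
ALTER DATABASE SalesHistory ADD FILE
    (NAME = Hist_Recent1, FILENAME = 'F:\Fast\Hist_Recent1.ndf', SIZE = 20GB)
    TO FILEGROUP Recent;

-- Rows before the boundary date go to Archive, the rest to Recent
CREATE PARTITION FUNCTION pfOrderDate (datetime)
    AS RANGE RIGHT FOR VALUES ('2010-01-01');
CREATE PARTITION SCHEME psOrderDate
    AS PARTITION pfOrderDate TO (Archive, Recent);

-- Once the historical load is complete, freeze the archive tier.
-- (Requires exclusive access to the database at the time of the change.)
ALTER DATABASE SalesHistory MODIFY FILEGROUP Archive READ_ONLY;
```

A side benefit: once Archive is read-only and backed up, routine backups can target only the read-write filegroups, shortening the backup window.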
Like many of the choices we make, there is no single filegroup strategy that is optimal for all cases. It's important to understand the rationale behind a given approach so that one can choose the best one for the current situation and create new strategies when needed. A key differentiator between an average DBA and a good one is a thorough knowledge of why practices are followed. Don't be a monkey.