Anyone who has spent time in the computer business and has some grasp of hardware issues understands fragmentation. For those whose concept of hardware is limited to “it’s the part I can kick”, here is a quick and painless overview.
I like analogies so let’s think of a disk drive as a very large, tall building having many rooms and many floors. The disk head is you. Your task is to run around and “read” something from particular rooms (blocks). A simple thought exercise would suggest it is easier to read rooms “in order” on a floor rather than scattered around the building. You can “read” all the rooms on a hallway far faster than the same number of rooms on different floors or wings. Hence the difference between “sequential” and “random” I/O. Blocks or rooms, it all works the same. One trick used when benchmarking systems is to partition only part of each disk and assign that to a RAID set. This causes the heads to “short-stroke”, leading to faster seek times. This would be like only using the first ten floors of a large building so the elevator only has to move a short distance. Inefficient in space, but faster for throughput.
Fragmentation comes in two forms in SQL; internal and external. Internal fragmentation occurs when the various SQL data structures are not in order within a particular SQL data file. External fragmentation occurs when the SQL data file itself gets scattered in multiple sections over the physical disk. Internal fragmentation can be corrected by reindexing the database, something most of us have running automagically. External fragmentation is not so easy to fix. You have to defragment the disk with SQL Server offline so the data files sections can be moved around and strung back together. However, you can avoid External fragmentation very easily by following a few simple rules. Note that the issues for SQL Log files are a bit different but the same rules apply.
Rules to Avoid External Fragmentation:
1) Never Use Auto-Expand for data files. Note that I do not say disable auto-expand, just never actually use it. Pre-expand your databases in large increments to handle several months’ worth of growth at a time. Leave Auto-Growth as a “safety valve” to keep unexpected allocations from killing your server, but don’t rely on it to manage data file space allocations.
2) Never Shrink data files. Shrinking inevitable leads to later growth. Allocate once and leave it. Shrink also causes massive Internal fragmentation in a SQL database. The shrink algorithm moves data to new locations but does not attempt to keep data in contiguous segments. Auto-Shrink lets you do this on a regularly scheduled basis. How helpful.
3) Keep SQL disks dedicated to SQL Data only. Creating and optionally destroying many data files causes physical fragmentation on the disks. To go back to the building analogy, most disks allocate new from new. That is instead of reusing old rooms freed up from deleted files, the file system simply grabs the next unused room (or rooms) after the last used one and keep moving. This is nice until the first pass through the building is complete and stuff is all over the place.
Fragmentation brings us back to a fundamental dichotomy in SQL Server management. You can optimize for low space utilization or high performance but not both. Considering disk space is relatively cheap, I know which one I always go for.