Geoff N. Hiten Blog

SQL Server thoughts, observations, and comments

Auto-Fragmenting Databases

Anyone who has spent time in the computer business and has some grasp of hardware issues understands fragmentation.  For those whose concept of hardware is limited to “it’s the part I can kick”, here is a quick and painless overview.

I like analogies   so let’s think of a disk drive as a very large, tall building having many rooms and many floors.  The disk head is you.  Your task is to run around and “read” something from particular rooms (blocks).   A simple thought exercise would suggest it is easier to read rooms “in order” on a floor rather than scattered around the building.  You can “read” all the rooms on a hallway far faster than the same number of rooms on different floors or wings.   Hence the difference between “sequential” and “random” I/O.  Blocks or rooms, it all works the same.  One trick used when benchmarking systems is to partition only part of each disk and assign that to a RAID set.  This causes the heads to “short-stroke”, leading to faster seek times.  This would be like only using the first ten floors of a large building so the elevator only has to move a short distance.  Inefficient in space, but faster for throughput.

Fragmentation comes in two forms in SQL; internal and external.  Internal fragmentation occurs when the various SQL data structures are not in order within a particular SQL data file.  External fragmentation occurs when the SQL data file itself gets scattered in multiple sections over the physical disk.   Internal fragmentation can be corrected by reindexing the database, something most of us have running automagically.  External fragmentation is not so easy to fix. You have to defragment the disk with SQL Server offline so the data files sections can be moved around and strung back together.  However, you can avoid External fragmentation very easily by following a few simple rules.  Note that the issues for SQL Log files are a bit different but the same rules apply.

Rules to Avoid External Fragmentation:

1)      Never Use Auto-Expand for data files.  Note that I do not say disable auto-expand, just never actually use it.  Pre-expand your databases in large increments to handle several months’ worth of growth at a time. Leave Auto-Growth as a “safety valve” to keep unexpected allocations from killing your server, but don’t rely on it to manage data file space allocations.

2)      Never Shrink data files.  Shrinking inevitable leads to later growth.  Allocate once and leave it.  Shrink also causes massive Internal fragmentation in a SQL database.  The shrink algorithm moves data to new locations but does not attempt to keep data in contiguous segments.  Auto-Shrink lets you do this on a regularly scheduled basis.  How helpful.

3)      Keep SQL disks dedicated to SQL Data only.  Creating and optionally destroying many data files causes physical fragmentation on the disks.  To go back to the building analogy, most disks allocate new from new.  That is instead of reusing old rooms freed up from deleted files, the file system simply grabs the next unused room (or rooms) after the last used one and keep moving.  This is nice until the first pass through the building is complete and stuff is all over the place. 

Fragmentation brings us back to a fundamental dichotomy in SQL Server management. You can optimize for low space utilization or high performance but not both.  Considering disk space is relatively cheap, I know which one I always go for.

Legacy Comments


Jerry Hung
2009-03-19
re: Auto-Fragmenting Databases
I like the cute analogy......... DBA running around to get stuff, sounds like real life too

Try to minimize use of NEW_ID() or GUID for PK/Clustered Index, that causes fragmentation

SQLCraftsman
2009-03-19
re: Auto-Fragmenting Databases
Personally, I think the whole GUID thing is overblown. Yes it causes fragmentation and page splits. Decent index maintenance and a good fill factor can mitigate that. Let the developers have their shiny toys and stay focused on providing solid data services.

Aymen
2009-03-19
re: Auto-Fragmenting Databases
Hello

I have DB size 50 Gb and my SGBD is MS SQL2005, my DB is in RECOVERY FULL mode.

my probleme is: the Log of my DB growth 6Gb in 3 days,

i do : backup log MyDB with no_log , after i do shrink file Log =====> but after 3 or 4 days my Log growth to 5 Gb

i made job (backup log MyDB with no_log) + shrink file for my Log

Information: i use BackupExec to do Full backup evry Sunday and i do Differentiel backup evry 1 hour
+ job to rebuild index evry wednesday

my question: My DB is normal or not? my SQL job is normal safety or not?

plz help me



SQLCraftsman
2009-03-19
re: Auto-Fragmenting Databases
Those numbers are not unexpected. Since you are throwing away the logs, you can set the database to SIMPLE recovery and let SQL handle that part automatically.

Remote DBA
2009-03-20
re: Auto-Fragmenting Databases
Hi Aymen.
You should revise your backup/restore strategy... Backing up transaction logs with no_log is meaningless. BTW this operation is denied in SQL Server 2008.
thanks, Vadym.

Tibor Karaszi
2009-03-23
re: Auto-Fragmenting Databases
Hi Geoff,

I like the analogy. There's one thing missing to the picture, though. I can imagine that as we more and more will run into SSD, the question will arise whether fragmentation becomes irrelevant. And the answer is "no", at least with current architecture of SQL Server. The reason is read-ahead. A thread runs before the worker thread and read extents into memory. The problem now is if the worker thread need only, say, one page from an extent and then because of fragmentation jump to some other extent. End result is that pages are read into memory which weren't needed in the first place. Essentially, you reduced your RAM by in worst case a factor of 8 because of reading a lot of junk into memory.

Perhaps we can expand on the analogy with a runner running in advance and putting all pieces of papers (assume you are after some papers in each room) in a pile in the doorway, but the runner is so stupid it can only do one floor (8 rooms) units.

Also, I advice some care around the terms "external" and "internal" fragmentation. A lot of SQL Server material refers to pages not in order as "external" and non full pages as "internal" fragmentation. I prefer your terminology, but that is often not the case.

Remote DBA
2009-03-23
re: Auto-Fragmenting Databases
Agree with SQLCraftsman..

SQLCraftsman
2009-03-23
re: Auto-Fragmenting Databases
Re: Tibor,

Good point on SSD storage. SQL is tuned for rotating media storage where physical movements are the main cause of disk slowdowns. Read-ahead is pretty meaningless on SSD except to save I/O overhead by reading larger chunks.