Twice a year, we move our production systems to our disaster recovery site. Last Saturday night was one of those nights. There are about 50 SQL Server databases to be moved to the DR site, which is done via database mirroring. It takes only a few seconds to fail over, but some databases need a bit more work, such as setting up replication.
Everything went relatively smoothly, but we encountered a weird bug on our most mission-critical system. After everything was successfully failed over to the DR site, we noticed that mirroring was in a suspended state on one of the databases. We thought we had run into a SQL Server 2005 bug that we had been encountering and were working with Microsoft to fix. Microsoft did fix it in both SQL Server 2005 service pack 3 cumulative update package 13 and service pack 4 cumulative update package 2; however, SP3 CU13 and SP4 both recently failed to install on this system, so we did not yet have the bug fix.
As the suspended state was causing us issues with replication, we dropped mirroring. We then noticed we had only 10MB of free disk space on the mount point where the principal’s data files are stored. I knew something had gone amiss, as this system should have at least 150GB free on that mount point. I immediately checked the main database’s data file and was shocked to see an autogrowth size of 65536%. The data file had autogrown right before mirroring went into the suspended state.
I didn’t have a lot of time to research whether this autogrowth problem was a known SQL Server bug, so I deferred that research to today. A quick Google search yielded no results, with the emphasis on “quick”.
I checked our performance system, which was recently restored with a copy of the affected production database, and found the autogrowth setting to be 512MB. So this autogrowth bug was encountered sometime in the last two weeks. On February 26th, we had attempted to install SQL 2005 SP4 on production, but it failed (PSS case open with Microsoft). I suspected that the SP4 failure was somehow related to this autogrowth bug, although that turned out not to be the case.
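For anyone wanting to run the same check, here is a minimal sketch of a query that shows each file's growth setting. It relies on the documented behavior of sys.database_files: growth is a count of 8KB pages when is_percent_growth is 0, and a whole-number percentage when it is 1. Note that 512MB is exactly 65,536 pages of 8KB, so a reading of 65536% is consistent with the 512MB value being interpreted as a percentage. That is an inference from the numbers, not a confirmed diagnosis.

```sql
-- Run in the context of the database you want to check.
-- growth is stored in 8KB pages unless is_percent_growth = 1.
SELECT  name,
        type_desc,
        growth,
        is_percent_growth,
        CASE WHEN is_percent_growth = 1
             THEN CAST(growth AS varchar(20)) + '%'
             ELSE CAST(CAST(growth AS bigint) * 8 / 1024 AS varchar(20)) + ' MB'
        END AS growth_setting
FROM    sys.database_files;

-- The arithmetic behind the suspicious number: 512MB expressed in 8KB pages.
SELECT 512 * 1024 / 8 AS pages_in_512mb;  -- 65536
```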
I then tweeted (@TaraKizer) about this problem to see if the SQL Server community (#sqlhelp) had any insights. It seems several people have either heard of this bug or encountered it. Aaron Bertrand (blog|twitter) referred me to this Connect item.
Our affected database originated on SQL Server 2000 and was upgraded to SQL Server 2005 in 2007. Back on SQL Server 2000, we were using the default file growth setting, which was a percentage. Sometime after the 2005 upgrade, we changed it to 512MB. Our situation seemed to fit the bug Aaron referred me to, so now the question was whether Microsoft had fixed it yet.
I received a reply to my tweet from Amit Banerjee (twitter) saying that it had been fixed in SP3 CU1 (KB958004). My affected system is on SP3 CU8, so I was initially confused about why we had encountered the bug. Because I don’t read things fully, I had missed that there are additional steps you have to follow after applying the bug fix. Amit set me straight.
Although you can read this information in the KB article, I will also copy it here in case you are as lazy as me and miss the most important section of it (although if you are as lazy as me, you won’t have read this far down my blog post):
This hotfix will prevent only future occurrences of this problem. For example, if you restore a database from SQL Server 2000 to a SQL Server 2005 instance that contains this hotfix, this problem will not occur. However, if you already have a database that is affected by this problem, you must follow these steps to resolve this problem manually:
- Apply this hotfix.
- Set the file growth settings for the affected files to percentage settings, and then set the settings back to megabyte settings.
- Take the database offline, and then bring it back online.
- Verify that the values of the is_percent_growth column are correct in the sys.database_files system table and in the sys.master_files system table.
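The manual steps above could be sketched in T-SQL roughly as follows, once the hotfix is applied. The database name MyDatabase and the logical file name MyDatabase_Data are hypothetical placeholders, and the 10% and 512MB values are only examples, so substitute your own. This is a sketch under those assumptions, not a tested procedure. Taking a database offline interrupts its users, and a database that is still in a mirroring session cannot be taken offline, so plan for a maintenance window.

```sql
-- Hypothetical names: MyDatabase (database), MyDatabase_Data (logical file name).

-- Step 2: set file growth to a percentage, then back to megabytes.
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = N'MyDatabase_Data', FILEGROWTH = 10%);
ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = N'MyDatabase_Data', FILEGROWTH = 512MB);

-- Step 3: take the database offline and bring it back online.
-- WITH ROLLBACK IMMEDIATE rolls back open transactions and disconnects users.
ALTER DATABASE MyDatabase SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE MyDatabase SET ONLINE;

-- Step 4: verify that is_percent_growth is 0 in both catalog views.
SELECT name, growth, is_percent_growth
FROM   MyDatabase.sys.database_files;

SELECT name, growth, is_percent_growth
FROM   sys.master_files
WHERE  database_id = DB_ID(N'MyDatabase');
```

Repeat the MODIFY FILE pair for every affected file, including the log file if its growth setting shows the same problem.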