Well we’re about a month past PASS Summit 2011, and yet I haven’t finished blogging my notes! Between work and home life, I haven’t been able to come up for air in a bit. Now on to my notes…
On Thursday of the PASS Summit 2011, I attended Klaus Aschenbrenner’s (blog|twitter) “Advanced SQL Server 2008 Troubleshooting”, Joe Webb’s (blog|twitter) “SQL Server Locking & Blocking Made Simple”, Kalen Delaney’s (blog|twitter) “What Happened? Exploring the Plan Cache”, and Paul Randal’s (blog|twitter) “More DBA Mythbusters”. I think my head grew two times in size from the Thursday sessions. Just WOW!
I took a ton of notes in Klaus' session. He took a deep dive into how to troubleshoot performance problems. Here is how he goes about solving a performance problem:
- Start by checking the wait stats DMV
- System health
- Memory issues
- I/O issues
I normally start with blocking and then hit the wait stats. Here’s the wait stats query (Paul Randal’s) that I use when working on a performance problem. Klaus highlighted a few waits to be aware of, such as WRITELOG (indicates an I/O subsystem problem), SOS_SCHEDULER_YIELD (indicates a CPU problem), and PAGEIOLATCH_XX (indicates an I/O subsystem problem or a buffer pool problem).
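For reference, here’s a simplified sketch of that kind of wait stats check — top waits by total wait time from sys.dm_os_wait_stats, with a handful of benign waits filtered out (Paul’s full query excludes many more):

```sql
-- Top waits since the last stats clear / restart, heavily simplified.
SELECT TOP (10)
    wait_type,
    wait_time_ms / 1000.0 AS total_wait_s,
    (wait_time_ms - signal_wait_time_ms) / 1000.0 AS resource_wait_s,  -- waiting on the resource
    signal_wait_time_ms / 1000.0 AS signal_wait_s,                     -- waiting for CPU
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
                        N'SQLTRACE_BUFFER_FLUSH', N'REQUEST_FOR_DEADLOCK_SEARCH',
                        N'CLR_AUTO_EVENT', N'XE_TIMER_EVENT')
ORDER BY wait_time_ms DESC;
```

A high signal wait relative to resource wait points at CPU pressure (think SOS_SCHEDULER_YIELD); big PAGEIOLATCH_XX or WRITELOG numbers point at the I/O subsystem.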
Regarding memory issues, Klaus recommended that as a bare minimum, one should set the “max server memory (MB)” in sp_configure to 2GB or 10% reserved for the OS (whichever comes first). This is just a starting point though!
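As an illustration (the 14GB figure is just a hypothetical for a 16GB box, leaving roughly 2GB for the OS — tune it for your hardware):

```sql
-- 'max server memory (MB)' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap SQL Server's buffer pool at 14GB on a hypothetical 16GB server.
EXEC sp_configure 'max server memory (MB)', 14336;
RECONFIGURE;
```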
Regarding I/O issues, Klaus talked about disk partition alignment, which can improve SQL Server I/O performance by up to 100%. You should use a 64KB NTFS allocation unit (cluster) size, and partition alignment is automatic in Windows Server 2008 R2.
Joe’s locking and blocking presentation was a good session to really clear up the fog in my mind about locking. One takeaway I had no idea was possible: you can set a lock timeout in T-SQL via SET LOCK_TIMEOUT. If you do this via the application, you should trap error 1222.
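A minimal sketch of what that looks like (the dbo.Orders table and values are hypothetical):

```sql
-- Wait at most 5 seconds for locks; error 1222 is raised on timeout.
SET LOCK_TIMEOUT 5000;

BEGIN TRY
    UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 42;  -- hypothetical statement
END TRY
BEGIN CATCH
    -- 1222 = "Lock request time out period exceeded"
    IF ERROR_NUMBER() = 1222
        PRINT 'Timed out waiting for a lock; retry or surface a friendly error.';
END CATCH;
```

The default is -1 (wait forever), which is why an application that sets a timeout needs its own handling for 1222.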
Kalen’s session went into execution plans. The minimum size of a plan is 24KB. This adds up fast, especially if you have a lot of plans that don’t get reused much. You can use sys.dm_exec_cached_plans to check how often a plan is being reused by checking the usecounts column. She said that we can use DBCC FLUSHPROCINDB to clear out the plan cache for a specific database. I didn’t know we had this available, so this was great to hear. It’s much less intrusive than DBCC FREEPROCCACHE, which is what I’ve had to run when an emergency comes up.
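Here’s a sketch of both (the database name is hypothetical, and note that DBCC FLUSHPROCINDB is undocumented, so use it with care):

```sql
-- Single-use plans sitting in cache: usecounts = 1 means no reuse.
SELECT TOP (20)
    cp.usecounts,
    cp.size_in_bytes,
    cp.objtype,
    st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts ASC, cp.size_in_bytes DESC;

-- Clear cached plans for one database only (undocumented command).
DECLARE @dbid INT;
SET @dbid = DB_ID(N'MyDatabase');  -- hypothetical database name
DBCC FLUSHPROCINDB (@dbid);
```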
Kalen said one should enable “optimize for ad hoc workloads” if you have an ad hoc workload. This stores only a 300-byte stub of the plan the first time a query runs, and if the query runs again, the full plan is cached. This helps with plan cache bloat.
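Enabling it is a one-liner pair via sp_configure:

```sql
-- 'optimize for ad hoc workloads' is an advanced option.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cache only a small stub until a plan is used a second time.
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```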
I have a lot of systems that use prepared statements, and Kalen says we can simulate those calls by using sp_executesql. Cool!
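For example, a parameterized call like this (table and parameter are hypothetical) gets one cached plan that’s reused across different parameter values, just like a prepared statement from the application:

```sql
-- One plan, reused for any @CustomerID value.
EXEC sp_executesql
    N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @CustomerID',
    N'@CustomerID INT',
    @CustomerID = 42;
```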
Paul did a series of posts last year to debunk various myths and misconceptions around SQL Server. He continues to debunk things via “DBA Mythbusters”. You can get a PDF of a bunch of these here. One of the myths he went over is the number of tempdb data files that you should have. Back in 2000, the recommendation was to have as many tempdb data files as there are CPU cores on your server. This no longer holds true due to the numerous cores we have on our servers. Paul says you should start out with 1/4 to 1/2 the number of cores and work your way up from there. BUT! Paul likes what Bob Ward (twitter) says on this topic:
- 8 or fewer cores –> set number of files equal to the number of cores
- Greater than 8 cores –> start with 8 files and increase in blocks of 4
One common myth out there is that you should set MAXDOP to 1 for an OLTP workload with high CXPACKET waits. Instead, dig deeper first: look for missing indexes and out-of-date statistics, consider increasing the “cost threshold for parallelism” setting, and perhaps set MAXDOP at the query level.
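Those last two options look like this (the threshold value and the query are just illustrative — tune for your own workload):

```sql
-- Raise the cost threshold so only genuinely expensive queries go parallel.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 25;  -- example value; default is 5
RECONFIGURE;

-- Or limit parallelism for one problem query instead of the whole server.
SELECT SUM(LineTotal)          -- hypothetical query
FROM dbo.OrderDetails
OPTION (MAXDOP 1);
```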
Paul stressed that you should not plan a backup strategy but instead plan a restore strategy. What are your recoverability requirements? Once you know that, plan your backups around them.
As Paul always does, he talked about DBCC CHECKDB. He said how fabulous it is. I didn’t want to interrupt the presentation, so after his session had ended, I asked Paul about the need to run DBCC CHECKDB on your mirror systems. You could have data corruption occur at the mirror and not at the principal server. If you aren’t checking for data corruption on your mirror systems, you could be failing over to a corrupt database in the case of a disaster or even a planned failover. You can’t run DBCC CHECKDB against the mirrored database directly, but you can run it against a database snapshot of the mirror.
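A sketch of that, run on the mirror server (database name, logical file name, and snapshot path are all hypothetical — the logical name must match the mirror database’s data file):

```sql
-- Create a snapshot of the mirror database...
CREATE DATABASE MyDatabase_Snap
ON (NAME = MyDatabase_Data, FILENAME = 'D:\Snapshots\MyDatabase_Snap.ss')
AS SNAPSHOT OF MyDatabase;

-- ...check it for corruption...
DBCC CHECKDB (MyDatabase_Snap) WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- ...and clean up when done.
DROP DATABASE MyDatabase_Snap;
```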