This is the final blog for my PASS Summit 2011 series. Well okay, a mini-series, I guess.
On the last day of the conference, I attended Keith Elmore's and Boris Baryshnikov's (both from Microsoft) "Introducing the Microsoft SQL Server Code Named "Denali" Performance Dashboard Reports", Jeremiah Peschka's (blog|twitter) "Rewrite your T-SQL for Great Good!", and Kimberly Tripp's (blog|twitter) "Isolated Disasters in VLDBs".
Keith and Boris talked about the lifecycle of a session, figuring out the running time and the waiting time. They pointed out the transient nature of the reports: you could be drilling into a report to uncover a problem, but the session may have ended by the time you've drilled all of the way down. Also, the reports are for troubleshooting live problems, not historical ones; for historical troubleshooting, you can use the Management Data Warehouse. The reports provide similar benefits to Activity Monitor; however, Activity Monitor doesn't provide context-sensitive drill-through.
One thing I learned in Keith and Boris' session was that the buffer cache hit ratio should really never be below 87% due to the read-ahead mechanism in SQL Server. When a page is read, SQL Server reads the entire extent (eight 8 KB pages), so for every page you ask for, you get 7 more read. If you need any of those 7 extra pages, they're already in cache.
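If you want to see where your own server sits, the buffer cache hit ratio is exposed in `sys.dm_os_performance_counters`. A minimal sketch (the counter must be divided by its "base" counter to get a percentage):

```sql
-- Compute the current buffer cache hit ratio as a percentage.
SELECT CAST(r.cntr_value AS FLOAT) / b.cntr_value * 100 AS [BufferCacheHitRatioPct]
FROM   sys.dm_os_performance_counters AS r
JOIN   sys.dm_os_performance_counters AS b
       ON b.[object_name] = r.[object_name]
WHERE  r.counter_name = 'Buffer cache hit ratio'
  AND  b.counter_name = 'Buffer cache hit ratio base'
  AND  r.[object_name] LIKE '%Buffer Manager%';
```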
I had a lot of fun in Jeremiah's session about refactoring code, plus I learned a lot. His slides were visually presented in a fun way, which just made for a more upbeat presentation. Jeremiah says that before you start refactoring, you should look at your system. Investigate missing or excessive indexes, out-of-date statistics, and other areas that could be leading to your code running slowly. He talked about code standards. He suggested using common abbreviations for aliases instead of one-letter aliases. I'm a big offender of one-letter aliases, but he makes a good point. He said that join order does not matter to the optimizer, but it does matter to those who have to read your code. Now let's get into refactoring!
- Eliminate useless things – useless/unneeded joins and columns. If you don’t need it, get rid of it!
- Replace DISTINCT-plus-JOIN patterns with EXISTS
- Simplify your conditions; use UNION or better yet UNION ALL instead of OR to avoid a scan and use indexes for each union query
- Branching logic – instead of IF this, IF that, and on and on…use dynamic SQL (sp_executesql, please!) or use a parameterized query in the application
- Correlated subqueries – YUCK! Replace with a join
- Eliminate repeated patterns
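To illustrate the DISTINCT/JOIN-to-EXISTS refactor from the list above, here's a sketch using hypothetical `Customers` and `Orders` tables (table and column names are mine, not from the session):

```sql
-- Before: DISTINCT papers over the row multiplication caused by the join.
SELECT DISTINCT c.CustomerID, c.CustomerName
FROM   dbo.Customers AS c
JOIN   dbo.Orders    AS o ON o.CustomerID = c.CustomerID;

-- After: EXISTS asks the question we actually meant --
-- "which customers have at least one order?"
SELECT c.CustomerID, c.CustomerName
FROM   dbo.Customers AS c
WHERE  EXISTS (SELECT 1
               FROM   dbo.Orders AS o
               WHERE  o.CustomerID = c.CustomerID);
```

The EXISTS version lets the optimizer stop probing `Orders` as soon as it finds one matching row, instead of joining everything and de-duplicating afterward.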
Last, but certainly not least, was Kimberly’s session. Kimberly is my favorite speaker. I attended her two-day pre-conference seminar at PASS Summit 2005 as well as a SQL Immersion Event last December. Did I mention she’s my favorite speaker? Okay, enough of that.
Kimberly’s session was packed with demos. I had seen some of it in the SQL Immersion Event, but it was very nice to get a refresher on these, especially since I’ve got a VLDB with some growing pains. One key takeaway from her session is the idea to use a log shipping solution with a load delay, such as 6, 8, or 24 hours behind the primary. In the case of say an accidentally dropped table in a VLDB, we could retrieve it from the secondary database rather than waiting an eternity for a restore to complete. Kimberly let us know that in SQL Server 2012 (it finally has a name!), online rebuilds are supported even if there are LOB columns in your table. This will simplify custom code that intelligently figures out if an online rebuild is possible.
There was actually one last time slot for sessions that day, but I had an airplane to catch and my kids to see!
Well we’re about a month past PASS Summit 2011, and yet I haven’t finished blogging my notes! Between work and home life, I haven’t been able to come up for air in a bit. Now on to my notes…
On Thursday of the PASS Summit 2011, I attended Klaus Aschenbrenner’s (blog|twitter) “Advanced SQL Server 2008 Troubleshooting”, Joe Webb’s (blog|twitter) “SQL Server Locking & Blocking Made Simple”, Kalen Delaney’s (blog|twitter) “What Happened? Exploring the Plan Cache”, and Paul Randal’s (blog|twitter) “More DBA Mythbusters”. I think my head grew two times in size from the Thursday sessions. Just WOW!
I took a ton of notes in Klaus' session. He took a deep dive into how to troubleshoot performance problems. Here is how he goes about solving a performance problem:
- Start by checking the wait stats DMV
- System health
- Memory issues
- I/O issues
I normally start with blocking and then hit the wait stats. Here's the wait stats query (Paul Randal's) that I use when working on a performance problem. Klaus highlighted a few waits to be aware of, such as WRITELOG (indicates an I/O subsystem problem), SOS_SCHEDULER_YIELD (indicates a CPU problem), and PAGEIOLATCH_XX (indicates an I/O subsystem problem or a buffer pool problem).
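A stripped-down sketch of that kind of top-waits query over `sys.dm_os_wait_stats` (Paul Randal's published version filters a much longer list of benign waits; the exclusions below are just a sample):

```sql
-- Top waits since the last stats clear or instance restart.
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0 AS total_wait_s,
       (wait_time_ms - signal_wait_time_ms) / 1000.0 AS resource_wait_s,
       waiting_tasks_count
FROM   sys.dm_os_wait_stats
WHERE  wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                         N'SQLTRACE_BUFFER_FLUSH', N'XE_TIMER_EVENT')
ORDER BY wait_time_ms DESC;
```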
Regarding memory issues, Klaus recommended that as a bare minimum, one should set the “max server memory (MB)” in sp_configure to 2GB or 10% reserved for the OS (whichever comes first). This is just a starting point though!
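Setting that cap is a plain `sp_configure` call; the 6144 MB below is purely an illustrative number for an 8 GB box, not a recommendation from the session:

```sql
-- 'max server memory (MB)' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 6144;  -- illustrative value
RECONFIGURE;
```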
Regarding I/O issues, Klaus talked about disk partition alignment, which can improve SQL Server I/O performance by up to 100%. You should use a 64 KB NTFS allocation unit size, and partition alignment is handled automatically starting with Windows Server 2008.
Joe’s locking and blocking presentation was a good session to really clear up the fog in my mind about locking. One takeaway that I had no idea could be done was that you can set a lock timeout in T-SQL code via SET LOCK_TIMEOUT. If you do this via the application, you should trap error 1222.
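A small sketch of both halves of that tip; the `dbo.Accounts` table is hypothetical:

```sql
SET LOCK_TIMEOUT 5000;  -- give up after waiting 5 seconds for a lock

BEGIN TRY
    UPDATE dbo.Accounts          -- hypothetical table
    SET    Balance = Balance - 100
    WHERE  AccountID = 42;
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 1222     -- "Lock request time out period exceeded"
        PRINT 'Lock wait timed out; retry or report to the caller.';
    ELSE
        THROW;                   -- rethrow anything we did not expect
END CATCH;
```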
Kalen’s session went into execution plans. The minimum size of a plan is 24 KB. This adds up fast, especially if you have a lot of plans that don’t get reused much. You can use sys.dm_exec_cached_plans to check how often a plan is being reused by checking the usecounts column. She said that we can use DBCC FLUSHPROCINDB to clear out the plan cache for a specific database. I didn’t know we had this available, so this was great to hear. It’s far less intrusive than DBCC FREEPROCCACHE, which I’ve had to run in emergencies and which clears the cache for the entire instance.
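A sketch of checking reuse, plus the per-database flush ('MyDatabase' is a placeholder; note DBCC FLUSHPROCINDB is undocumented, so use it with care):

```sql
-- Least-reused cached plans: candidates for cache bloat.
SELECT TOP (20)
       cp.usecounts,
       cp.size_in_bytes,
       st.[text]
FROM   sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts ASC;

-- Clear cached plans for one database only (undocumented command).
DECLARE @dbid INT = DB_ID('MyDatabase');  -- placeholder database name
DBCC FLUSHPROCINDB(@dbid);
```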
Kalen said one should enable “optimize for ad hoc workloads” if you have an ad hoc workload. This stores only a roughly 300-byte stub the first time a plan is compiled, and if the same query gets run again, it’ll store the whole plan. This helps with plan cache bloat.
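Enabling it is another one-liner through `sp_configure`:

```sql
-- Instance-wide setting; only a plan stub is cached on first compile.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```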
I have a lot of systems that use prepared statements, and Kalen says we can simulate those calls by using sp_executesql. Cool!
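For example, to mimic from SSMS what the application sends as a parameterized call (the `dbo.Orders` table and `@CustomerID` parameter are illustrative names):

```sql
-- Parameterized execution, reusing one cached plan across parameter values.
EXEC sp_executesql
     N'SELECT OrderID, OrderDate
       FROM   dbo.Orders
       WHERE  CustomerID = @CustomerID',
     N'@CustomerID INT',
     @CustomerID = 42;
```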
Paul did a series of posts last year to debunk various myths and misconceptions around SQL Server. He continues to debunk things via “DBA Mythbusters”. You can get a PDF of a bunch of these here. One of the myths he went over is the number of tempdb data files that you should have. Back in 2000, the recommendation was to have as many tempdb data files as there are CPU cores on your server. This no longer holds true due to the numerous cores we have on our servers. Paul says you should start out with 1/4 to 1/2 the number of cores and work your way up from there. BUT! Paul likes what Bob Ward (twitter) says on this topic:
- 8 or fewer cores –> set the number of files equal to the number of cores
- Greater than 8 cores –> start with 8 files and increase in blocks of 4
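Adding a tempdb data file looks like this; the path, size, and growth values are illustrative, and the files should all be equally sized:

```sql
-- Add a second tempdb data file, sized to match the existing one.
ALTER DATABASE tempdb
ADD FILE (NAME      = tempdev2,
          FILENAME  = 'T:\TempDB\tempdev2.ndf',   -- illustrative path
          SIZE      = 4096MB,
          FILEGROWTH = 512MB);
```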
One common myth out there is that you should set MAXDOP to 1 for an OLTP workload with high CXPACKET waits. Instead of that, dig deeper first: look for missing indexes and out-of-date statistics, increase the “cost threshold for parallelism” setting, and perhaps set MAXDOP at the query level.
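The two less drastic knobs look like this (the threshold value of 25 is illustrative, not a recommendation from the session, and `dbo.BigSalesTable` is a hypothetical table):

```sql
-- Raise the cost threshold so only genuinely expensive queries go parallel.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 25;  -- illustrative value
RECONFIGURE;

-- Or constrain a single problem query instead of the whole server:
SELECT ProductID, SUM(Quantity) AS TotalQty
FROM   dbo.BigSalesTable        -- hypothetical table
GROUP BY ProductID
OPTION (MAXDOP 1);
```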
Paul stressed that you should not plan a backup strategy but instead plan a restore strategy. What are your recoverability requirements? Once you know that, now plan out your backups.
As Paul always does, he talked about DBCC CHECKDB. He said how fabulous it is. I didn’t want to interrupt the presentation, so after his session had ended, I asked Paul about the need to run DBCC CHECKDB on your mirror systems. You could have data corruption occur at the mirror and not at the principal server. If you aren’t checking for data corruption on your mirror systems, you could be failing over to a corrupt database in the case of a disaster or even a planned failover. You can’t run DBCC CHECKDB directly against the mirrored database, but you can run it against a snapshot of the mirrored database.
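The snapshot approach looks roughly like this on the mirror server; all names and the path are placeholders, and the logical file name must match the mirrored database's data file:

```sql
-- Create a database snapshot of the mirror, check it, then drop it.
CREATE DATABASE MyDB_CheckSnap
ON (NAME     = MyDB_Data,                        -- logical data file name
    FILENAME = 'S:\Snapshots\MyDB_CheckSnap.ss') -- illustrative path
AS SNAPSHOT OF MyDB;

DBCC CHECKDB (N'MyDB_CheckSnap') WITH NO_INFOMSGS, ALL_ERRORMSGS;

DROP DATABASE MyDB_CheckSnap;
```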