Friday, November 11, 2011
This is the final blog for my PASS Summit 2011 series. Well okay, a mini-series, I guess.
On the last day of the conference, I attended Keith Elmore’ and Boris Baryshnikov’s (both from Microsoft) “Introducing the Microsoft SQL Server Code Named “Denali” Performance Dashboard Reports, Jeremiah Peschka’s (blog|twitter) “Rewrite your T-SQL for Great Good!”, and Kimberly Tripp’s (blog|twitter) “Isolated Disasters in VLDBs”.
Keith and Boris talked about the lifecycle of a session, figuring out the running time and the waiting time. They pointed out the transient nature of the reports. You could be drilling into it to uncover a problem, but the session may have ended by the time you’ve drilled all of the way down. Also, the reports are for troubleshooting live problems and not historical ones. You can use Management Data Warehouse for historical troubleshooting. The reports provide similar benefits to the Activity Monitor, however Activity Monitor doesn’t provide context sensitive drill through.
One thing I learned in Keith’s and Boris’ session was that the buffer cache hit ratio should really never be below 87% due to the read-ahead mechanism in SQL Server. When a page is read, it will read the entire extent. So for every page read, you get 7 more read. If you need any of those 7 extra pages, well they are already in cache.
I had a lot of fun in Jeremiah’s session about refactoring code plus I learned a lot. His slides were visually presented in a fun way, which just made for a more upbeat presentation. Jeremiah says that before you start refactoring, you should look at your system. Investigate missing or too many indexes, out-of-date statistics, and other areas that could be leading to your code running slow. He talked about code standards. He suggested using common abbreviations for aliases instead of one-letter aliases. I’m a big offender of one-letter aliases, but he makes a good point. He said that join order does not matter to the optimizer, but it does matter to those who have to read your code. Now let’s get into refactoring!
- Eliminate useless things – useless/unneeded joins and columns. If you don’t need it, get rid of it!
- Instead of using DISTINCT/JOIN, replace with EXISTS
- Simplify your conditions; use UNION or better yet UNION ALL instead of OR to avoid a scan and use indexes for each union query
- Branching logic – instead of IF this, IF that, and on and on…use dynamic SQL (sp_executesql, please!) or use a parameterized query in the application
- Correlated subqueries – YUCK! Replace with a join
- Eliminate repeated patterns
Last, but certainly not least, was Kimberly’s session. Kimberly is my favorite speaker. I attended her two-day pre-conference seminar at PASS Summit 2005 as well as a SQL Immersion Event last December. Did I mention she’s my favorite speaker? Okay, enough of that.
Kimberly’s session was packed with demos. I had seen some of it in the SQL Immersion Event, but it was very nice to get a refresher on these, especially since I’ve got a VLDB with some growing pains. One key takeaway from her session is the idea to use a log shipping solution with a load delay, such as 6, 8, or 24 hours behind the primary. In the case of say an accidentally dropped table in a VLDB, we could retrieve it from the secondary database rather than waiting an eternity for a restore to complete. Kimberly let us know that in SQL Server 2012 (it finally has a name!), online rebuilds are supported even if there are LOB columns in your table. This will simplify custom code that intelligently figures out if an online rebuild is possible.
There was actually one last time slot for sessions that day, but I had an airplane to catch and my kids to see!
Well we’re about a month past PASS Summit 2011, and yet I haven’t finished blogging my notes! Between work and home life, I haven’t been able to come up for air in a bit. Now on to my notes…
On Thursday of the PASS Summit 2011, I attended Klaus Aschenbrenner’s (blog|twitter) “Advanced SQL Server 2008 Troubleshooting”, Joe Webb’s (blog|twitter) “SQL Server Locking & Blocking Made Simple”, Kalen Delaney’s (blog|twitter) “What Happened? Exploring the Plan Cache”, and Paul Randal’s (blog|twitter) “More DBA Mythbusters”. I think my head grew two times in size from the Thursday sessions. Just WOW!
I took a ton of notes in Klaus' session. He took a deep dive into how to troubleshoot performance problems. Here is how he goes about solving a performance problem:
- Start by checking the wait stats DMV
- System health
- Memory issues
- I/O issues
I normally start with blocking and then hit the wait stats. Here’s the wait stat query (Paul Randal’s) that I use when working on a performance problem. He highlighted a few waits to be aware of such as WRITELOG (indicates IO subsystem problem), SOS_SCHEDULER_YIELD (indicates CPU problem), and PAGEIOLATCH_XX (indicates an IO subsystem problem or a buffer pool problem).
Regarding memory issues, Klaus recommended that as a bare minimum, one should set the “max server memory (MB)” in sp_configure to 2GB or 10% reserved for the OS (whichever comes first). This is just a starting point though!
Regarding I/O issues, Klaus talked about disk partition alignment, which can improve SQL I/O performance by up to 100%. You should use 64kb for NTFS cluster, and it’s automatic in Windows 2008 R2.
Joe’s locking and blocking presentation was a good session to really clear up the fog in my mind about locking. One takeaway that I had no idea could be done was that you can set a timeout in T-SQL code view LOCK_TIMEOUT. If you do this via the application, you should trap error 1222.
Kalen’s session went into execution plans. The minimum size of a plan is 24k. This adds up fast especially if you have a lot of plans that don’t get reused much. You can use sys.dm_exec_cached_plans to check how often a plan is being reused by checking the usecounts column. She said that we can use DBCC FLUSHPROCINDB to clear out the stored procedure cache for a specific database. I didn’t know we had this available, so this was great to hear. This will be less intrusive when an emergency comes up where I’ve needed to run DBCC FREEPROCCACHE.
Kalen said one should enable “optimize for ad hoc workloads” if you have an adhoc loc. This stores only a 300-byte stub of the first plan, and if it gets run again, it’ll store the whole thing. This helps with plan cache bloat.
I have a lot of systems that use prepared statements, and Kalen says we simulate those calls by using sp_executesql. Cool!
Paul did a series of posts last year to debunk various myths and misconceptions around SQL Server. He continues to debunk things via “DBA Mythbusters”. You can get a PDF of a bunch of these here. One of the myths he went over is the number of tempdb data files that you should have. Back in 2000, the recommendation was to have as many tempdb data files as there are CPU cores on your server. This no longer holds true due to the numerous cores we have on our servers. Paul says you should start out with 1/4 to 1/2 the number of cores and work your way up from there. BUT! Paul likes what Bob Ward (twitter) says on this topic:
- 8 or less cores –> set number of files equal to the number of cores
- Greater than 8 cores –> start with 8 files and increase in blocks of 4
One common myth out there is to set your MAXDOP to 1 for an OLTP workload with high CXPACKET waits. Instead of that, dig deeper first. Look for missing indexes, out-of-date statistics, increase the “cost threshold for parallelism” setting, and perhaps set MAXDOP at the query level.
Paul stressed that you should not plan a backup strategy but instead plan a restore strategy. What are your recoverability requirements? Once you know that, now plan out your backups.
As Paul always does, he talked about DBCC CHECKDB. He said how fabulous it is. I didn’t want to interrupt the presentation, so after his session had ended, I asked Paul about the need to run DBCC CHECKDB on your mirror systems. You could have data corruption occur at the mirror and not at the principal server. If you aren’t checking for data corruption on your mirror systems, you could be failing over to a corrupt database in the case of a disaster or even a planned failover. You can’t run DBCC CHECKDB against the mirrored database, but you can run it against a snapshot off the mirrored database.
Tuesday, October 18, 2011
I arrived in Seattle last Monday afternoon to attend PASS Summit 2011. I had really wanted to attend Gail Shaw’s (blog|twitter) and Grant Fritchey’s (blog|twitter) pre-conference seminar “All About Execution Plans” on Monday, but that would have meant flying out on Sunday which I couldn’t do.
On Tuesday, I attended Allan Hirt’s (blog|twitter) pre-conference seminar entitled “A Deep Dive into AlwaysOn: Failover Clustering and Availability Groups”. Allan is a great speaker, and his seminar was packed with demos and information about AlwaysOn in SQL Server 2012. Unfortunately, I have lost my notes from this seminar and the presentation materials are only available on the pre-con DVD. Hmpf!
On Wednesday, I attended Gail Shaw’s “Bad Plan! Sit!”, Andrew Kelly’s (blog|twitter) “SQL 2008 Query Statistics”, Dan Jones’ (blog|twitter) “Improving your PowerShell Productivity”, and Brent Ozar’s (blog|twitter) “BLITZ! The SQL – More One Hour SQL Server Takeovers”.
In Gail’s session, she went over how to fix bad plans and bad query patterns. Update your stale statistics!
- How to fix bad plans
- Use local variables – optimizer can’t sniff it, so it’ll optimize for “average” value
- Use RECOMPILE (at the query or stored procedure level) – CPU hit
- OPTIMIZE FOR hint – most common value you’ll pass
- How to fix bad query patterns
- Don’t use them – ha!
- Catch-all queries
- Use dynamic SQL
- OPTION (RECOMPILE)
- Multiple execution paths
- Split into multiple stored procedures
- OPTION (RECOMPILE)
- Modifying parameter values
- Use local variables
- Split into outer and inner procedure
- OPTION (RECOMPILE)
She also went into “last resort” and “very last resort” options, but those are risky unless you know what you are doing. For the average Joe, she wouldn’t recommend these. Examples are query hints and plan guides.
While I enjoyed Andrew’s session, I didn’t take any notes as it was familiar material. Andrew is a great speaker though, and I’d highly recommend attending his sessions in the future.
Next up was Dan’s PowerShell session. I need to look into profiles, manifests, function modules, and function import scripts more as I just didn’t quite grasp these concepts. I am attending a PowerShell training class at the end of November, so maybe that’ll help clear it up. I really enjoyed the Excel integration demo. It was very cool watching PowerShell build the spreadsheet in real-time. I must look into this more! On a side note, I am jealous of Dan’s hair. Fabulous hair!
Brent’s session showed us how to quickly gather information about a server that you will be taking over database administration duties for. He wrote a script to do a fast health check and then later wrapped it into a stored procedure, sp_Blitz. I can’t wait to use this at my work even on systems where I’ve been the primary DBA for years, maybe there’s something I’ve overlooked. We are using EPM to help standardize our environment and uncover problems, but sp_Blitz will definitely still help us out. He even provides a cloud-based update feature, sp_BlitzUpdate, for sp_Blitz so you don’t have to constantly update it when he makes a change. I think I’ll utilize his update code for some other challenges that we face at my work.
What an amazing week I had at PASS Summit 2011 in Seattle, WA! I hadn’t attended a PASS conference since September of 2005 when it was in Grapevine, Texas. It has grown so much since then. I am not sure how many people attended back then, but I’d guesstimate about 1500. They announced that at this year’s conference there were 4000 attendees. WOW!
Here are my favorite aspects of this conference:
- Networking! – Not only did I meet a lot of new people, but I also got to meet people in person that I’ve known on the Internet for years like Mladen Prajdić (blog|twitter) and Rob Volk (blog|twitter). I even met someone that I’d recently helped out in the SQLTeam forums.
- Learning – I took a lot of notes during the sessions I attended and plan on blogging very soon about them. It is amazing the amount of things you learn and the things that you unlearn. Yes I said unlearn. Some of the stuff that I thought I knew was either out-dated or just plain wrong.
- Fun, fun, fun – To say that this conference was fun would be an understatement. I had a blast!
- I attended the “Welcome Reception and Quizbowl” on Tuesday night, the “Exhibitor Reception” on Wednesday night, and the “Community Appreciation Party” at GameWorks on Thursday night. There were many other after-hours events to attend, but I had to make my kids a priority at night so I had to get back to my hotel room before 9pm so that I could Skype with them.
- It was very entertaining reading and posting with #sqlpass on Twitter. Twitter has changed the conference experience for the better.
I will definitely be able to do my job better due to attending this conference. The return on investment is HUGE!
Wednesday, August 24, 2011
Today I renamed the sa account on 23 SQL Server 2005/2008 instances. I used the CMS to assist with this task.
Later we realized all of the SQL Agent jobs were failing on these instances with the following error: “The job failed. The owner (sa) of job XYZ does not have server access.”
We use sa as the job owner, and it had correctly changed the owner to our new name as the sids had not changed. We were at first confused why the jobs were failing but then realized restarting the SQL Agent service might help. The restart corrected the problem.
If you plan on renaming your sa account (best practice), make sure you restart the SQL Agent service afterwards to avoid failing jobs. This is perhaps common knowledge, but it was something new learned by me today.
Wednesday, August 17, 2011
If you use RDP to remotely connect to your servers, you've probably encountered a clipboard issue where copy/paste stops working. A quick Google search on the problem indicates you can easily fix the problem by logging out/logging back in or killing/restarting rdpclip.exe on the remote server. Here's an article which covers this topic.
But what do you do when copy/paste is intermittent? It works one second, stops working for 5-30 seconds, and then on its own starts working again. This is what’s occurring in our new non-production environment. The DBA team is setting up 16 new physical servers and 5 new virtual machines. I haven’t found a server where this ISN’T happening. This intermittent copy/paste issue is driving me crazy!
Friday, June 17, 2011
It never occurred to me that the way SSMS handles tabs could be changed, and it’s just that the default settings suck. In this blog post, Brent Ozar shows us how to fix SSMS so that the tabs are actually usable and not annoying anymore.
I can’t love his post enough. It has really helped me become more efficient. I’m always flipping between tabs and can’t quickly find the one I need at some critical time, but now I can easily find it!
Monday, March 07, 2011
Twice a year, we move our production systems to our disaster recovery site. Last Saturday night was one of those days. There are about 50 SQL Server databases to be moved to the DR site, which is done via database mirroring. It takes only a few seconds to failover, but some databases have a bit more involved work such as setting up replication.
Everything went relatively smooth, but we encountered a weird bug on our most mission critical system. After everything was successfully failed over to the DR site, it was noticed that mirroring was in a suspended state on one of the databases. We thought we had run into a SQL Server 2005 bug that we had been encountering and were working with Microsoft on a fix. Microsoft did fix it in both SQL Server 2005 service pack 3 cumulative update package 13 and service pack 4 cumulative update package 2, however SP3 CU13 and SP4 both recently failed on this system so we were not patched yet with the bug fix.
As the suspended state was causing us issues with replication, we dropped mirroring. We then noticed we had 10MB of free disk space on the mount point where the principal’s data files are stored. I knew something went amiss as this system should have at least 150GB free on that mount point. I immediately checked the main database’s data file and was shocked to see an autgrowth size of 65536%. The data file autogrew right before mirroring went into the suspended state.
I didn’t have a lot of time to research if this autgrowth problem was a known SQL Server bug, so I deferred that research to today. A quick Google search yielded no results but emphasis on “quick”.
I checked our performance system, which was recently restored with a copy of the affected production database, and found the autogrowth setting to be 512MB. So this autogrowth bug was encountered sometime in the last two weeks. On February 26th, we had attempted to install SQL 2005 SP4 on production, however it had failed (PSS case open with Microsoft). I suspected that the SP4 failure was somehow related to this autgrowth bug although that turned out not to be the case.
I then tweeted (@TaraKizer) about this problem to see if the SQL Server community (#sqlhelp) had any insights. It seems several people have either heard of this bug or encountered it. Aaron Bertrand (blog|twitter) referred me to this Connect item.
Our affected database originated on SQL Server 2000 and was upgraded to SQL Server 2005 in 2007. Back on SQL Server 2000, we were using the default file growth setting which was a percentage. Sometime after the 2005 upgrade is when we changed it to 512MB. Our situation seemed to fit the bug Aaron referred to me, so now the question was whether Microsoft had fixed it yet.
I received a reply to my tweet from Amit Banerjee (twitter) that it had been fixed in SP3 CU1 (KB958004). My affected system is SP3 CU8, so I was initially confused why we had encountered the bug. Because I don’t read things fully, I had missed that there are additional steps you have to follow after applying the bug fix. Amit set me straight.
Although you can read this information in the KB article, I will also copy it here in case you are as lazy as me and miss the most important section of it (although if you are as lazy as me, you won’t have read this far down my blog post):
This hotfix will prevent only future occurrences of this problem. For example, if you restore a database from SQL Server 2000 to a SQL Server 2005 instance that contains this hotfix, this problem will not occur. However, if you already have a database that is affected by this problem, you must follow these steps to resolve this problem manually:
- Apply this hotfix.
- Set the file growth settings for the affected files to percentage settings, and then set the settings back to megabyte settings.
- Take the database offline, and then bring it back online.
- Verify that the values of the is_percent_growth column are correct in the sys.database_files system table and in the sys.master_files system table.
Tuesday, March 01, 2011
I've been thinking about submitting my "Performance Tuning With Traces" topic for SQL Saturday #73, but I think I’m burnt out on that topic. I’ve only presented it twice (#55|#47), and I am passionate about that subject, however I need something new.
If you have any suggestions, I’d love to hear from you. You can leave me a comment or send me a direct email via the Contact page.
Thursday, February 24, 2011
This past Saturday, I presented "Performance Tuning with Traces" at SQL Saturday #47 in Phoenix, Arizona. You can download my slide deck and supporting files here.
This is the same presentation that I did in September at SQL Saturday #55 in San Diego, however I focused less on my custom server-side trace tool and more on the steps that I take to troubleshoot a production performance problem which often includes server-side tracing.
If any of my blog readers attended the presentation, I'd love to hear your feedback. I'm specifically interested in hearing constructive criticism. Speaking in front of people is not something that comes naturally to me. I plan on presenting in the future, so feedback on how I can do a better job would be very helpful. My number one problem is I talk too fast!
Wednesday, December 15, 2010
We’ve been having some major performance issues with one of the applications that I support. The database is on SQL Server 2005 and is about 150GB in size. We’ve identified a couple of issues already on the database side.
The first issue is that some query (or maybe several queries) is getting a bad execution plan at some point in time during the day. When it occurs, database performance comes to a grinding halt. We know it’s a bad execution plan as running DBCC FREEPROCCACHE immediately resolves the problem system-wide. As we have not yet identified the problematic query, we’ve put a temporary solution in place that frees the procedure cache on an hourly basis via a SQL Agent job. This is not ideal, but it is getting us through the day without a major problem. We are actively working on identifying the problematic query and hope to disable the SQL Agent job soon.
Earlier this week, we had a major slowdown for one of the processes of this application. I was unable to find any database performance issues, but I continued to investigate it. One of things that I typically do when investigating database performance issues is run the “Missing Indexes DMV Report” (that’s what I call it at least). When analyzing the output of that report, I immediately dismiss anything under 1 million “Impact” as I want to target the “low-hanging fruit” initially. When I ran the report earlier this week, I was shocked to find a suggested index with an impact of over 3 billion!
Do I win a prize for the highest impact? Has anyone seen a value higher than mine? My exact value was 3154284120.67765.
The performance issue from earlier this week ended up being an application problem, but it also brought to light a much needed index. I had previously seen this index come up in that report but always with a much lower impact. I had never considered it as the index’s selectivity is very low. It’s a composite index with three columns. The first column is not selective, the first two columns are not selective, and the three columns together are not selective. In fact, no matter how I order it, the index will not be selective at all. I briefly discussed this with Kimberly Tripp, and she said that this was okay for covering indexes. Selectivity is irrelevant for a covering index. She indicated that she’s even created indexes with gender as the first column in the index. I’ve got lots to learn still!
Monday, November 08, 2010
Recently I opened a case with Microsoft PSS to help us through a severe performance problem on a new system. As part of that case, the PSS engineer checked our “max degree of parallelism” server-side setting. It is our standard to use 4 on our production systems that have 16 CPUs (2 sockets, quad-core, hyper-threaded). The PSS engineer had me run the below query to get Microsoft’s recommended value of “max degree of parallelism” server-side setting for our 16-CPU system:
when cpu_count / hyperthread_ratio > 8 then 8
else cpu_count / hyperthread_ratio
end as optimal_maxdop_setting
The query returned 2. I made the change using sp_configure, and it did not resolve our issue. We have decided to leave it in place for now.
Do you agree with this query? What are your thoughts on this?
If you decide to change your setting to reflect the output of this query, please test it first to ensure there are no negative side effects.
Monday, October 18, 2010
SQL Saturday is coming to Phoenix, Arizona on February 19th, 2011. I have submitted my session for this event. If my session gets approved, it’ll be the same presentation that I gave recently at SQL Saturday #55. Although my recent presentation was well received, I will be reworking it to make it even better.
Hope to see you there!
Tuesday, October 12, 2010
I started my IT career as a student worker in the database team at the County of San Diego. Although I worked on many different things in that group, it launched my career as a Database Administrator. You can get an overview of my career here.
It seems I picked the right career for a good job in America. According to CNNMoney and PayScale, a database administrator job is ranked number 7 for the best job in America in 2010. Check it out here.
There are quite a few IT jobs on the list. You can check out the full list here.
Do you agree with this ranking?
Tuesday, September 21, 2010
On Saturday (9/18), I presented "Performance Tuning with Traces" at SQL Saturday #55. There were about 65 people in attendance, and it was standing room only.
As promised, you can download my presentation materials here.
I received such good feedback after my presentation that I will likely present again in the future if given the opportunity.