Patches! We don’t need no stinkeen’ Patches!
Well, yes. Actually you do. Patching a SQL Server cluster sounds like a complex endeavor, but it is really a lot simpler than people think. Much of the confusion is due to SQL 2000 and SQL 2005 having slightly different behaviors when it comes to patching. Also, with Clustering now available in Standard Edition, more DBAs find themselves caring for the first cluster in their organization. “Learn as you go” is not nearly as fun as it sounds.
The first rule of patching is to test. Make sure the patches work in a test environment, both with patches for the OS and for SQL. Given the history of unstable SQL Service Packs, I have to suggest waiting for a few weeks after a SP is released to make sure nothing important got broken. The SQL team is very much aware of their past shortcomings in that area and has started to fix the entire SQL servicing model. It will be a while before we see the fruits of those efforts, so for the time being let someone else be the (insert politically correct ethnic minority here) mine detector.
Regardless of the SQL version, the OS will need patching from time to time. Balancing the need to keep a system secure from newly discovered vulnerabilities with the desire to have a stable platform is tricky. You want to stay up-to-date, but not TOO up-to-date. Again, testing is the key. Never deploy anything you have not tested onto a highly-available platform. As such, I never allow Windows or Microsoft update to automatically patch any database host server. I manually inspect and apply patches, although I still use the update tool to recommend and track updates. Only apply patches during a defined maintenance window, even if you just patch a currently inactive node. That way if something goes wrong, you don’t have to explain why you took the accounting server down during month-end closing. If you have an N-1 cluster (N nodes, N-1 instances) then you will always have a free node to start on. Apply the OS patches and reboot if necessary. Failover and repeat as necessary until all nodes are done. OS patches are never cluster aware so everything is in the context of the local node. If you don’t have a free node, say with a two-node, two instance system, then I recommend clearing one node by moving the SQL instances. Always OS patch a node when it is “empty”. Don’t forget to move the Cluster and MSTDC groups too.
SQL 2000 Service Packs and Hotfixes are always cluster-wide for all components. Always start on the node hosting the instance you want to patch. SQL 2000 patches (both Service Packs and Hotfixes) only work on one instance at a time. Pay careful attention to the message boxes indicating completion. They may tell you to reboot the remote node(s). Always reboot when the installer tells you to. The system may be unstable or not failover properly if you do not.
SQL 2005 is a lot trickier. Only the Database Engine and Analysis services are cluster-wide. Everything else is local to the node. So, when you run the installer you may get different options on what to upgrade. If there are one or more instances on the node you execute the update on, you can update those instances. Otherwise you get to update just the local components. Note that this is new for SQL 2005. Before, you had to patch each instance separately. You can update the non-clustered components at the same time you update the cluster-aware components, but only for components on the local node. This is NOT clearly indicated by installer. You must go to each node and update SSIS, workstation components, Notification Services and Reporting Services. Of course, you shouldn’t be running Reporting Services on a failover cluster, but that is another discussion. Fortunately, the installer does have some smarts and will show you what has already been updated so you don’t have to re-run the update for that component. This makes it safe to run the updater if you lose track or have inherited a system where you are not sure what components are at what revision level. Service Packs and Hotfixes both work this same way. And yes, you need to update client tools on all cluster nodes and your workstations. Again, pay attention to the closing messages and reboot when told. Bad Things™ will happen if you don’t.