SQL University: What and why of database refactoring

Wed Jun 2, 2010 by Mladen Prajdić in sql-server, back-to-basics

This is a post for a great idea called SQL University started by Jorge Segarra also famously known as SqlChicken on Twitter. It’s a collection of blog posts on different database related topics contributed by several smart people all over the world. So this week is mine and we’ll be talking about database testing and refactoring. In 3 posts we’ll cover:

SQLU part 1 - What and why of database testing

SQLU part 2 - What and why of database refactoring

SQLU part 3 - Database testing and refactoring tools and examples

This is a second part of the series and in it we’ll take a look at what database refactoring is and why do it.

Why refactor a database

To know why refactor we first have to know what refactoring actually is.

Code refactoring is a process where we change module internals in a way that does not change that module’s input/output behavior.

For successful refactoring there is one crucial thing we absolutely must have: Tests. Automated unit tests are the only guarantee we have that we haven’t broken the input/output behavior before refactoring. If you haven’t go back ad read my post on the matter. Then start writing them. Next thing you need is a code module. Those are views, UDFs and stored procedures. By having direct table access we can kiss fast and sweet refactoring good bye. One more point to have a database abstraction layer. And no, ORM’s don’t fall into that category.

But also know that refactoring is NOT adding new functionality to your code. Many have fallen into this trap. Don’t be one of them and resist the lure of the dark side. And it’s a strong lure. We developers in general love to add new stuff to our code, but hate fixing our own mistakes or changing existing code for no apparent reason. To be a good refactorer one needs discipline and focus.

Now we know that refactoring is all about changing inner workings of existing code. This can be due to performance optimizations, changing internal code workflows or some other reason. This is a typical black box scenario to the outside world. If we upgrade the car engine it still has to drive on the road (preferably faster) and not fly (no matter how cool that would be). Also be aware that white box tests will break when we refactor.

What to refactor in a database

Refactoring databases doesn’t happen that often but when it does it can include a lot of stuff. Let us look at a few common cases.

Adding or removing database schema objects

Adding, removing or changing table columns in any way, adding constraints, keys, etc… All of these can be counted as internal changes not visible to the data consumer. But each of these carries a potential input/output behavior change. Dropping a column can result in views not working anymore or stored procedure logic crashing. Adding a unique constraint shows duplicated data that shouldn’t exist. Foreign keys break a truncate table command executed from an application that runs once a month. All these scenarios are very real and can happen. With the proper database abstraction layer fully covered with black box tests we can make sure something like that does not happen (hopefully at all).

Changing physical structures

Physical structures include heaps, indexes and partitions. We can pretty much add or remove those without changing the data returned by the database. But the performance can be affected. So here we use our performance tests. We do have them, right? Just by adding a single index we can achieve orders of magnitude performance improvement. Won’t that make users happy? But what if that index causes our write operations to crawl to a stop. again we have to test this. There are a lot of things to think about and have tests for. Without tests we can’t do successful refactoring!

Fixing bad code

We all have some bad code in our systems. We usually refer to that code as code smell as they violate good coding practices. Examples of such code smells are SQL injection, use of SELECT *, scalar UDFs or cursors, etc… Each of those is huge code smell and can result in major code changes. Take SELECT * from example. If we remove a column from a table the client using that SELECT * statement won’t have a clue about that until it runs. Then it will gracefully crash and burn. Not to mention the widely unknown SELECT * view refresh problem that Tomas LaRock (@SQLRockstar on Twitter) and Colin Stasiuk (@BenchmarkIT on Twitter) talk about in detail. Go read about it, it’s informative.

Refactoring this includes replacing the * with column names and most likely change to application using the database.

Breaking apart huge stored procedures

Have you ever seen seen a stored procedure that was 2000 lines long? I have. It’s not pretty. It hurts the eyes and sucks the will to live the next 10 minutes. They are a maintenance nightmare and turn into things no one dares to touch. I’m willing to bet that 100% of time they don’t have a single test on them.

Large stored procedures (and functions) are a clear sign that they contain business logic. General opinion on good database coding practices says that business logic has no business in the database. That’s the applications part. Refactoring such behemoths requires writing lots of edge case tests for the stored procedure input/output behavior and then start to refactor it. First we split the logic inside into smaller parts like new stored procedures and UDFs. Those then get called from the master stored procedure. Once we’ve successfully modularized the database code it’s best to transfer that logic into the applications consuming it. This only leaves the stored procedure with common data manipulation logic. Of course this isn’t always possible so having a plethora of performance and behavior unit tests is absolutely necessary to confirm we’ve actually improved the codebase in some way.

Refactoring is not a popular chore amongst developers or managers. The former don’t like fixing old code, the latter can’t see the financial benefit. Remember how we talked about being lousy at estimating future costs in the previous post? But there comes a time when it must be done. Hopefully I’ve given you some ideas how to get started. In the last post of the series we’ll take a look at the tools to use and an example of testing and refactoring.