Normalization for databases is like Dependency Injection for code

Wed Sep 17, 2008 by Mladen Prajdić in net, sql-server, back-to-basics

Let us start with a simple question:

What is the goal of software development, be it database or .Net (or any other language)?

The first answer would be: Customer satisfaction!

And you'd be right. However there's more to customer satisfaction then the immediate product delivery effect. We have to think about future change requests, maintenance periods, etc...

Almost every business application out there consists of 2 basic parts: database back end and some kind of front end that consumes the data. In our case the front end is anything with access to the database.

In regard to future code changes and addition of new features what is the best way to construct this code? I would say the best way is using modular or black box design which states that solutions to big problems are built from solutions to smaller problems. This means that we have small independent modules that are black boxes to the outside world and they perform some operation based on input parameters and optionally provide output for other modules to consume. So how does this apply to our back and front end systems?

Front end - object oriented code

Dependency injection (a form of Inversion of Control) is the leading principle that enables our goal of keeping code in modules that have no outer dependencies. The input (constructor) arguments are usually interfaces that provide knowledge about the other modules that our module (class) uses. With this setup we can easily test each module by itself using automated unit tests. Also changing a module does not break other modules unless we change it's interface. And this is what we want.

This also enables us to keep our code separated and non repeatable. The worst thing we can do is have code that does the same thing in more than once place. Think maintenance nightmare!

A new feature request came in? No problem! Build it, test it, use interfaces of existing modules to access their functionality if needed. Change of an existing feature? Again, no problem. Change the module holding the feature and write new tests to validate its working correctly. And we're done.

A module in object oriented code is a class that exposes its functionality through an interface.

A simple example just to demonstrate the Dependency injection (DI) using Star Trek theme for all the geeks out there :)

Move method implementations:

Method implementations are irrelevant, since the point here is that our ships are built from modules. They know only about the modules interface and that's why the module itself can be changed with no problem. We could completely change the internals of ImpulseEngine without having to change a single thing in our ship as long as the drives interface stayed the same. And yes, we could refactor this simple example further on.

If you ask me Dependency Injection is every developers dream. Unfortunately that dream usually comes to a screeching halt when we start dealing with databases.

Back end - databases

The concept of DI is meant for object oriented logic which is excellent for code but totally fails when dealing with databases. A module in code is a class that implements an interface which exposes it's functionality. There is no such concept in databases.

So what is a module in a database? Table, row, stored procedure? Wrong.

A module in the database is DATA.

Remember: Data is the core of the business application. Bad data, lousy business. Doesn't matter if you have the most amazing application known to mankind.

Denormalized design

This is a classic example of a denormalized schema. The problem lies in the ShippingAddress column. Since it is in the Orders table it could contain duplicates if we're shipping different orders to the same address. Because of this we can and always will get data inconsistency. And that is bad for a number of reasons.

This below result set is the common problem with denormalized design. If we edit and save the address for order 4, we can change it independently of order 1 despite both orders having the same address. Now we have duplicated data and we don't know which one is correct. In a normalized design this isn't possible.

Normalized design

This is the upper table normalized into 2 tables: Order and Address. This way our data is only in one place and thus it can't get corrupted by bad input. Note that AddressValue column should be split further, but for shortness sake lets make it simple.

Data normalization is not just a whim of every developers arch nemesis - the DBA ;). It's a simple concept that is used to guarantee data consistency and validity. So when the time comes to change or extend our schema we don't have to change it in different places in our database. Just like Dependency Injection enables simple and testable change in code by modularizing objects so does Database Normalization by applying the same goal for data.

So don't fear normalization. Embrace it and make your life easier in the long run.

Denormalizing on purpose

The most heard and quite valid argument is that fully normalized database schema is slow. This is true since querying 4 normalized joined tables is slower than querying 1 denormalized table.

However this argument is usually used in the wrong context. People tend to just leave their data denormalized.

Always first normalize data and then denormalize it if the need arises.

With SQL Server this problem can easily be solved with indexed views. With them we can query the denormalized data but update the normalized data. And thus we can achieve the performance we need without having duplicated data.

Summary

Dependency Injection and Database Normalization are two entirely different concepts that strive to achieve the same thing. Ability to make quick and stable changes.

Keeping the area of change small enough helps us make any kind of change fast be it small or large. In the long run this is what we want. Let's face it, how many times did an important business application that supports some business process have a lifetime of six months or less without the slightest change? Almost never. And because of that it pays to design our business software properly. It really isn't that hard.

Legacy Comments


Jeff M 2008-09-22	re: Normalization for databases is like Dependency Injection for code Mladen, you are AWESOME!!!!!!!! Wonderful article! You've just saved me a ton of time with management and fellow developers both now and in the future! You've explained the concepts so much better than I ever have. Thank you!

Mladen 2008-09-22	re: Normalization for databases is like Dependency Injection for code thanx! glad to be of service :))

JohnFx 2008-09-24	re: Normalization for databases is like Dependency Injection for code Great article. I love the angle of putting normalization in terms of code modularity.

Joe Celko 2008-09-24	re: Normalization for databases is like Dependency Injection for code Your DDL was all wrong; singular names for tables (only one row each?) a magical, universal "id" in all the tables, and no DRI. Do you really give your customer orders a name? With everything NULL-able there was no way to have keys. Without keys, normalization is not possible. You probably wanted something like this: CREATE TABLE Orders (order_nbr INTEGER NOT NULL PRIMARY KEY, customer_name VARCHAR(100) NOT NULL, -- no DRI? address_id INTEGER NOT NULL REFERENCES Addresses (address_id) ON UPDATE CASCADE); CREATE TABLE Addresses (address_id INTEGER NOT NULL PRIMARY KEY, address_ txt VARCHAR(100)NOT NULL);

Mladen 2008-09-24	re: Normalization for databases is like Dependency Injection for code Joe, glad to have you here and thanx for the comments! They're appreciated.

Seun Osewa 2008-11-15	re: Normalization for databases is like Dependency Injection for code Putting addresses on a separate table is not normalization, unless you have extra information about each address.