Dealing with poorly designed databases is a common fact of life for programmers. It happens sometimes due to lack of experience or education, and sometimes because business requirements were never analyzed properly, or simply changed. We've all inherited poor database schemas that we've been forced to work with, and of course we've all created a few of those poor schemas ourselves over the years.
For this discussion, I will classify the database model problems we encounter into the following two categories:
Physical Design Problems
-- The physical design of the schema -- tables, columns, data types, constraints, etc. -- is causing inefficiencies in performance and/or code. The design is technically storing the data it needs to, but perhaps it is not normalized properly, or there's lots of duplicate data, incorrect data types, columns that contain CSV data, and so on. It "works", but you have to write large amounts of complicated SQL to do basic tasks.
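To make that concrete, here's a small sketch using hypothetical table and column names: a Projects table that crams its consultant assignments into a CSV column. Even the simple question "which projects involve consultant 42?" becomes fragile string surgery instead of a clean join:

```sql
-- Hypothetical denormalized table: ConsultantIDs holds values like '7,42,13'
CREATE TABLE Projects
(
    ProjectID     int          NOT NULL PRIMARY KEY,
    ProjectName   varchar(100) NOT NULL,
    ConsultantIDs varchar(500) NOT NULL   -- the CSV column: the design flaw
);

-- Finding projects for consultant 42 requires string matching; wrapping
-- both sides in commas avoids false hits on '142' or '421':
SELECT ProjectID, ProjectName
FROM Projects
WHERE ',' + ConsultantIDs + ',' LIKE '%,42,%';
```

Note that no index can help that WHERE clause -- every query like this is a full scan, and every consultant-related task needs similar string gymnastics.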
Logical Design Problems
-- The logical design is incapable of storing the necessary data; we simply cannot model what we need to store. For example, the model might allow for only one address per Vendor, but we need to store multiple addresses for billing versus shipping; or perhaps we can only assign one consultant per Project, but in reality we need to store many of them, each assigned a particular role.
(Of course, many data models have both types of problems.)
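As a sketch of what fixing a logical design problem looks like, here is the Vendor address example using hypothetical table names. The single address that was baked into the Vendors table moves to a child table, so a vendor can have any number of typed addresses:

```sql
-- Hypothetical parent table (address columns removed after the redesign)
CREATE TABLE Vendors
(
    VendorID   int          NOT NULL PRIMARY KEY,
    VendorName varchar(100) NOT NULL
);

-- A child table allows any number of typed addresses per vendor:
CREATE TABLE VendorAddresses
(
    VendorID    int          NOT NULL REFERENCES Vendors (VendorID),
    AddressType varchar(10)  NOT NULL,  -- e.g., 'Billing' or 'Shipping'
    Street      varchar(100) NOT NULL,
    City        varchar(50)  NOT NULL,
    PRIMARY KEY (VendorID, AddressType)
);
```

The primary key on (VendorID, AddressType) keeps one address per type per vendor; relax it to a surrogate key if a vendor can legitimately have several addresses of the same type.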
I actually feel that logical design problems are the easier of the two to deal with. Why? Because they force us to stop development and redesign the database and/or applications, since the model simply is not working. Sure, it can be a lot of effort, but in the end, if we did things well, the benefit is that we now have a good data model that stores what we need.
Physical design problems occur very frequently, and they put us in a far worse situation. They rarely get addressed or fixed, because the schema is getting the job done. The result is that, over time, we keep piling on more and more code, hacks, and tricks to get data in and out; the complexity of the code grows and performance degrades, but it all still "works". There is never any real motivation to fix things until performance or code maintenance is so bad that we have no other choice. At that point, of course, it is a huge project, since we have to deal with large volumes of convoluted code. If we had refactored the design properly earlier on, we'd be better off, but there was never truly the need or motivation to do it.
The reason I bring this up is that we often do need to refactor our designs; in fact, in some cases -- physical design problems in particular -- we should do it even more often than we do. When the logical design doesn't work, we have no choice: it needs to change. If instead we try to use "tricks" to make a schema built for one logical model serve an entirely different one, we end up with a really poor schema, and now we have both physical and logical problems with our model. (Only one place to store the ConsultantID for each Project? Simple -- change that column to a CSV list!) The worst of both worlds!
We often see people in the forums asking for help with SQL that ends up being very difficult to write because the database design is so poor; the best solution is always simply to fix the design. But, of course, that rarely happens. The reasons given are usually:
- We have lots of reports that access the schema and we'd need to rewrite them all
- We'd have to go through our applications and re-write all of the data access code
- There are DTS packages with SQL statements that would need to be rewritten
- Other external processes access the database schema; it would be impossible to track them all down
Now, let's be clear: If your logical model doesn't match your schema, all of these things need to change. If your code or reports or processes are written to allow only one consultant per project but now you need many, it is not just a physical database change; you'll need to go through and change user interface forms, report output, and so on. There's really no quick and easy fix for that.
If only your physical model needs to be altered, though, that's a pretty compelling list of reasons to leave things "as is". And that's the real problem here, not the schema! The issue isn't that changes are necessary, it's that making those changes seems nearly impossible, so it is rarely done.
The poor physical model is one thing, but it is not the worst design decision that needs to be dealt with. What really makes it difficult or nearly impossible to quickly fix or adjust a schema for better performance or a more normalized design is that the following rule was broken:
Don't bind code outside of your database to your schema!
Suppose you need to fix that poor physical design and change that CSV column into a related table that stores the values properly in multiple rows. Yet, every report and application and process out there directly accesses that CSV column and expects things to be stored in that particular way. Normalizing the schema can be very difficult, since everything that ever accesses that column would need to be identified and rewritten.
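To sketch what that fix looks like, here is the normalized target schema and a one-time migration, again using hypothetical names and assuming a Projects table whose ConsultantIDs column holds the CSV data. STRING_SPLIT requires SQL Server 2016 or later; on earlier versions you'd need a user-defined split function:

```sql
-- Target schema: one row per project/consultant pair
CREATE TABLE ProjectConsultants
(
    ProjectID    int NOT NULL REFERENCES Projects (ProjectID),
    ConsultantID int NOT NULL,
    PRIMARY KEY (ProjectID, ConsultantID)
);

-- One-time migration: explode each CSV list into rows
INSERT INTO ProjectConsultants (ProjectID, ConsultantID)
SELECT p.ProjectID, CAST(s.value AS int)
FROM Projects AS p
CROSS APPLY STRING_SPLIT(p.ConsultantIDs, ',') AS s;
```

The migration itself is the easy part; the hard part is exactly what this paragraph describes -- finding everything outside the database that still expects the CSV column.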
Instead, what if we had restricted access through stored procedures? Since only the stored procedures stored in the database directly reference the CSV column, if we change the column we only need to use sp_depends to find the objects that reference it and alter those stored procedures. None of the applications or processes or reports outside of the database requires any changes! We simply alter our schema, then go through the list of stored procedures one by one and fix them. Our inputs and outputs remain the same, but the SQL in the middle can now be greatly improved, since we now have a much simpler and more efficient schema. Once those procedures are altered, we are done! No reports to change, no processes to alter, no applications to rewrite.
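Here is a sketch of that idea, with a hypothetical procedure that has always returned the consultant list as a CSV string. After the refactoring, only its body changes -- the callers' contract does not. (STRING_AGG requires SQL Server 2017 or later; older versions would rebuild the string with FOR XML PATH or a loop.)

```sql
-- Callers always received a CSV string, and still do; only the body changed.
CREATE PROCEDURE GetProjectConsultants
    @ProjectID int
AS
BEGIN
    -- Old body:
    --   SELECT ConsultantIDs FROM Projects WHERE ProjectID = @ProjectID;
    -- New body reassembles the CSV from the normalized table:
    SELECT STRING_AGG(CAST(ConsultantID AS varchar(10)), ',') AS ConsultantIDs
    FROM ProjectConsultants
    WHERE ProjectID = @ProjectID;
END;
```

And finding the procedures to fix is a one-liner -- e.g., `EXEC sp_depends 'Projects';` lists the objects that reference the table you are about to change.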
My primary point is this: Database schema refactoring is often required. And it is almost always a good idea to do it -- the longer you deal with a bad database model, the worse your code and performance become. If your logical model needs to change, by definition your reports and applications and processes need to change, too -- that's a fact of life. But if only your physical model needs to change -- i.e., data types, constraints, normalization, etc. -- and you've restricted access to your database through stored procedures only, then only your stored procedures will need to change.
And that makes fixing those broken schemas much, much easier.
It's not a crime to create a poor database design; it happens all the time, and modeling is truly an art, not a science, so a good idea one moment might not be so good the next. The idea of not binding outside code to your database schema, however, is simple science and easy to do; if you follow this approach, you will find that your life is much, much easier when physical database refactoring is necessary.