Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

Don't Let Output Dictate your Database Design

I often talk about "database layer" versus "presentation layer", but even the within just the database layer it is important to understand that how the data is physically stored does not always have to correlate with how the database returns results.  There is still room for your SQL statements to use Views, Stored Procedures and User-Defined Functions to occasionally transform your well designed, clean normalized data into something that is easier for reporting tools to consume – especially if those tools are used by clients who don't have strong skills.

We often see bad designs submitted at SQLTeam.com with the justification of that design being "the client wanted it that way! They insisted that the database should store things in all in one XML column" and so on.  While some clients can indeed be insistent with these requirements, remember that you can often have the best of both worlds – a clean, normalized design AND data that "looks" the way your clients want – if you take the time to demonstrate to them how the database can transform your data so that it looks the way they expect and the way they feel comfortable working with.

Some examples ….

"I want to see all categories listed in 1 column in the Product table.  A related table is too much trouble and it's hard to do the joins.  All of the categories that the Product belongs to should be stored as CSVs like this: 'shirts,clothing,summer'."

This is a common one, where a client or junior programmer complains that "joins are a lot of work!" or "confusing" and that they'd like to avoid them.  It's really ironic when you think about it (this design is much more difficult to work with rather than using simple JOINS), but I see this time and time again. 

So, how do we handle this situation?

Chances are, explaining the benefits of a normalized database to your client is not going to work; you need to demonstrate to them that even when storing the data properly, we can still create a View which returns those exact results, making them happy. Using a simple UDF or other techniques to do this will result in the best of both worlds:  Your client is pleased that things "look" they way they want, and you have a clean database design.   Don't just give up and give them what they want, and don't lecture them with technical talk about normalization that they won't respond do, just demonstrate to them how the View works and how it is instantly updated and flexible and fast and they will be pleased.

Here's another common one:

"We want to store account balances in a table with columns for Jan, Feb, Mar, etc.  All of our reports format the data this way, and it will make our reporting much easier."

This is another common request for an unnormalized database to make things "easier". If only they knew how much harder this would be to have things set up this way!   Again, we can try to explain to them the benefits of a reporting tool or even an Excel pivot table, and how these results are easily returned from a standard, flexible transactions table, but explaining and "talking down" to them won't help; we need to demonstrate to them that a simple View can once again do the trick.   I usually strongly recommend to cross-tab data at your presentation layer and not at your database, but since this is static (it will always be 12 monthly columns), it is fairly easy and efficient to do in T-SQL.   Just create a View that presents the information this way, while storing the data in a standard normalized format, and everyone wins.

In addition, we have the classics:

  • "dates that must be stored like 04-Jan-07"
  • "phone numbers must be stored like 617.123.456" (no trendy web site uses (xxx) xxx-xxx any more!!)
  • "The customer names must be stored in all capitals because our invoice displays them that way"
  • "We'll want reports that show Budgets and Actuals and also calculates variances, so there must be a Budget column and an Actual column in the tables."
And on on and on ...

Does anyone else have other good examples along these lines?  Perhaps some crazy requests from clients that you were able to satisfy but still use a good database design?  Let us know!

(By the way -- don't be afraid to leave a comment! It's easy [despite the captcha!], and it makes writing more fun when people respond, especially in this case with stories or ideas of their own. )

see also:

Legacy Comments


Mike Hoyt
2007-06-07
re: Don't Let Output Dictate your Design
I work for a school district. Our High School includes several "tracks". There is no field in our commercial database for "track", so the counselers have been doing what I call a "semi-clever" solution of including the track in the course title. So, for instance, we have Algebra I T10, Algebra I T20, Algebra I T30, etc. Where the TXX indicates a track. This lets them pick the right Algebra course for a kid in a given track, but it denormalizes the course title field by adding in track information.

I appreciate your advice to not talk to them about normalization or to talk down. To them, it sounds like esoteric dogma, and their solution works fine (as far as they know). But showing them how we can use one of the vendor provided user defined fields for tracks is much easier for them to grasp.

I've seen a lot of "semi-clever" solutions. People useing the FAX field for some other purpose, "cause we'll never need to know the fax number anyway"....

ReadySetStop
2007-06-07
re: Don't Let Output Dictate your Design
I wish I could remember how I did this in prod, I know it wasn't pretty...

My last client needed dates stored in what they called Julian format. (Not, of course, what Julian dates actually are...) Two digit year, three digit day of year, as an int so that they could do date calculations between events. DATEDIFF just didn't seem to impress them for some reason...

Add to that the need for a calendar table populated with their holiday schedule so we could calculate business days vs. calendar days. It was a bit messy.

Jeff
2007-06-07
re: Don't Let Output Dictate your Design
Thanks for the great examples! Keep them coming.

By the way -- calendar tables are actually very handy if you set them up properly; they allow you to do what you indicated (track working days and holidays) but also do things like return rows when nothing occurred on a specific day, and actually can make your SQL more efficient since you can index columns like month and year and so on. Also, if your company tracks weeks or dates in a non-standard way, it is really useful to store all this is a permanent, indexed table.

Daniel Lyons
2007-09-07
re: Don't Let Output Dictate your Database Design
One I saw recently had to do with storing coordinates for a square. The developer had chosen a string 'xxx,yyy,zzz,qqq' in a column simply called 'region,' presumably because he just forwards that value off to some database unaware drawing function. This was a problem for me mainly because there are many ways to represent a square using four numbers; if the data had been normalized, the name of the column would have given me a clue as to how to construct them. Of course it also means the column type is string, and therefore would permit any crazy set of characters, rather than exactly four not null integers, which could further be constrained to fall within a certain reasonable range, such as 0 to 1000.

I heard recently that QuickBooks uses 'CustomerName' as the PK for its customer table internally, which forces users to come up novel conceptions of names. Some users treat the name like registering for an AOL account (John Smith1, John Smith2, etc.), others put things besides the name in there, or include other information like the phone number (John Smith 555-123-4567). Incorrectly applied principles of database construction such as using the wrong column as a key is probably a big factor in the root cause of the anti-RDBMS backlash we're seeing now.

Jeff
2007-09-07
re: Don't Let Output Dictate your Database Design
Great examples, thank you Daniel!

John "Z-Bo" Zabroski
2007-09-07
re: Don't Let Output Dictate your Database Design
@Jeff
@"We want to store account balances in a table with columns for Jan, Feb, Mar, etc. All of our reports format the data this way, and it will make our reporting much easier."
@we need to demonstrate to them that a simple View can once again do the trick. [...] but since this is static (it will always be 12 monthly columns), [...] Just create a View that presents the information this way, while storing the data in a standard normalized format, and everyone wins.

This sounds like a use case for snapshots, not views. =)

webmat
2007-09-11
re: Don't Let Output Dictate your Database Design
To expand on what Daniels was saying, I usually say that primary keys should never be data the customer / user is aware of.
Some people consider using solely a customer's phone number a good idea for a primary key. Madness lies this way:
- "Oops! we did not include the regional code, because we started with only local customers!" (And now we're expanding to other cities/states)
- "Oops! we rely on phone numbers of a given format and now we are expanding to other countries/continents!"
- "Oops! one of our customer's spouse/child also wants to be our customer now. And, surprise, he/she has the same phone number!"
- "Oops! we were visionaries enough to use the regional code in all keys, but now the regional code assignment is being changed for a couple of towns around here!"
- "Oops! one of our customers is moving and her phone number is changing!" (Now you have to tell her to remember that her customer number is her last phone number, not the new one)
I guess the list can go on, too. Anyway, now you end up with problems created by your database design influencing the business, or worse, an end customer.
For other types of bad primary keys, the examples might not be as obvious, but trust me: a primary key must not be known and used directly by the user of your system.

To end this beautifully, here's a little gem that will make you cringe. I once heard about a system where the content of the database contained various number spaces before data: " 234", " 9", " 435634".
Q: Guess why?
A: So the data can be right-aligned in the UI. The best part: the users had to make their searches including the spaces, and had to create new items with the right amount of spaces!
WFT material, really :-)

Evgeny
2007-09-13
re: Don't Let Output Dictate your Database Design
I have been doing a TestDirector to Bugzilla/JIRA migration lately for work. TestDirector tables have so much Polymorphism it's really hard to graps. For example, the BUGS table has a restricted amount of user defined columns, and the SYSTEM_FIELDS table stores the values for the column names. So basically the BUG can't have more than X custom columns, and to know what each column name is - I must crossreference a different table. Naturally at one time or another a custom column named W was renamed to be Q, and the values in the BUGS table remained to contain whatever W is. (think W=version, and Q=build number).

A nightmare?

No. It's widely used, and some products designed at our company actively use the same approach for storing data in the database. Much fun is guaranteed for future generations.