Here’s my advice of the day: Try your best to store all of the data relating to your database in the tables.
Sounds pretty obvious, I know, but you might be surprised just how much data in your application is not in your tables, where it belongs, but rather nestled away in your code.
Example #1
Your company defines 3 shifts per day: Shift 1 is 12:00AM-8AM, Shift 2 is 8AM-4PM, Shift 3 is 4PM-12AM.
So, when you need to evaluate a DateTime variable to see which shift it falls into, you write:
SELECT ..., CASE WHEN Hour(SomeDate) < 8 THEN 1
WHEN Hour(SomeDate) < 16 THEN 2
ELSE 3
END as ShiftNo
FROM ...
Great, except you now have data stored in your code -- not in a table! Let's store our data where it belongs by creating a “Shifts” table:
Shifts
|
ShiftNo (PK)
|
StartHour
|
EndHour
|
Description
|
|
1
|
0
|
7
|
Morning
|
|
2
|
8
|
15
|
Day
|
|
3
|
16
|
23
|
Night
|
And now you simply write:
SELECT ...., Shift.ShiftNo, Shift.Description
FROM ...
INNER JOIN Shift
ON Hour(SomeDate) BETWEEN Shift.StartHour and Shift.EndHour
This allows you to add descriptions, shift managers, calculation adjustments, anything you need that is related to a shift. In addition, you can use this Shift table to drive reports and now changing to a 4 shift day is a matter of changing data and not by changing code. Even better, add a "WeekEnd?" flag or "WeekDay" column to your shift table to allow for different shifts for different days of the week; add more columns for more variations. Even historical changes can be represented (and clearly documented) in this table by adding StartDate/EndDate columns to the primary key.
(Note: this is a simplified example, of course, since a Shift will often last past midnight, but you get the basic idea.)
Example #2
Your company pays out bonuses based on the employee type: Managers get 10% of their annual salary, Regular employees get 7%, and Admins get 5%. So, when it is time to calculate the bonus, you simply write:
SELECT ..., Salary * CASE EmpType WHEN 'Manager' THEN .10
WHEN 'Regular' THEN .07
WHEN 'Admin' THEN .05
ELSE 0
END as Bonus
FROM ...
In addition, on your personnel reports, you always want to list Managers first, followed by regular employees, followed by Admins; you cannot sort alphabetically, so you write:
SELECT ...
FROM ...
ORDER BY CASE EmpType WHEN 'Manager' THEN 1
WHEN 'Regular' THEN 2
WHEN 'Admin' THEN 3
ELSE 4
END ASC
Works great! But you do have an EmployeeType table, right? If you don't, then it's time to create it. And even if you already have it, let's consider putting our data in the table, not in the code:
EmployeeTypes
|
EmpType (PK)
|
…
|
BonusPct
|
Sort
|
|
'Manager'
|
…
|
.10
|
1
|
|
'Regular'
|
…
|
.07
|
2
|
|
'Admin'
|
…
|
.05
|
3
|
Now, a simple join to that table gives you what you need. No more CASE's, no more things hard-coded in your SELECT. In addition, now changing those percentages or creating new Employee Types involving editing the contents of tables, as opposed to editing your code. It’s pretty simple and obvious, right? Not always!
Example #3
At your company, certain ProductTypes should not show up on the Annual reports, since they are internal only, have been phased out, or were used for testing purposes. We’ve all had to exclude certain values from reports, and it is easy enough to do, right?
SELECT ..
FROM ...
WHERE ProductTypeID Not IN (24,203,1,293)
It’s simple, and it works great! But I think you know what is coming -- why are we storing this data in our code, when it should be in our product types table? Add a bit column or some other indicator to the table, and flag those items in your data so you can remove these lists of values from your code. Even if you cannot edit the ProductTypes table, you might consider this creating a new table:
ProductTypeExclusions
|
ProductTypeID (PK, FK)
|
Notes
|
|
24
|
'Testing'
|
|
203
|
'Converted to product type #12'
|
|
1
|
'Internal'
|
|
293
|
‘Testing’
|
With that, a simple LEFT OUTER JOIN to this table lets you know when something should be excluded:
SELECT ...
FROM ...
LEFT OUTER JOIN ProductTypeExclusions PEX
ON ...
WHERE PEX.ProductTypeID is NULL
Now, we have documentation in our data as to which product types are excluded (and why), we can easily make changes w/o recompiling our code, and we can even write reports using this table. We might add date ranges to this table or other factors that determine when to exclude or include certain ProductTypes from our reports – the more you put in your table, the less you have to put in your code.
As a nice side effect, our new SELECT statement clearly tell us why we are excluding certain items from the report, and we no longer need to wonder what that big list of ID’s means!
Example #4
And, finally, consider “look-up” tables. We all use them -- it's a great way to provide a list of valid values for certain columns; usually, they don't do much else:
StatusCodes
|
StatusCode (PK)
|
Description
|
|
A
|
Approved
|
|
P
|
Preliminary
|
|
R
|
Rejected
|
|
D
|
Deleted
|
|
H
|
Archived
|
Raise your hand if you have a table like this in your system. But as it is, how much does this table really tell us? What does 'Rejected' really mean? To find out, we often have to look at our application or T-SQL code. There, we would write things like:
... WHERE StatusCode IN ('A','H') ... return everything that is valid
... IF StatusCode NOT IN ('P','R','A') THEN ... it cannot be edited
... WHERE StatusCode IN ('P','R') ... show all objects that can be reviewed
... IF StatusCode IN ('P','R') THEN ... allow it to be deleted
And so on ... (Note that this gets even worse when we use IDENTITIES as the PK for these lookup tables, but that's a topic for another day) Again, this is a simplistic example and maybe some of those codes don't make sense, but you should get the idea.
But what are we doing here, really? What is the point of that Status table? Are we using it to our full advantage? Should we really have data (the status code values) hard-coded into our code? What happens to our code when we add new statuses?
Once again, the answer is obvious: try to store more data in your tables, and take it out of your code. It may not be quickly apparent with this example, but consider this:
StatusCodes
|
StatusCode (PK)
|
Description
|
IsValid
|
IsEditable
|
IsReviewable
|
IsDeletable
|
|
A
|
Approved
|
1
|
1
|
0
|
0
|
|
P
|
Preliminary
|
0
|
1
|
1
|
1
|
|
R
|
Rejected
|
0
|
1
|
1
|
1
|
|
D
|
Deleted
|
0
|
0
|
0
|
0
|
|
H
|
Archived
|
1
|
0
|
0
|
0
|
Now, all of a sudden, our StatusCodes table has very clear definitions of what those codes actually mean (it is very self-documenting now!), and how our application and reports use them. If we add columns for all of the different core actions that the app can do in this table, suddenly we find that we almost never need to hard-code any data in our code, but rather it now lives in our tables! With a simple JOIN to our Status table, we can now write:
... WHERE IsValid =1 ...
IF isEditable = 0 THEN ...
... WHERE isReviewable = 1 ...
IF isDeleteable = 1 THEN ...
And so on. And now we can change status codes, create new ones, and reclassify things as needed -- all in our data.
Conclusion
On the surface, this is all obvious stuff. But we all sometimes find ourselves putting more data in our code than we should, sometimes because it is quicker, other times because we can't change our schema, but often because we didn't plan ahead as much as we should have.
I hope this has given you some ideas of ways you can put more of your data where it belongs -- in your tables. You will find your code much shorter and easier to understand, and you will find that suddenly your data really documents your system well and allows for surprisingly major changes to be made in your tables alone.
(Update: more on data versus code here.)
see also: