I recently submitted my "Data in tables versus data in code" post from a ways back to reddit, and I was surprised that there ended up being quite a bit of discussion about it. There was even an, um, interesting reply from another blogger. Without a doubt there were some great and thought-provoking comments all around.
A few comments were made on one particular point, which is a great one and important to address: What is data, and what is code?
How do we define the difference, and decide what goes where? It is great to say keep data out of your code, but what if that data is integral to the application itself? Isn't it therefore code, and not data?
As always, the answer is: it depends on the specifications. I think it is very easy to distinguish between the two; it all depends on how you design your application and how you specify that it will function.
Here are some examples.
"The XYZ Customer system allows you to store and track customers. Each customer can also be assigned one of 4 status codes: 'A' means that the customer is active, 'I' means inactive and they will not display on reports and cannot be edited, 'X' means they are archived and cannot be edited, and 'P' means they are preliminary and can be edited but will not show up on reporting. The XYZ Customer system allows you to define the labels for these status codes any way you want, so that 'A' can be labeled 'Active' or 'Current' or whatever your business uses."
Let's assume that the system is written to take these specifications literally. Now, in the XYZ Customer system, what are these statuses -- code or data? The answer is: code. If you design your system like this, and spec it up like this, and these code values are hard-coded into your data and what they do is also hard-coded and fixed, then of course they are tightly integrated with your application code, and they are indeed code. You can store the labels in the database, and you might even store a status code table in your database as well, but with this design and these specifications, these statuses are part of the code and not data.
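To make that concrete, here is a minimal sketch of what taking the XYZ specification literally might look like. The function and variable names are hypothetical, invented for illustration; the point is only that the labels can change freely while the meaning of each code lives in the program.

```python
# Hypothetical sketch of the XYZ design: only the labels are configurable.
# In a real system this dictionary might be loaded from a labels table.
labels = {"A": "Active", "I": "Inactive", "X": "Archived", "P": "Preliminary"}


def shows_on_reports(status: str) -> bool:
    # Per the spec, 'I' and 'P' customers do not appear on reports.
    return status in ("A", "X")


def can_edit(status: str) -> bool:
    # Per the spec, 'I' and 'X' customers cannot be edited.
    return status in ("A", "P")


# Relabeling 'A' changes what users see, not how the system behaves.
labels["A"] = "Current"
```

Adding a fifth status, or changing what 'P' does, means editing and redeploying these functions, which is why the statuses count as code here.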
Let's consider another system, instead.
"The ABC Customer system allows you to store and track customers. You may also assign customers a status code that determines whether they can be edited, viewed, deleted and so on. These status codes can be defined by the administrator using the 'Admin' feature. Each code is assigned a label and also attributes that define how the application reports and processes customers with that status assigned."
This is how I choose to design my systems. In this case, your status codes are clearly data. These two systems, with the same set of status codes, will technically work exactly the same way. But the difference is that the ABC system doesn't have code like "IF Status=A Then display it" hard-coded throughout; it simply checks to see if the current status code of the current customer means that it should be displayed on reports.
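A rough sketch of the ABC approach, under the assumption that each status carries two attributes (whether it shows on reports and whether it is editable). The class names, attribute names, and `load_statuses` are hypothetical; in practice the definitions would be read from a table maintained through the Admin feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusDef:
    label: str
    show_on_reports: bool
    editable: bool


@dataclass
class Customer:
    name: str
    status_code: str


def load_statuses() -> dict:
    # Stand-in for a query against an admin-maintained status table.
    return {
        "A": StatusDef("Active", show_on_reports=True, editable=True),
        "I": StatusDef("Inactive", show_on_reports=False, editable=False),
        "X": StatusDef("Archived", show_on_reports=True, editable=False),
        "P": StatusDef("Preliminary", show_on_reports=False, editable=True),
    }


def customers_for_report(customers, statuses):
    # No status literal appears here; the attribute decides.
    return [c for c in customers if statuses[c.status_code].show_on_reports]


def can_edit(customer, statuses):
    return statuses[customer.status_code].editable
```

With this shape, an administrator can add a new status such as "Q" by adding a row, and neither function needs to change.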
While I firmly believe that this design is preferable, of course I will not say that it is right for everyone and all situations. But for me, it works beautifully. If you implement it properly, your code is much, much shorter and very easy to read and edit, yet your application is much more flexible and powerful.
"The Three-Shift application is designed to track all of your employees and their hours. It allows you to assign employees to one of three shifts: Midnight-8AM, 8AM-4PM, or 4PM-Midnight. Any time an employee works, you can determine what shift they worked by assigning them one of the 3 shift codes."
Here, you can argue that the shifts are code, not data. The limit of 3 is defined in the application and hard-coded; the entire system and probably all of the code written depends on those 3 shifts being defined and limited to those time periods. Sure, we can release a new version next year that adds a 4th shift option, or maybe it allows you to store the time periods for the shifts in a config file or something. But, overall, as defined, you could argue that if you took these specifications literally and had no need to design the application with flexibility, these shifts are code. I'll leave it up to you to decide if this is a good specification or application design.
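Taken literally, the Three-Shift specification could come down to something like the following sketch. The function name and the numeric codes 1 to 3 are assumptions made for illustration.

```python
from datetime import time


def shift_code(clock_in: time) -> int:
    # The three shifts and their boundaries are fixed in the program itself.
    if clock_in < time(8, 0):
        return 1  # Midnight-8AM
    if clock_in < time(16, 0):
        return 2  # 8AM-4PM
    return 3      # 4PM-Midnight
```

A fourth shift, or a shifted boundary, requires a new release.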
"The Multi-Shift application is designed to track all of your employees and their hours. You can define as many shifts as you need and historically track changes in those shifts. Each shift can be assigned a title, a shift manager, a start/end time, and an effective date."
I hope it is clear that in this case, by these specifications, shifts are simply data. Again, I am sure it will not surprise you to hear that I design my applications this way. The code is actually shorter, simpler and easier to read and maintain. You lose nothing but gain a lot by designing your system to be flexible and to grow easily by simply defining things as data and not hard-coding everything into your applications. Reporting is consistent and easier, and historical reporting across different shift definitions is easily done as well, which is difficult or impossible when the definitions can only be changed by changing the code.
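Here is one way the Multi-Shift idea might be sketched, assuming each shift definition is a row with a title, manager, start/end time, and effective date, and that a later row with the same title supersedes an earlier one. All names and sample values are hypothetical, and overlapping shifts are not handled.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class Shift:
    title: str
    manager: str
    start: time
    end: time        # an end at or before start means the shift runs to/past midnight
    effective: date  # first date on which this definition applies


def covers(shift: Shift, t: time) -> bool:
    if shift.start < shift.end:
        return shift.start <= t < shift.end
    return t >= shift.start or t < shift.end


def shift_for(shifts, when: datetime):
    # For each title, keep the most recent definition in effect on that date.
    current = {}
    for s in sorted(shifts, key=lambda s: s.effective):
        if s.effective <= when.date():
            current[s.title] = s
    for s in current.values():
        if covers(s, when.time()):
            return s
    return None


# Sample rows; in practice these would come from a shifts table.
shifts = [
    Shift("Night", "Pat", time(0, 0), time(8, 0), date(2024, 1, 1)),
    Shift("Day", "Lee", time(8, 0), time(16, 0), date(2024, 1, 1)),
    Shift("Evening", "Sam", time(16, 0), time(0, 0), date(2024, 1, 1)),
    # From June on, Evening ends at 10PM and a fourth shift is added.
    Shift("Evening", "Kim", time(16, 0), time(22, 0), date(2024, 6, 1)),
    Shift("Late", "Sam", time(22, 0), time(0, 0), date(2024, 6, 1)),
]
```

Because old definitions stay in the data, asking which shift covered 11PM in March and in July gives different, historically correct answers without any code change.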
I hope this helps to clear things up a little. The only rule is this: The specifications themselves, along with how you decide to write your code, determine what is code and what is data, not some generic rule of thumb. Each case is different. You can always choose to hard-code values and data into your application code; that is certainly your right and it certainly will work. Indeed, if you do that, your data does literally become code and your application is now less flexible.
If you decide to define entities and elements of your application as data and write your code in a way that it simply processes the data and does what the data defines it should, I think you might find that your code just got a lot shorter and more flexible and powerful at the same time. That's usually a good thing, right?
One final note: I think a lot of this also depends on the programmer's philosophy in terms of making users self-sufficient. When I write code for my users, I tend to expose every detail I can to them and hide no calculations, values, data or settings from them -- they are all there to see and potentially change. Maybe lots of this is read-only or requires a super-user or administrator account to change, and many settings might have effective dates and/or audit history, but it is all out there. Changes that are necessary, sometimes even significant ones, are often easily done without even needing to pick up the phone and call me. The system can often live on for years and adjust to new business processes without changing a single line of code if I did my job right and planned appropriately. It is also quite self-documenting. (What does status code "Q" mean? Look it up in the status codes table; it is all there, defined in one place.) I like that; I like my users to be happy and self-sufficient and to really understand how an application works and to be able to configure it to meet their needs.
I have found that other programmers take the opposite approach: They prefer to hard-code and hide important attributes and settings in the code, so that even minor tweaks or changes require their assistance and "expertise". Maybe it's their way of obtaining perceived job security? It could be. But I think it is more often a lack of planning ahead and a failure to carefully design flexible applications that correctly separate data from code.