The Importance of Data Types
Imagine that SQL Server only provided two data types: the MONEY data type to store numeric values, and VARCHAR to store text.
If you are designing a database in this scenario and you need to store or return integer values, which data type -- MONEY or VARCHAR -- would you use?
Suppose I were to argue that MONEY is too complicated, and a waste of space, since all MONEY values always have 4 digits after the decimal which we don't need and it might get confusing. Therefore, we should use VARCHAR, which is much simpler. After all, if we use MONEY, every time we execute a SELECT, our integer data will be returned with 4 decimal places like "45.0000", which is certainly not what we want and doesn’t
look like an integer; but if we use VARCHAR we will get "45" which looks much better and is really what we want to see. While eventually we will notice that we cannot add these VARCHAR numbers together, or sort them, or compare them, or do any kind of math with them, we can always temporarily convert those values to MONEY if we need to do these things, right? And as for ensuring that we only store valid numbers in our VARCHAR columns, we can handle that with some complicated CHECK expressions that parses the string to ensure that it stores only valid digits. We also need to make sure that everyone is aware of the rules for the format of integer data in our VARCHAR columns, to avoid problems as well: after all, we may wish to store values like 34934 as "34,942" so that it looks just right when the values are returned and is easier to work with.
Now, if I make this argument to you, what would you say? Would you think it's a good idea?
I think we can all agree that a much better approach would be to use MONEY. Now we can add, sort, compare, do math, and all kinds of things on our values which are stored as numeric data, not text. We can add a quick, simple CHECK constraint to ensure that our MONEY values have nothing stored after the decimal (e.g.,
CHECK round(value,0) = value), and we simply let our front end format those numbers without any decimal places. In this scenario, it's probably fair to say that no reasonable programmer would argue that VARCHARs are better for working with integers than a true, numeric data type like MONEY.
Manipulate and Return Data -- not Strings
If we agree to use the MONEY data type, wouldn't it make sense to keep those values as MONEY throughout our T-SQL code, and only return MONEY values to the clients? We wouldn't want to try to convert our MONEY values to VARCHAR in our database code just so that things "look good" and we "don't see" that pesky little ".0000" after each value, right? It means nothing to us, it doesn’t affect our data, so we simply store and work with and return MONEY in
all of our database code and we don't worry about formatting or how things "look" or the extra decimal places that we aren't using. This keeps our code short, clear, easy to read since we are not converting things back and forth between VARCHAR and MONEY over and over to make things "look good" in one spot but to make them “work correctly” in another.
Along those same lines, suppose we have a MONEY value like this:
1846.4069
... and we need to return that value as an integer. What should we return, a VARCHAR value or a MONEY value? The VARCHAR value of "1846" will
look nice, but again: we cannot sort, compare or do anything with it -- it is just a picture of data, not actual data. If we return a MONEY value of 1846.0000, we can and should ignore the decimal portion since it does not affect anything and enjoy the benefit of returning an actual numeric value that both the database and the client can sort, compare, calculate, and so on.
If you don't agree with any of that, stop reading now and let me know in the comments. Otherwise, let's continue on ...
Another Obvious Example
What about if we want to store only values that are decimals, such as .0234 or .963? These values will never have any digits before the decimal, but will always have one to four digits after the decimal (we only need 4 digits of precision). That is, we are storing values from -.9999 to .9999 only. With only MONEY and VARCHAR available, what data type should we use to store and work with these values?
Once again, suppose I argue that VARCHAR is the way to go. After all, if we use MONEY, every value will be preceded by "0.", which is not what we want and again a waste of space. We don't need the integer portion, and having it returned and seeing it everywhere will be confusing. If we use VARCHAR it will always look perfect, and we again can use CHECK constraints that parse the string to ensure we have valid data. Things still don’t sort or compare or calculate, but we can CONVERT back and forth as needed.
Yet once again, I would hope that you will agree that the suggestion to use VARCHAR is not a good one. We should use a numeric data type, and therefore MONEY is the best choice. It handles our data perfectly with the accuracy we need, we can do math, sort, and compare. We know that it will be valid and accurate, and a simple CHECK constraint ensures that we only have values between -.9999 and .9999 as opposed to using string parsing to determine this. And, again, even though in the T-SQL world all of our money values have an integer portion of “0” attached to them, it should make sense that we should not worry about this since our data is accurate and correct, it doesn't affect calculations, and the client can output these decimals without the preceding “0” very easily.
This also affects doing some math on a value like 139.592 to return only the decimal portion; for all of the reasons stated, we should all agree that the value to return is a MONEY value of 0.5920 and
not a VARCHAR value of ".5920".
Finally, if we always store and return MONEY for
both our integer values and our decimal values, we have another huge benefit: we can do math on both of those values
together and get accurate, consistent results. No conversions, no string manipulation or parsing or concatenation, no worrying about how things are formatted or stored or how they look; we can do simple, efficient and accurate math on all of our data with standard mathematical operations and it just works perfectly and intuitively.
Getting to the Point (Finally!)
Are we all agreed on the above? Good. So, then, what’s my point?
Shouldn't we follow the exact same logic and reasoning when storing date and time data?
The exact same scenario happens every day, only programmers are deciding how to store or return dates without times, or times without dates. Yet, very often we see beginners choosing to store their values as VARCHARs and/or return them and manipulate them as VARCHARs, instead of using the easiest, most obvious and appropriate data type: DATETIME.
Working with Date-Only Data
If you need to store just a date, use a DATETIME. Don’t worry about the “extra” time that is stored, just enforce that it is always midnight and it won’t affect anything – just like forcing a decimal to be “.000” and ignoring it! It is the
exact same logic. And, by that same token,
don’t worry about that time being there and feel you need to convert everything to VARCHAR to make it “look” right. Just use data in the proper type throughout your SQL code, ignore the time portion, and return the DATETIME value – in the correct type – to your clients. Don’t try to “help” them by converting those nice, accurate, and clean DATETIME values to a string to “hide” the time part – just return the value, just as you would not try to “hide” a decimal portion of “.000” in a numeric by converting it to a string.
Luckily, many programmers do indeed use DateTime when all they want to store is dates, but of course many do obsess about “hiding” that time portion. Don’t do it. Leave your data in the proper type, enforce a time of only midnight with a simple check constraint (e.g.,
CHECK dateAdd(dd,datediff(dd,0,yourdate),0) = yourdate) and now you can compare, sort, do math, and everything else you ever need on those values, and your client can do the same. Just as you would use a MONEY data type to store numeric data, if you had to, as opposed to a VARCHAR. It’s easier, shorter, and more efficient – it just makes sense.
The same concept applies if you have a DATETIME value like "5/6/2007 5:00 PM" and you'd like to return just the date portion: Should you return a VARCHAR or a DATETIME? For all of the reasons stated, it should be clear that you simply return a DATETIME value a midnight -- "5/6/2007 12:00AM" -- knowing that the time portion is now "zero" and even though it is there and we see it, it does not affect any calculations and can easily be excluded when the client presents the data.
To "round" a DATETIME so that the time at midnight is returned:
select dateadd(dd,0, datediff(dd,0, datetimeval)) as date_at_midnight
Working with Time-Only Data
Now, what about storing only a Time? Here’s where people
really get confused, but, again,
it is exactly the same as storing a value between -.9999 and .9999 in either MONEY or a VARCHAR. We agreed to use MONEY, and simply accept the fact that our integer portion of “0” is
always there, but it doesn’t affect anything; we can simply ignore it and let our client worry about not displaying it. Remember, when storing a true decimal value, we had no choice – that zero was always there, whether we displayed it or not, right? Well, DATETIME values have the
exact same concept! The decimal portion of a DATETIME is the time component, and even though we just want to store time values, a DATETIME works perfectly for us. We simply ensure that we store a 0 value as the date portion and we
ignore it!
For DATETIME data, a date of “0” that doesn’t affect any calculations is “1/1/1900”, just as a time of “0” that doesn’t affect any calculations is “12:00:00 AM”. Thus, “1/1/1900 12:00:00AM” is the equivalent to a decimal value of 0.0 -- adding or subtracting that value to other values will not have any effect.
In fact, take a look at this:
select cast(0.0 as datetime) as ZeroDateTime
ZeroDateTime
-----------------------
1900-01-01 00:00:00.000
(1 row(s) affected)
select getdate() as Now, getdate() + '1/1/1900 12:00:00 AM' as NowPlusZeroDateTime
Now NowPlusZeroDateTime
----------------------- -----------------------
2007-08-30 20:55:48.513 2007-08-30 20:55:48.513
(1 row(s) affected)
Notice that the numeric value of 0.0 is converted nicely to '1/1/1900 12:00:00 AM', and that adding '1/1/1900 12:00:00 AM' to a DATETIME value has no effect.
So, if we want to store
only a time, without a date, we just:
- Use a DATETIME data type
- Ensure that the date value of that time is 0 (that is, "1/1/1900") with a check constraint (e.g., CHECK datediff(dd,0,TransTime) = 0)
- Ignore the date in our SQL code even though we might “see it” here and there
- Keep our data in the correct data type throughout our code without worrying how it “looks” and trying to “hide” the date part
- Return that DATETIME value to our client and let it worry about hiding the “1/1/1900”, just as we let the client worry about hiding the preceding “0.” in a decimal value such as 0.2394.
Once again, let's consider what to do if we have a DATETIME value with such as "6/1/2007 5:00 PM" and we'd like to just return the time portion: Do we return a VARCHAR or a DATETIME? By now, the answer should be clear: we return data in the correct type, DATETIME, and we simply use the "0" date of 1/1/1900 by returning "1/1/1900 5:00 PM" knowing that the date portion will not affect the time value just as a an integer portion of 0 will not affect a decimal value.
To return just the time portion of a DATETIME (i.e., at the base date of 1/1/1900):
select datetimeval - dateadd(dd,0, datediff(dd,0, datetimeval)) as time_at_base_date
Finally, just as in our previous MONEY example, if we exclusively work with dates and times using the correct DATETIME data type throughout our schema and our SQL code, we can simply add values together to combine a date and a time using simple math:
select somedate + sometime as somedatetime
No converting, no string manipulations or parsing, no worrying about AM or PM or time formats or anything. If we keep our data in the correct types, and constraint it properly, everything works as it should – quickly, accurately, and intuitively.
Conclusion
The next time you are working with dates and times, please remember: how would you handle things if you were working with integers and decimals? The same logic and reasoning applies. Be smart, let SQL do the work for you and use the right data types for the job, even if things don't always "look" right. It's not about how good your data
looks, it's about how accurate it actually
is.
see also: