Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

Data Types – The Easiest Part of Database Design

Database design can be very complicated, and it truly is an art as opposed to a science; sometimes there are multiple correct ways to model the same data with pros and cons to each.  I can understand that normalization can be tricky to comprehend and to implement, and that concepts like stored procedures and foreign keys and even indexes and constraints can take time to grasp.

But – what about Data Types?  They are so basic, so simple, so fundamental; not only for database design, but for any sort of programming in general … what excuse is there for not using correct data types for the columns in a table design?

I see it time and time again in the SQLTeam forums – "dates" that don't sort properly, "numbers" that don't add correctly, "boolean" columns containing 10 different values, invalid entries that somehow show up in "date" columns, and so on … Of course, since we are rarely provided any DDL to review, it often takes dozens of posts going back and forth until we finally realize: "wait …  you aren't using a datetime data type to store these dates??? Arggh!!"

Look, I can understand why you think that columns like Period1, Period2, …, Period12 are a good idea in your tables at first … I can accept that you have nothing but IDENTITY primary keys … I see the perceived advantage of using dynamic SQL created in your client applications, instead of using stored procedures, to keep things "simple" … but using correct data types is so basic, so important, so fundamental, and so crucial to pretty much anything working correctly in a database that there is just no excuse or reason to abuse them!

In short, even a poorly designed database, with one giant "master" table with no normalization or logic anywhere in sight, should still at least use a Money data type to store currency values!

Perhaps the confusion comes from Excel users, where data types are handled behind the scenes  … or maybe "old school" VB programmers used to using variant data types (or worse – undeclared variables!) to store values… But when you design a table in any database, you are always explicitly stating the data types of the columns – there's nothing hidden, or no option to ignore them.   You must declare a data type when creating a column, so how can anyone justify using VARCHAR to store a date?

Well, I suppose we are all guilty on way or another of abusing data types.   I often use Integers when a byte will suffice, or a DateTime when a smalldatetime would do just fine, or VARCHARs that are larger than necessary.  But at least, fundamentally, those values act the way they are supposed to – it is just a matter of optimizing disk space in these cases.   Using an Integer instead of a Byte may waste some space, but at least I can still calculate,sort, join and compare those values correctly – which is what a database is all about. 

So, please, the next time you create a table – think carefully about the data types of your columns.  It's not a "formatting" declaration, like in Excel, or some sort of general guideline that SQL will follow, or something to determine later on.  Each column's data type is a crucial, fundamental aspect of that table and your entire schema, and greatly determines the efficiency of your code in terms of both performance and simplicity.  So many complicated SQL "problems" and "puzzles" that beginner programmers encounter can be solved easily and quickly by simply using the correct data types on your tables.  It's not all you need to do to ensure a good database design, but it is the very first step.

see also:

Legacy Comments


KenW
2007-07-06
re: Data Types -- The Easiest Part of Database Design
Jeff,

I think part of the problem with using DateTime in SQL Server is the fact that it combines the date and time, and therefore makes it harder to use in a WHERE clause. You need to either remember to provide a time as well as the date, or use DateAdd() - neither of these are intuitive when you're first getting started with SQL. So new people can't figure it out right away, and realize that if they just store the date as a VARCHAR they can write simpler queries.

I use SQL Server, but more often use a smaller database engine (required at my fulltime job). The only accepted date format in that engine is 'YYYY-MM-DD', including the quotes. In ad-hoc queries, it's really simple to remember; in application queries, I use parameters and don't have to do anything special.

Jeff
2007-07-07
re: Data Types -- The Easiest Part of Database Design
Ken -- if you don't want to store times with your dates, then don't store them -- then there is no need for DateAdd() or DateDiff() or anything fancy.

Here's an analogy: suppose sql only supported MONEY and VARCHAR data types, and you need to store integers. Should you use MONEY or VARCHAR? You might argue that you should not use MONEY, since it always includes the decimal portion -- which you don't need, and having a decimal portion there somehow will complicate things or waste space.

Of course, that would be silly and make no sense at all -- just store integer values in a MONEY column and everything works (adding, math, sorting, comparing) and you don't need to establish any "conventions" which would be required with a varchar (do you store commas? leading zeroes? etc) . If you use VARCHAR, things won't sort correctly unless you store it just right (i.e., leading zeroes) and you need to write lots of CHECK constraints to avoid garbage in your tables. Why would anyone ever do this? You would always use a MONEY, never a VARCHAR, to store integers, right??

That is the same as using a VARCHAR formatted as "YYYY-MM-DD" or any other convention and avoiding datetime because you don't need the extra time info. It complicates things, makes no sense, requires non-standard formatting "conventions" to known and followed by everything that ever interfaces with your database, and it is very difficult to ensure your data is valid.

And all of those things mentioned are the self-imposed challenges and complexity that people bring upon themselves when they avoid simply using the correct datatypes for their data.

Sheeri Kritzer
2007-07-09
re: Data Types -- The Easiest Part of Database Design
The main idea is to THINK. Jeff's comment and analogy make good points. However, the important point is to THINK. If for some reason you actually want to use characters to store an IP address, instead of numbers, have a reason. Maybe you have a lot of people going into the database to look at the IP addresses and never actually want to compare any, or sort them in order, or find all IP addresses within a certain bitmask.

So long as decisions are made because they're thought about, that's fine. But there are way too many people who don't think. Recently my company was looking for another DBA (we just hired one, yay!) and I asked to see sample schemas, and then asked the interviewees about them. When folks had VARCHAR(255) DEFAULT NULL specified for most fields (including numeric primary key fields that mimic IDENTITY!) and their response to "What would you change about this schema?" was "It's perfect how it is," I knew we could scratch them off the list.

KenW
2007-07-09
re: Data Types -- The Easiest Part of Database Design
Jeff,

I think you misunderstood me... :-)

I was not advocating using any other data type as storage for a datetime, for any reason. I firmly believe in using the proper datatype all the time.

Regarding your MONEY/VARCHAR question: I'd definitely use the MONEY type. However, someone I work with here says that if you're not actually using it as a number you shouldn't store it as a number. It leads to all kinds of PITA, because what appears to be a numeric value isn't, but is a CHAR/VARCHAR instead; unfortunately there's not a lot I can do about it, as it's legacy data (from old Clipper applications) that is currently in use in Delphi applications - because of the continuity of historical data (and the resistance to change of some management <g>), we're stuck.

Carfield Yim
2007-07-12
re: Data Types -- The Easiest Part of Database Design
There are organization like to structure like HashMap with {name, value} to store data in order to make it flexible, so that the data type must be VARCHAR.

In order make it easier to manage, they add column like {regex, who_change, when_change, for_what....} . Which, pretty amazing to maintain and understand....

Er.Paramvir Singh
2008-05-12
re: Data Types -- The Easiest Part of Database Design
what is the data type in sql server 2005 to store the time only data in table. if we are defining datetime its automatically picking the date as well..that we don't want

Jeff
2008-05-12
re: Data Types -- The Easiest Part of Database Design
Er.Paramvir Singh --

please read this very carefully:

http://weblogs.sqlteam.com/jeffs/archive/2007/08/29/SQL-Dates-and-Times.aspx

Red
2009-10-21
re: Data Types -- The Easiest Part of Database Design
Jeff,

Great article, which I happened to find on Google - first hit.

A work colleague recently stated that 'we were moving away from data-typing', (his defence of the fact that he thought that always using varchar was acceptable), which to me was like saying: "hard-drive space is now so cheap - we don't need to normalize database tables".

I would like to note to Sheeri that I store IP addresses as string every time - because I don't perform maths on an IP addresses (I assume that Sheeri stores each numeric portion of the address in 4 seperate fields, as I can't think of a non-string data-type that can store a 'dotted decimal number').

This article argued every single point accurately and simply, and I hope my colleague managed to get around to reading it.
It grieves me when self-taught people portray themselves as 'experts', without ever bothering to ensure that they haven't missed fundamentals such as data-typing, normalization etc.

Thanks Jeff - for saving me the time in composing a document that would have pretty much replicated exactly what you have stated here. (The fact that it would have been replicated heartens me: my training must have been good).

ATB

Red


snow boots for women
2010-10-06
re: Data Types -- The Easiest Part of Database Design
Using an Integer instead of a Byte may waste some space, but at least I can still calculate,sort, join and compare those values correctly -- which is what a database is all about.