There’s a handy little rule of thumb I use when developing my database models.
It comes up often in situations like this:
If we are storing phone numbers in the database, should AreaCode be a “free-form” char(3) column? Or should it be a foreign key reference to an AreaCodes table?
Well, like most things in life, it depends.
There is no universal correct answer.
Some people will say that if you are storing the same piece of information more than once in your tables, you don’t have a normalized design.
Again, that is sometimes true, sometimes not.
For example, I store the number 0 and the first name “John” lots of places in my data – should I have tables defining all of these values?
Should every column in all of your tables, then, be a foreign key relation to some table that defines everything? Of course not.
It gets confusing because every database is different, depending on what it is modeling. When trying to determine how to handle the "area code situation", I have found that asking the following question is useful:
Is it an entity or an attribute?
The answer to that is the key to developing your model.
Is the piece of information that you are storing simply an attribute of an entity, or is it an entity itself? A entity should be defined as a table, but an attribute should simply be a column of a table.
Let’s look two scenarios, using area codes:
Scenario #1: In an accounting database, we are keeping track of Vendors. For each vendor, we store contact information such as their mailing address and phone number.
In the above scenario, our accounting database doesn’t need to formally define a concept of “Area Codes”. There’s no reporting of area codes, no tables storing distances or town names or billing rates per area code. It is simply an attribute of a phone number, nothing more. Thus, an area code is simply an attribute, not an entity, and it doesn’t make any sense to have an Area Codes table.
Scenario #2: In a phone billing database, we track all phone calls made, the cost rates, whether or not it is long distance, and so on. We want to return the city of each phone call in the reports, and provide usage stats for different cities and area codes.
Here, area code is important – it is an entity. We might store tables with rates from area code to the next, or a list of towns per area code, and so on. Any phone number stored in this system should have a strong relation to the area code table for reporting and data integrity. Thus, we would certainly create an Area Codes table in this database.
Cities and States
Cities and States are other examples. We often want a State to be a formal entity because it is useful to reference them by either the full name or abbreviation, we’d like to provide lists of states in drop-down lists, there's a short, clearly defined list of them, and so on. Thus, it usually makes sense to have a table of States and to formally define them as entities, not attributes, in your data.
Cities, however, are much like area codes – it depends what you are modeling and if the concept of a “City” entity is important to your data. Often, it is just a simple attribute. But if you are modeling city populations and mayors and locations and distances and what state they belong in, then a City is a formal entity that you are tracking and relating, and it belongs in a formal City table in your database.
Even dates can work the same way. If we just want to store transaction dates, or birth dates or effective dates and things like that, it is just an attribute. But if you have accounting periods or if you need to store the first day of the month to represent that entire month as a whole, then you have some meaning and you’ll need to decide where it makes sense to set up entity tables that defines what these dates are.
For example, suppose we have a Budget table that stores one row per month per project, and the month is stored as a datetime value, with the convention that it is always the first day of the month:
create table Budget (ProjectID int, Month datetime, Amount currency)
You could add a CHECK constraint that ensures that only the first day of a month is ever used, but I would argue that instead you should have a foreign key relation to table of Months:
create table Months (Month datetime primary key, MonthNumber int, Year int, ….)
The reason is that the concept of a Month is important to your data; that datetime value in the budget table isn’t just a random datetime value, it represents a Month entity. That month might have cut-off dates or a "closed?" status, or other important attributes; it is also useful to have a Months table defined for drop down lists and reporting purposes (i.e., to use in OUTER or CROSS JOINs). Months also might be useful for other relations for other entities in your data, such as accounting dates, billing periods, and so on. Thus, in this case, I feel it makes sense to formally define Month entities to keep your data clean and consistent.
Data modeling is an art, not a science, and there are no hard and fast rules that always apply. (Well, except for normalization
) In general, however, I find the "attribute or entity?" check to be a helpful guideline to follow when developing your data models.