SQL Data Modeling: Entities versus Attributes
Introduction
<o:p>There’s a handy little rule of thumb I use when developing my database models. It comes up often in situations like this:
It gets confusing because every database is different, depending on what it is modeling. When trying to determine how to handle the "area code situation", I have found that asking the following question is useful:
Area Codes
Scenario #1: In an accounting database, we are keeping track of Vendors. For each vendor, we store contact information such as their mailing address and phone number.
In the above scenario, our accounting database doesn’t need to formally define a concept of “Area Codes”. There’s no reporting of area codes, no tables storing distances or town names or billing rates per area code. It is simply an attribute of a phone number, nothing more. Thus, an area code is simply an attribute, not an entity, and it doesn’t make any sense to have an Area Codes table.
Cities and States
Cities, however, are much like area codes – it depends what you are modeling and if the concept of a “City” entity is important to your data. Often, it is just a simple attribute. But if you are modeling city populations and mayors and locations and distances and what state they belong in, then a City is a formal entity that you are tracking and relating, and it belongs in a formal City table in your database.
Dates
create table Budget (ProjectID int, Month datetime, Amount currency)
create table Months (Month datetime primary key, MonthNumber int, Year int, ….)
Data modeling is an art, not a science, and there are no hard and fast rules that always apply. (Well, except for normalization) In general, however, I find the "attribute or entity?" check to be a helpful guideline to follow when developing your data models.
see also:
- The problem isn't the poor database model; It's that external code is bound to the schema
- Distinguishing data from code
- Data Types -- The Easiest Part of Database Design
- SQL Data Modeling: Entities versus Attributes
- Don't Let Output Dictate your Database Design
- Custom Auto-Generated Sequences in SQL Server
- Data belongs in your tables -- not in your code
- Delete Duplicates And Resolve Foreign Key References in SQL
Legacy Comments
Travis Laborde
2007-06-20 |
re: SQL Data Modeling: Entities versus Attributes Interesting post, thanks! But what about the more basic yet still arguable question "does the area code deserve it's own column or should the whole phone number be in one column, including the area code?" I'd say the same guidelines apply - if you depend on the Area Code for anything then make it a column by itself. And, if you store the whole phone number together, should it have the () and the - stuff, or be just 10 digits? Obviously this only deals with US phone numbers. Still, a discussion of these topics would make the post even better :) |
Andrew Beaupre
2007-06-21 |
re: SQL Data Modeling: Entities versus Attributes In regards to phone numbers, I feel it's best to strip out all formatting whatsoever. Data consistency is important. One can then have a separate column which holds the formatting in the way which the user inputted the data. This gives you the ability to format it in a standard way if you like, or rely on the user for the correct formatting (which varies from country to country). Area codes are a North American thing. While Europe does have common prefixes (02 - UK landlines vs 07 - UK mobiles), these aren't regarded in the same way as North American area codes. If it's necessary to maintain international compatibility, a dedicated area code field wouldn't seem to apply. You could always extrapolate the area code from the phone number, provided the data was in a consistent format. But this is all part of the article's topic - design your schema to fit the circumstances. All too often I see people go normalization crazy which, in the end, harms performance and over complicates things. |
Daniel
2010-01-15 |
A similar thinking I often think of it this way: "Will it depend on other data, or will other data depend on it?" It's the same thing as you said, but entity and attribute are kind of abstract notations, and this question already contains the explanation of those 2 terms. But that's just me... :) If it depends on other data -> attribute, else it's an entity. Excellent blog btw. |