Interesting Database Modeling Dilemma
Let's say you have a database that contains Companies, Products and Stores. Products and Stores are unique to each company – i.e., they are not "shared" across companies.
create table Companies
(
CompanyID int identity primary key,
Name varchar(100)
)
create table Products
(
ProductID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100)
)
create table Stores
(
StoreID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100)
)
(
CompanyID int identity primary key,
Name varchar(100)
)
create table Products
(
ProductID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100)
)
create table Stores
(
StoreID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100)
)
OK, looks very simple and standard, right? Now, suppose we wish to create a table that lets us relate products to the stores that sell them:
create table StoreProducts
(
StoreID int references Stores(StoreID),
ProductID int references Products(ProductID),
primary key (StoreID, ProductID)
)
(
StoreID int references Stores(StoreID),
ProductID int references Products(ProductID),
primary key (StoreID, ProductID)
)
All is well and good in the world -- except that we now have a very flawed data model that does not constrain at all against entering potentially meaningless data!
The problem? We can easily relate Stores for company X with products for Company Y. None of our constraints disallow this from happening. Yet, it is a simple data model and this is a simple situation and it was very easy to overlook. If we have any data of that sort in our database, it is effectively garbage and we should not be allowing it.
So, the question is: what is the most effective way to handle this? How we can ensure that only Stores and Products for the same company get related?
This seems to be a case where composite primary keys might be required. Using composite primary keys for the Products and Stores table that includes the CompanyID column as well (making those tables more like "child" tables of Companies), we can obtain our goal like this:
create table Products
(
CompanyID int references Companies(CompanyID) not null,
ProductID int identity,
Name varchar(100),
primary key (CompanyID, ProductID)
)
create table Stores
(
CompanyID int references Companies(CompanyID) not null,
StoreID int identity,
Name varchar(100),
primary key (CompanyID, StoreID)
)
create table StoreProducts
(
CompanyID int,
StoreID int,
ProductID int,
foreign key (CompanyID, StoreID) references Stores (CompanyID, StoreID),
foreign key (CompanyID, ProductID) references Products (CompanyID, ProductID),
primary key (CompanyID, StoreID, ProductID)
)
(
CompanyID int references Companies(CompanyID) not null,
ProductID int identity,
Name varchar(100),
primary key (CompanyID, ProductID)
)
create table Stores
(
CompanyID int references Companies(CompanyID) not null,
StoreID int identity,
Name varchar(100),
primary key (CompanyID, StoreID)
)
create table StoreProducts
(
CompanyID int,
StoreID int,
ProductID int,
foreign key (CompanyID, StoreID) references Stores (CompanyID, StoreID),
foreign key (CompanyID, ProductID) references Products (CompanyID, ProductID),
primary key (CompanyID, StoreID, ProductID)
)
In the above model, we have solved the problem -- only stores and products for the same company can be related. Yet, this doesn't seem very intuitive and doesn't seem ideal.
We could also leave the PK of Stores and Products as the single identity columns and add additional unique constraints to these tables to use in the foreign key constraints, but would the design still be normalized? The CompanyID column in our StoreProducts table would be a redundant attribute that is already stored in other related tables. Storing redundant data as part of a primary key certainly doesn't violate the principle of normalization since that is how we relate two tables, but storing redundant non-PK attributes certainly seems to. Then again, the unique constraint would simply be an alternate primary key so I suppose that would be fine.
The "unique constraint" version of the model looks like this:
create table Products
(
ProductID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100),
constraint Products_altpk unique (CompanyID, ProductID)
)
create table Stores
(
StoreID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100),
constraint stores_altpk unique (CompanyID, StoreID)
)
create table StoreProducts
(
CompanyID int,
StoreID int,
ProductID int,
foreign key (StoreID, CompanyID) references Stores (StoreID, CompanyID),
foreign key (ProductID, CompanyID) references Products (ProductID, CompanyID),
primary key (StoreID, ProductID)
)
(
ProductID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100),
constraint Products_altpk unique (CompanyID, ProductID)
)
create table Stores
(
StoreID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100),
constraint stores_altpk unique (CompanyID, StoreID)
)
create table StoreProducts
(
CompanyID int,
StoreID int,
ProductID int,
foreign key (StoreID, CompanyID) references Stores (StoreID, CompanyID),
foreign key (ProductID, CompanyID) references Products (ProductID, CompanyID),
primary key (StoreID, ProductID)
)
So, be aware of situations like this when designing your data models -- things are not always as simple as they seem. The two techniques shown above will help you to avoid making the modeling error described here and will keep your data accurate and consistent, without the need for triggers or relying on application code to validate data. It seems funny to take a perfectly unique identity and make that part of a primary key or a unique constraint, since it is unique on its own, but as you can see here it is often a very useful technique.
When choosing a unique constraint or composite primary key, I think the main question is this: Do all (or most) of the the tables related to that entity need both columns for other relations and/or constraints? If so, then use a composite primary key, since you will always be relating on both of those columns anyway. However, if it is rare that you would need to relate back to those entities on both columns, then I would go with a unique constraint.
Does anyone have other (better) ideas out there for handling situations like this? This is a great example of how data modeling is truly an art and not a science.
see also:
- Composite Primary Keys
- Data Types -- The Easiest Part of Database Design
- SQL Data Modeling: Entities versus Attributes
- Don't Let Output Dictate your Database Design
- Data belongs in your tables -- not in your code
- The problem isn't the poor database model; It's that external code is bound to the schema
- Dear DBA ....
Legacy Comments
byrmol
2008-01-07 |
re: Interesting, Yet Simple, Modeling Dilemma? Hi Jeff, How about using the original schema and a view constraint on the StoreProducts table... create view StoreProducts_CONSTRAINT as Select StoreID, ProductID from dbo.StoreProducts SP where exists ( Select 1 from dbo.Products P inner join dbo.Stores S on S.CompanyID = P.CompanyID where P.ProductID = SP.ProductID and S.StoreID = SP.StoreID ) with CHECK OPTION go |
Jeff Smith
2008-01-08 |
re: Interesting, Yet Simple, Modeling Dilemma? Hi David -- That would work, but only if we add/remove rows via the View, correct? I think we would be lacking RI when updating the Stores and Products tables; for example, we would not be able to cascade deleted rows and it would not stop us from changing companies of a store/product that has already been related in the StoreProducts table. |
bignose
2008-01-08 |
Error in code Both the example solutions currently (2008-01-09) have the following error: 'foreign key (ProductID, CompanyID) references Stores (ProductID, CompanyID)' This should reference the Products table instead. |
Jeff Smith
2008-01-08 |
re: Interesting Modeling Dilemma Thank you! Fixed. |
bignose
2008-01-09 |
Key and Primary Key What you've identified in the second solution is just another key. It's not an "alternate primary key". The primary key for a table is just the key that has been chosen from among all the possible keys for that table to be "the" key. So, all the other possible keys aren't "alternate primary keys", they're just "keys". It would be less confusing if you spoke of the (CompanyID, ProductID) key as "an alternate key" instead (and likewise for the Products table). |
David
2008-01-11 |
re: Interesting Database Modeling Dilemma Not sure the StoreProducts table is necessary. If you have defined Products(Company) table and the Stores(Company) table then you have already implicitly defined the StoreProducts data, which should be a view that combines Products and Stores tables. If you want to disable certain products in certain stores though, you will need another table along the lines of the last example. |
Niall Litchfield
2008-01-11 |
re: Interesting Database Modeling Dilemma Hi You'll see from my website that I'm really an Oracle guy, but I think there are a couple of potential approaches here. The first is to ask what you are modelling here and if the use of the surrogate primary key is getting you into trouble. I'd argue that there could be a perfectly natural primary key for your products table and that is (company name,product name) - i.e. The unique and mandatory identifier for a product is both the name of the product and the name of the supplier. If you go down this route you'd "naturally", sorry, be using the full key in the storeproducts table. The other approach open to me in Oracle - and surely in MSSQL as well - would be to add a trigger to the store products table that checks that the incoming data meets the constraint. So in summary I think that I am saying that I think that surrogate keys are helping cloud the issue and that potentially the constraint could be implemented in a trigger/sproc. Incidentally, but maybe not of immediate use to many of your readers, Toon Koppelars gave a presentation on constraint design to the UK Oracle Users Group recently http://conference.ukoug.org/default.asp?p=685&dlgact=shwprs&prs_prsid=1424&day_dayid=9 but you'll need to be a member to download it, in which he argues that in some cases integrity constraints have to be implemented in triggers since none of the available RDBMS on the market actually implement all of the relational model (and so can't do all appropriate constraint checking because of the implementation of the RDBMS). |
Jeff
2008-01-11 |
re: Interesting Database Modeling Dilemma David -- it is necessary, as explained in the post. Not all products are sold at all stores. |
Jeff
2008-01-12 |
re: Interesting Database Modeling Dilemma Niall -- >>The first is to ask what you are modelling here and if the use of the surrogate primary key is getting you into trouble. I'd argue that there could be a perfectly natural primary key for your products table and that is (company name,product name) - i.e. The unique and mandatory identifier for a product is both the name of the product and the name of the supplier. If you go down this route you'd "naturally", sorry, be using the full key in the storeproducts table. Great point! In fact, that is what my next post is going to be about. This post is a bit of an implied argument against exclusively using surrogate keys (as many people claim is the only way to go) but an upcoming post of mine will take this example a little further and compare the differences between modeling everything with identities versus modeling with natural keys (i.e., a composite key of CompanyCode/ProductCode.) The problem with identities really becomes apparent when trying to implement fk constraints such as these, when things are dependent on more than one attribute. I prefer not to use triggers or even CHECK constraints when RI will do instead; triggers and check constraints are only run on the table being modified, not the others, and you do not get the benefit of cascading updates and deletes. Thanks for your feedback! Even though you *are* an Oracle user! Yuck! :) |
Troy Ketsdever
2008-01-14 |
re: Interesting Database Modeling Dilemma Jeff; Good job highlighting this design problem. I have a series of articles on db design in which I tackle a similar problem (implementation using T-SQL). Article 3 has a solution similar to what you've posted: http://www.sqlservercentral.com/articles/Data+Modeling/61528/ Article 4 shows another way of looking at the problem (SQL Server 2005 implementation): http://www.sqlservercentral.com/articles/Data+Modeling/61529/ TroyK |
Hugo Kornelis
2008-01-26 |
re: Interesting Database Modeling Dilemma Jeff, I fail to see why this is a modeling dilemma. I only see that your first design is bad, your second is slightly better, and the third is bad again. Your first design of the Products table (and other tables as well) doesn't even have a real key, only the surrogate key made by an IDENTITY column. There is no protection at all against a data typist introducing a duplicate by accidentally hitting the <Enter> button twice. I'm sure that the people in the company already have some way of identifying products. *THAT* is the real primary key, together with CompanyID (since another company might have used the same key value for a different product). Similar in the Stores table. Which makes the design for the linking table obvious. I'm always amazed at how many apparent design "problems" are just a result of mindlessly slapping an IDENTITY primary key column onto each and every table. Just make a good logical design first, with no surrogate key at all, and most of your problems will be solved. Deciding which tables would benefit from ADDING a surrogate key (and replacing foreign keys pointing to the table with that surrogate key) should be one of the last steps in the design cycle. Unfortunately, many people nowadays start with the surrogate keys, often even as the only keys, and spend the rest of the design (and much of the coding) cycle fighting the side effects of a not-so-good-thought-out decision. |
Jeff
2008-01-26 |
re: Interesting Database Modeling Dilemma Hi Hugo -- Thanks for the feedback. Your post sounds a little like you are addressing me a bit as if I am a pro-"all identity pk all the time" kind of guy, which I absolutely am not. In fact, this post subtly describes why that is not always a good idea. This is a good, simple example. My goal , when I have more time, is to expand on this. I assume you missed my response to Niall a few comments back? One thing -- assigning entities identity PK's in not necessarily a bad thing, and in fact it can often be very, very useful. There is not always a quick and easy key derived from actual data that works well in the physical world. However, it is when we have entities that are child entities of others, or that have dependencies on others, or that relate others, that it is clear that a natural composite key involving other (potentially surrogate) keys is a much better design solution. I discuss this quite a bit here: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx It's funny to read the comments from that post -- half of them are saying "this is so basic and obvious, this must be written for newbies!" and the other half are saying "you are wrong, everyone knows you always use identities on all tables!". Very interesting contrast indeed! In short, stay tuned. Much more on this topic to come. Thanks again for your input. |
Hugo Kornelis
2008-01-26 |
re: Interesting Database Modeling Dilemma Hi Jeff, You read my post right - and I apparently misunderstood your blog post. I'm afraid that your subtle anti-identity approach turned out a bit too subtle here! And indeed, I did miss your response to Niall. After reading your article on composite primary keys (thanks for the pointer - I didn't have time to go through all the comments at the moment, hopefully later), I am now convinced that you and I will probably agree at least 90% when it comes to discussing what keys to use in a given situation. I always enjoy reading your blog, and now that I know that you plan to cover the identity / natural key debate some more in the future, I can hardly wait to see what you have planned. Best, Hugo |