Let's say you have a database that contains Companies, Products and Stores. Products and Stores are unique to each company -- i.e., they are not "shared" across companies.
create table Companies
(
CompanyID int identity primary key,
Name varchar(100)
)
create table Products
(
ProductID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100)
)
create table Stores
(
StoreID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100)
)
OK, looks very simple and standard, right? Now, suppose we wish to create a table that lets us relate products to the stores that sell them:
create table StoreProducts
(
StoreID int references Stores(StoreID),
ProductID int references Products(ProductID),
primary key (StoreID, ProductID)
)
All is well and good in the world -- except that we now have a very flawed data model that does not constrain at all against entering potentially
meaningless data!
The problem?
We can easily relate Stores for company X with products for Company Y. None of our constraints disallow this from happening. Yet, it is a simple data model and this is a simple situation and it was very easy to overlook. If we have any data of that sort in our database, it is effectively garbage and we should not be allowing it.
So, the question is: what is the most effective way to handle this? How we can ensure that only Stores and Products for the same company get related?
This seems to be a case where composite primary keys might be required. Using composite primary keys for the Products and Stores table that includes the CompanyID column as well (making those tables more like "child" tables of Companies), we can obtain our goal like this:
create table Products
(
CompanyID int references Companies(CompanyID) not null,
ProductID int identity,
Name varchar(100),
primary key (CompanyID, ProductID)
)
create table Stores
(
CompanyID int references Companies(CompanyID) not null,
StoreID int identity,
Name varchar(100),
primary key (CompanyID, StoreID)
)
create table StoreProducts
(
CompanyID int,
StoreID int,
ProductID int,
foreign key (CompanyID, StoreID) references Stores (CompanyID, StoreID),
foreign key (CompanyID, ProductID) references Products (CompanyID, ProductID),
primary key (CompanyID, StoreID, ProductID)
)
In the above model, we have solved the problem -- only stores and products for the same company can be related. Yet, this doesn't seem very intuitive and doesn't seem ideal.
We could also leave the PK of Stores and Products as the single identity columns and add additional unique constraints to these tables to use in the foreign key constraints, but would the design still be normalized? The CompanyID column in our StoreProducts table would be a redundant attribute that is already stored in other related tables. Storing redundant data as part of a primary key certainly doesn't violate the principle of normalization since that is how we relate two tables, but storing redundant non-PK attributes certainly seems to. Then again, the unique constraint would simply be an alternate primary key so I suppose that would be fine.
The "unique constraint" version of the model looks like this:
create table Products
(
ProductID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100),
constraint Products_altpk unique (CompanyID, ProductID)
)
create table Stores
(
StoreID int identity primary key,
CompanyID int references Companies(CompanyID) not null,
Name varchar(100),
constraint stores_altpk unique (CompanyID, StoreID)
)
create table StoreProducts
(
CompanyID int,
StoreID int,
ProductID int,
foreign key (StoreID, CompanyID) references Stores (StoreID, CompanyID),
foreign key (ProductID, CompanyID) references Products (ProductID, CompanyID),
primary key (StoreID, ProductID)
)
So, be aware of situations like this when designing your data models -- things are not always as simple as they seem. The two techniques shown above will help you to avoid making the modeling error described here and will keep your data accurate and consistent, without the need for triggers or relying on application code to validate data. It seems funny to take a perfectly unique identity and make that
part of a primary key or a unique constraint, since it is unique on its own, but as you can see here it is often a very useful technique.
When choosing a unique constraint or composite primary key, I think the main question is this: Do all (or most) of the the tables related to that entity need both columns for other relations and/or constraints? If so, then use a composite primary key, since you will always be relating on both of those columns anyway. However, if it is rare that you would need to relate back to those entities on both columns, then I would go with a unique constraint.
Does anyone have other (better) ideas out there for handling situations like this? This is a great example of how data modeling is truly an art and not a science.
see also: