Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server
posts - 157, comments - 2686, trackbacks - 64

My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Subscribe





Archives

Post Categories

Programming

Sports

SQL

Composite Primary Keys

Ah … primary keys … such a topic!  When discussing what columns to define as a primary key in your data models, two large points always tend to surface:
  1. Surrogate Keys versus Natural Keys
  2. Normalization
These can be very complicated and sometimes polarizing things to debate.   As I often try to do, I will attempt to approach this topic from a slightly different perspective.

Let's start things off with what I feel is a good interview question:

How would you define what a primary key of a table is?

a.    An auto-generated numeric or GUID column in the table that uniquely identifies each row
b.    A non-nullable column in the table that uniquely identifies each row
c.    None of the above

I suspect that many people will answer (a), and quite a few will answer (b).  If you answer (c), though, you are correct!  Why?  Because a primary key is not a single column, it is a set of columns.  Many people who have designed large, complicated systems are simply not aware of this.

I once worked with a consultant who kept claiming that importing data into his system was complicated, because his database used “primary keys”.  It was very confusing (yet humorous) trying to discuss things with him because he kept confusing primary keys with identity columns.  They are not the same!  An identity column may be a type of primary key, but a primary key is not an identity column; it is a set of columns that you define that determine what makes the data in your table unique.  It defines your data. It may be an identity column, it may be a varchar column or a datetime column or an integer column, or it may be a combination of multiple columns.

When you define more than one column as your primary key on a table, it is called a composite primary key.  And many experienced and otherwise talented database programmers have never used them and may not even be aware of them. Yet, composite primary keys are very important when designing a good, solid data model with integrity.

This will be greatly oversimplifying things, but for this discussion let's categorize the tables in a database into these two types:
  • Tables that define entities
  • Tables that relate entities
Tables that define entities are tables that define customers, or sales people, or even sales transactions.  The primary key of these tables is not what I am here to discuss.   You can use GUID columns, identity columns, long descriptive text columns, or whatever it is you feel comfortable to use as primary keys on tables that define entities.  It’s all fine by me, whatever floats your boat as they say.  There are lots of discussions and ideas about the best way to determine what the best primary key of these tables should be, and pros and cons of all of the various approaches, but overall, that is not really what I am addressing.  

Tables that relate entities, however, are a different story.

Suppose we have a system that tracks customers, and allows you to assign multiple products to multiple customers to indicate what they are eligible to order.  This is called a many-to-many or N:N relation between Customers and Products.  We already have a table of Products, and a table of Customers.  The primary key of the Products table is ProductID, and the Customers table is CustomerID.  Whether or not these “ID” columns are natural or surrogate, identity or GUID, numerical or text or codes, is irrelevant at this point.

What is relevant and important, and what I am here to discuss, is how we define our CustomerProducts table.  This table relates customers to products, so the purpose of the table is to relate two entities that have already been defined in our database.  Let’s also add a simple “OrderLimit” column which indicates how many of that product they are allowed to order.  (This is just a simple example, any attribute will do). How should we define this table?

For some reason, a very common answer is that we simply create a table with 4 columns: One that stores the CustomerID, one that stores the ProductID we are relating it to, the Order Limit, and of course the primary key column which is an identity:

Create table CustomerProducts
(
    Customer_ProductID int identity primary key,
    CustomerID int references Customers(CustomerID) not null,
    ProductID int references Products(ProductID) not null,
    OrderLimit int not null
)

This is what I see in perhaps most of the databases that I’ve worked with over the years.  The reason for designing a table in this manner?  Honestly, I don’t know! I can only surmise that it is because of the lack of understanding what a primary key of a table really is, and that it can be something other than an identity and that it can be comprised of more than just a single column.  As I mentioned, it seems that many database architects are simply not aware of this fact.

So then, what is the problem here?  The primary issue is data integrity.  This table allows me to enter the following data:

CustomerProductID    CustomerID    ProductID    OrderLimit
1                    1             100          25
2                    1             100          30

In the above data, what is the order limit for customerID #1, productID #100?  Is it 25 or 30?  There is no way to conclusively know for sure.  Nothing in the database constrains this table so that we only have exactly one row per CustomerID/ProductID combination. Remember, our primary key is just an identity, which does not constrain anything.  

Most database designs like this just assume (hope?) that the data will be always be OK and there will be no duplicates.  The UI will handle this, of course!  But even if you think that only one single form on one single application ever updates this table, you have to remember that data will always get in and out of your system in different ways.  What happens if you upgrade your system and have to move the data over?  What if you need certain transactions restored from a back up?  What if you ever need to do a batch import to save valuable data entry time?  Or to convert data from a new system that you are absorbing or integrating?

If you ever write a report or an application off of a system and simply assume that the data will be constrained a certain way, but the database itself does not guarantee that, you are either a) greatly over-engineering what should be a simple SQL statement to deal with the possibility of bad data or b) ignoring the possibility of bad data completely and setting yourself up for issues down the road.  It's possible to constrain data properly, it's efficient, it's easy to do, and it simply must be done or you should not really be working with a database in the first place -- you are forgoing a very important advantage it provides.

So, to handle that issue with this table design, we need create a unique constraint on our CustomerID/ProductID columns:

create unique index cust_products_unique on CustomerProducts (CustomerID, ProductID)

Now, we are guaranteed that there will only be exactly one row per combination of CustomerID and ProductID.  That handles that problem, our data now has integrity, so we seem to be all set, right?

Well, let’s remember the definition of what a primary key really is.  It is the set of columns in a table that uniquely identify each row of data.  Also, for a table to be normalized, all non-primary key columns in a table should be fully dependent on the primary key of that table.

Consider instead the following design:

Create table CustomerProducts
(
    CustomerID int references Customers(CustomerID) not null,
    ProductID int references Products(ProductID) not null,
    OrderLimit int not null,
    Primary key (CustomerID, ProductID)
)

Notice here that we have eliminated the identity column, and have instead defined a composite (multi-column) primary key as the combination of the CustomerID and ProductID columns.  Therefore, we do not have to create an additional unique constraint.  We also do not need an additional identity column that really serves no purpose.  We have not only simplified our data model physically, but we’ve also made it more logically sound and the primary key of this table accurately explains what it is this table is modeling – the relationship of a CustomerID to a ProductID.

Going back to normalization, we also know that our OrderLimit column should be dependent on our primary key columns.  Logically, our OrderLimit is determined based on the combination of a CustomerID and a ProductID, so physically this table design makes sense and is fully normalized.  If our primary key is just a meaningless auto-generated identity column, it doesn’t make logical sense since our OrderLimit is not dependent on that.

Some people argue that having more than one column in a primary key “complicates things” or “makes things less efficient” rather than always using identity columns.  This is simply not the case.   We’ve already established that you must add additional unique constraints to your data to have integrity, so instead of just:
  1. A single indexed composite primary key that uniquely constrains our data
we instead need:
  1. An additional identity column
  2. A primary key index on that identity column
  3. An additional unique constraint on the columns that logically define the data
So we are actually adding complexity and overhead to our design, not simplifying!  And we are requiring more memory and resources to store and manipulate data in our table.

In addition, let's remember that a data model can be a complicated thing.  We have all kinds of tables that have primary keys defined that let us identify what they are modeling, and we have relations and constraints and data types and the rest.  Ideally, you should be able to look at a table's primary key and understand what it is all about, and how it relates to other tables, and not need to basically ignore the primary key of a table and instead investigate unique constraints on that table to really determine what is going on!  It simply makes no sense and adds unnecessary confusion and complication to your schema that is so easily avoided.

Some people will claim that being able to quickly label and identify the relation of a Product to a Customer with a single integer value makes things easier, but again we are over-complicating things.  If we only know we are editing CustomerProductID #452 in our user interface, what does that tell us?  Nothing!  We need to select from the CustomerProducts table every time just to get the CustomerID and the ProductID that we are dealing with in order to display labels or descriptions or to get any related data from those tables.  If, instead, we know that we are editing CustomerID #1 and productID #6 because we are using a true, natural primary key of our table, we don’t need to select from that table at all to get those two very important attributes.

There are lots of complexities and many ways to model things, and there are many complicated situations that I did not discuss here.  I am really only scratching the surface.  But my overall point is to at least be aware of composite primary keys, and the fact that a primary key is not always a single auto-generated column.   There are pros and cons to many different approaches, from both a logical design and physical performance perspective, but please consider carefully the idea of making your primary keys count for something and don’t automatically assume that just tacking on identity columns to all of your tables will give you the best possible database design.

And, remember -- when it comes to defining your entities, I understand that using an identity or GUID or whatever you like instead of real-world data has advantages.  It is when we relate entities that we should consider using those existing primary key columns from our entity tables (however you had defined them) to construct an intelligent and logical and accurate primary key for our entity relation table to avoid the need to create extra, additional identity columns and unique constraints.

see also:

Print | posted on Thursday, August 23, 2007 10:07 AM | Filed Under [ Techniques Database Design ]

Feedback

Gravatar

# re: Composite Primary Keys

Guess you deal with lots of noobees. This is even below basic understanding of relational design.
8/23/2007 3:14 PM | Baba O'Riley
Gravatar

# re: Composite Primary Keys

Here's a non-trivial counter argument. I tend to agree with your perspective more, but I'd be very interested to read a specific response to this:

http://rapidapplicationdevelopment.blogspot.com/2007/08/in-case-youre-new-to-series-ive.html
8/23/2007 3:45 PM | Seth
Gravatar

# re: Composite Primary Keys

Seth -- Actually, that article discusses defining the primary key for entities, which isn't what I am discussing in my article.

Primary keys on entity tables can be tricky, and it really depends on many, many factors. I recognize that identity columns make great keys, and can be useful, but they should not be used exclusively.

Another good examples of not using an identity column is a "history" table that tracks changes in attribute(s) for an entity over time. Again, to have good data integrity, you need to ensure that only one row per entityID/datetime exists in the table, so I would argue just use a composite PK of those two columns, not an additional, meaningless identity column.

Overall, it is a complex topic, which is why I tried my best to keep my article focused on one particular situation.
8/23/2007 4:15 PM | Jeff
Gravatar

# re: Composite Primary Keys

Baba -- unfortunately, there are many, many experienced database professionals who exclusively add identity/autonumber primary keys to *all* of their tables, without question or thought. I've seen it over and over. You are very lucky to be able to avoid these designs!
8/23/2007 4:19 PM | Jeff
Gravatar

# re: Composite Primary Keys

I agree with the practice of using composite primary keys and have normally followed this approach. However I have recently been working with the MySQL database using InnoDB type tables. This table type automatically clusters the primary index which would mean a large overhead when new product is added to a customer in your CustomerProducts example. To work with this constraint I have would define the primary index as CustomerProductID and add an alternate key (unique index) on CustomerID and ProductID.
8/23/2007 6:40 PM | Paul Turner
Gravatar

# re: Composite Primary Keys

I tend to argue that all tables should have auto-generated PKs. In your example, it seems like there is no harm by using the customer/product as the primary key. But what if it changes? Say you later decide that you need additional criteria. Maybe the order limit has a chronological component, or a change is made to have different versions of a product underneath a single product id. Now your PK has to change, which is a royal pain in the ass and should never happen. With an auto generated key, you don't have this problem.

8/23/2007 8:39 PM | Mike H
Gravatar

# re: Composite Primary Keys

If you're coding in any sort of language above the sql layer, then having the same kind of ID (autoincremented integer) with the same name (ID) on all tables makes it possible and easy to abstract a lot of database functionality and reduces code complexity immensely. This, IMHO, is far more valuable than "being prepared for" the eventualities you mention in the article.
8/23/2007 11:04 PM | Truy
Gravatar

# re: Composite Primary Keys

I've had this sketch before with colleagues, most of us go for the composite approach.

A composite key is often used for joining and searching, one index can often cover both cases for most queries
A single column PK with an FK on another column will require 2 indexes for searching with either a covering index (wider = bigger) or a bookmark lookup (= expensive)
You extend this to a 3 or 4 level hierarchy and it becomes messy.


8/24/2007 1:48 AM | gbn
Gravatar

# re: Composite Primary Keys

You're argument doesn't work because you're ignoring that programs consist of much more than just schema. You approach works great if you enjoy hand writing all your sql, it's lousy when you want a middleware layer to automate CRUD. People who say the single incrementing key is simpler aren't referring to simpler schemas, they're referring to the vastly simpler programs that result from fully automating the mapping of objects into tables.

It's much simpler to enforce uniqueness with a unique constraint and have the extra key field to allow "simple" automated mapping of objects into the db. Sql is great for random queries, it sucks balls for insert/update/delete of entities that can be fully automated with little more than allowing incrementing keys in your tables.

Relationship tables also tend to grow attributes of their own and suddenly become entity tables in their own right, at which point it becomes awkward if you have a multi-column key, better to just start with a key that works for all scenarios.

Also, I think most people are fully aware of composite keys, they just consider them evil because of the complexities they introduce into joins, foreign key references, and bloated program code or mappings from what essentially become special case keys for each purpose. Automation works much better when patterns are regular and predictable. Customer table -> customerId, CustomerProduct table -> customerProductId (plus unique constaint), vastly simpler to automate.

Ruby on Rails is the prime example of how much simpler programming becomes by following simple rules like this that violate relational theory but totally make up for it by optimizing the most expensive process in all of development, the programmers time.
8/24/2007 2:22 AM | Ramon Leon
Gravatar

# re: Composite Primary Keys

Ramon definitely has a point here. I'm not an expert, but I know enough that you would lose so much of the automation which is making Ruby on Rails so wonderful (object-relation-mapping) if you won't stick to the rule of creating a auto generated numerical id column for every table in the database. I don't think that the advantages Jeff mentions in the article could ever compensate that.
When looking from the perspective of the pure relational database theory, Jeff's probably right, but not from a pragmatic programmers point of view.
8/24/2007 6:07 AM | schallstrom
Gravatar

# re: Composite Primary Keys

Hello...

I have to agree with most of the former posters....

I know that composite keys exist, but they are "evil"... Yes i know the example you gave shows one of the few cases where such a key -would- be valid, but the rule "Relations = Composite // Entities = Identity" is a very dangerous one. When there will be a change to the relation, you will be in a world of hurt. Building a key like this forces you to modify a lot of existing code when you are going to add a field to the relation. So you would violate the open/close priciple.

The benefit you gain from using only a composite key for this table is something that is not worth the trouble it will cause. And if we run into one of the -rare- situations, where this (composite) setup is worth it, then you have STILL the option to create the composite setup. But creating a composite key table, just because it would be faster is something i would consider a "premature optimization".

If the only place you can tweak you DB is changing your DB so it uses a composite key over a composite index, then i clearly envy you. ;)



8/24/2007 8:21 AM | Heiko Hatzfeld
Gravatar

# re: Composite Primary Keys

Dear Ramon, schallstrom,
You've just highlighted the problem with RoR and similar things such as Hibernate. They create god-awful databases. And then, should your "site" have more than 5 users the database doesn't scale well. The database - in the experiences I've had with RoR developers - don't scale.

RoR may one day be a good language and may overcome its many short comings, but it causes way too many problems.

But back to the point, you guys shouldn't be allowed near computers let alone databases :( Go take some basic courses on computers, their architecture and how things work.

8/24/2007 9:07 AM | Muppets In IT O.U.T
Gravatar

# re: Composite Primary Keys

It's funny that some people are reading this post as advice on doing this as a "performance tweak".

Performance is not a factor in this. The goal is a good, solid, efficient schema that correctly physically models what you are logically representing. The way to do this is with composite keys when appropriate, since it correctly and logically and accurate reflects what you are modeling.

Now, if your logical model *changes*, then your physical schema should change also -- along with data access layers and applications and reports as well; that's a fact of life. To have your logical model change and *not* change any of these things results in an ugly mess where you are not correctly modeling and representing your data. End of story.

This is not the same as designing a flexible schema that anticipates and allows changes going forward in your data; if you plan ahead, and design your tables well, you can still accommodate major changes easily *and* have the benefit of a good design.

The argument of "always use surrogate keys no matter what in case your schema needs to change" simply does not hold water. If you want to throw together a loose, "approximate" schema that "sort of" models your data, then fine, your argument holds -- always use surrogate keys and just tweak your columns here and there to "sort of" approximate your logical model in your database. That's absolutely your right to design your systems that way. I will suggest, however, looking into designing an *accurate* and efficient schema for your data; the benefits a good data model are huge over a loose approximate design, in terms of maintainability, usability, performance, data integrity, clear documentation, and so on.

The way I see it, there's 3 ways to look at it:

option #1: design a "loose" schema, all surrogate, little RI or constraints, just plug new columns or tables to the database randomly over time with minimal changes

option #2: design a "tight" schema that models what you need, but without planing ahead for growth or changes

option #3: design a "tight" schema that models what you need and has room for growth or changes


option #2 is limited and can cause maintenance issues, even if the initial design is technically "perfect". If you didn't plan ahead and understand the requirements well enough to anticipate future growth, then that can be an issue. This is never the approach I recommend and certainly not what I am advocating here.

option #1 is the easiest in the short term, and many people choose it. it requires almost no planning, no thinking ahead, we keep everything loose and we don't worry too much about accurately modeling what we are storing. If the data model is never accurate in the first place, then we don't need to worry about it being accurate going forward, so we can just keep stuffing columns and tables in there and let it into a big, organic mess. In the long term, this approach can lead to nightmares.

option #3 is the hardest in the short term, the easiest in the long term. If you take time to plan and come up with a good, solid model that can grow and you anticipate future needs intelligently and plan for them, you might be surprised at how well and easily your system accommodates changes.
8/24/2007 9:20 AM | Jeff
Gravatar

# re: Composite Primary Keys

Ramon et al: if your middleware can't handle composite keys, then it is your middleware that is broken, not composite keys.
8/24/2007 9:22 AM | Steve
Gravatar

# re: Composite Primary Keys

Having just read the article I was about to comment on the fact that this was such a basic concept it seemed a waste of cyberspace but having read some of the comments by others, crikey...

"I think most people are fully aware of composite keys, they just consider them evil". Do they?? Who have you been working with. I can't think of any db professional I've worked with who would agree with you and not that many programmers either. While a single column key is often the correct solution the many to many link tables (as in the example above) are crying out for a composite key. It's a complete waste of time and space inserting a single column key for such things. Think about it. Nearly every query you'd be making on the above table would be along the lines of which customer bought this product or what products did this customer buy. So what is the purpose of it?

"Maybe the order limit has a chronological component, or a change is made to have different versions of a product underneath a single product id" Why? Why would this happen? Why would you have a product in a table and then completely change it. "So product id 10 is currently a guitar but I think I'm going to make it a rhinoceros now." I don't understand the logic in that.

As for making it difficult working with hibernates/ror et al, I can't speak for ror but I just worked on a very large project using nhibernate and their are no issues whatsoever with having composite keys in your tables. I'm not the biggest fan of these things but you don't have to compromise good database design to work with them.


8/24/2007 10:48 AM | big d
Gravatar

# re: Composite Primary Keys

there's only one reason i dislike composite keys:
you have to type more than one column in the joins.
8/24/2007 4:42 PM | Mladen
Gravatar

# re: Composite Primary Keys

Mlanden, you're worried about multiple joins? No offense, but you must be new. I usually have many joins regardless of whether composite keys are involved or not.
8/24/2007 9:51 PM | senfo
Gravatar

# re: Composite Primary Keys

If it works then it works.

When you are programming you make thousants of decisions.
This is a grade 1 textbook decision.

I like the idea of always having the same key. You can make your code much less complex with that.
And I agree with the case of the relating tables that an artificial key is overhead.

IMO if it works then it works. Whatever the solution.

There are MANY way more complex decisions that we have to make and that have more effect on complexity and performance if you ask me.

8/25/2007 1:38 AM | Juul
Gravatar

# re: Composite Primary Keys

@senfo:
no i'm not new :)
i just think that joining on more than one column between 2 tables is a pain in the behind.
one to one seems clearer and neater to me.
that doesn't mean i never use composite keys.
8/25/2007 1:40 PM | Mladen
Gravatar

# re: Composite Primary Keys

I could of sworn you or someone else posted this same article just months ago. I cant believe we still have discussions on stuff like this. I think what Mladen is saying is that typing out the inner join with just one field from one table to another is a cleaner method, he did not imply that it is the only method. He's definately not knew as he has been using 2k for quite some time as well as 2k5..prolly dipping his hands in the 2k8 beta.

Either way the article should be summarized to say that the logical correct way is to use the composite key, you don't need complexies by adding auto numbers. Saying too many joins is complex is not true, it basically comes down to being lazy. If you sit there and start adding autonumbers on all of your tables something should click saying why am I adding this level of complexity when my row is unique with these 2 fields FOR ALL CASES.

These are basic issues as stated by Juul...
8/25/2007 2:42 PM | Jon H.
Gravatar

# re: Composite Primary Keys

I am responding to the original article's question: "The reason for designing a table in this manner?" [using composite keys for a relationship type of table] I have a good reason.

Eight years ago I inherited a database that started out with several composite keys. The composite keys caused problems. So over time, I have worked to eliminate them. My programs are much better now. I will explain how.

I appreciated how this article made a distinction between thing/defining tables and relationship tables. I can see how using a composite key for relationship tables makes a lot of sense in a theoretical world. But I do not develop databases for the sake of developing databases. The sole purpose of my databases is to support applications which real people need to use. The whole purpose of my job is to create applications that make those users more efficient and happy. Bad, confusing error messages make people unhappy. This is the main problem with a composite primary key *especially* when dealing with the relationship tables described in this article. Without a single unique value row, I can't think of a way an application can check and warn about record duplicates BEFORE the user is allowed to save data. I will always have a unique index in place to cover the two foreign keys--in case there is a problem with my front-end. However, the error message a database develops is incomprehensible to the user should the user attempt to create a duplicate. I want to catch the problem up-front and help the user navigate the sometimes confusing waters of business rules.

Suppose I have a situation where only one of something is allowed. Suppose for example, each client can only have one address of each type. (one home address, one mailing address, etc). Or as in the example above, suppose each combination of product and customer can only be entered once. My databases would have a unique index to ensure duplicates are not allowed. However, this is not sufficient to creating an acceptable user-friendly application. My front-end applications also check before the user updates a record to make sure the user isn't violating that index and trying to create a duplicate. Sure, the database will catch this. But my code can generate a human-understandable error message, telling the user exactly what the problem is and how to fix it. You almost can't do this pre-check if you don't have some outside way of identifying the rows *other than* the primary keys of the related tables in questions.

(continued)
8/27/2007 4:26 PM | JJ Bienn
Gravatar

# re: Composite Primary Keys

(continued from previous comment)

Think about it. The user is updating a row. How do you know if the row the user is working on is the same row that is in the database or not? Without something like an identity column, you can't know which row the user is working on vs which row is in the database. I'll give an example in case you don't already get the point. Suppose the CustomerProducts table has this data:

CustomerID ProductID OrderLimit

1 100 25

1 200 35

Now, for whatever reason, the dear user is working on the second record/row (lets have one controversy at a time please) listed above and tries to change the ProductID from 200 to 100. Don't laugh, it happens all the time. Now, before saving the record, the front-end application wants to run a query and check to see if there is already a record with CustomerID=1 and ProductID=100. Well, yes, as a human we can see that there is such a row already. But when the query looks at at the table, how does it know if that first row is the row the user is currently editing? We need a way to identify the row.

If each row had a unique number associated with it, then the query could be designed to exclude the row the user is editing. Say the table looked like this:

CustomerProductID CustomerID ProductID OrderLimit

88 1 100 25

99 1 200 35

Now, if the user is working on record 99 and changes the ProductID to 100, the query can look at all rows except record 99 and see if there is already another row with CustomerID=1 and ProductID=100. (I'm sorry I couldn't think of a way to explain that more succinctly. I'm still not sure I made the point.)

So, when people say that there is no legitimate reason to have an identity column, that is not true if you value user-friendly applications. Admittedly, the identity column does not have to be the primary key. But making that identity column the primary key really doesn't hurt: a) it doesn't add that much overhead if you are going to have an identity column anyway, and b) makes the table much easier to work with and perform better if the primary key of the relationship table ends up serving as a foreign key in another table. I know that several people on this forum don't buy this last argument, but that has been my experience. And in the context of my other points, it is not needed in order for my over-all point to be made.
8/27/2007 4:27 PM | JJ Bienn
Gravatar

# re: Composite Primary Keys

JJ -- Thanks for your feedback ... it appears that you are raising two points:

1) it is confusing for users if we rely only on primary key constraints to indicate if the data added or changed is valid or not

and

2) without an identity, there is no way to know for sure "which row" we are editing

(correct me if I am wrong there in my summary)

To address your points:

#1 -- of course we still have a user interface layer and we still check and validate things and return messages and such just as you would expect. Why would you assume that using a composite primary key means that we cannot have a friendly, intelligent UI ?

#2 -- a primary key defines a row. We always know which row we are editing by the primary key. This includes when we use composite primary keys. We don't need a single value to identify a row for us, we can use more than one. Instead of adding an identity to know that we are editing CustomerProductID #99, we simply know that with a composite key we are editing relation CustomerID #200, ProductID #200. It's the same thing. And, in fact, that CustomerProductID #99 tells us absolutely *nothing* about what we are viewing or editing, but the composite PK of Customer ID#200/ProductID#200 tells us *exactly* what we are viewing or editing.

8/27/2007 4:34 PM | Jeff
Gravatar

# re: Composite Primary Keys

I think I also noticed a 3rd point:

>>Without a single unique value row, I can't think of a way an application can check and warn about record duplicates BEFORE the user is allowed to save data. I will always have a unique index in place to cover the two foreign keys--in case there is a problem with my front-end. However, the error message a database develops is incomprehensible to the user should the user attempt to create a duplicate. I want to catch the problem up-front and help the user navigate the sometimes confusing waters of business rules.

I don't understand what you are saying here. Whether you have a identity PK or a composite PK, you use the exact same SQL statement to pre-validate something:

if exists (select * from customerProducts where customerID=@CustID and ProductID=@ProductID) then ....

it's the same sql statement either way ... can you give an example of why it would be any different with or without a composite PK ??
8/27/2007 4:39 PM | Jeff
Gravatar

# re: Composite Primary Keys

Jeff,

I'm sorry I wasn't more clear. Let me try again.

You say that the code knows which row the user is working on because we know the primary key. My point is that the user can change *or not change* the primary key to match the same data/primary key existing in an entirely different row. And you have to know the difference. Suppose the user does change a primary key to match an existing key in the database. You want to do a check before the save. So, the following query example as you proposed makes sense:

if exists (select * from customerProducts where customerID=@CustID and ProductID=@ProductID)
then Do-Not-Allow-Because-Existing-Row-Has-Same-Primary-Key

In the above example, if the row exists, this is a bad thing. So, we don't allow the user to change the data. But what if user is changing other data, not the primary key? Suppose the user is changing the OrderLimit as in your example. In that case, the above Exists question is still true. There is an existing record in the database that has the Customer and Product IDs in question. But in this case, having an existing record for that combination of CustomerID and ProductID is a perfectly legitimate thing. The row the user is changing IS that same row identified by the Exists query. It is not a duplicate row. In this case, the above "THEN" code would be wrong. To handle both situations, the kind of query I write instead would be:

if exists (select * from customerProducts
where customerID=@CustID and ProductID=@ProductID AND
We-Are-Not-Looking-At-Original-Row)
then Do-Not-Allow-Because-Existing-Row-Has-Same-Primary-Key

The easiest way to do this is to have an identity field to work with:
if exists (select * from customerProducts
where customerID=@CustID and ProductID=@ProductID AND
CustomerProductsID<>@CustProdID-Of-Row-User-Is-Editing)
then Do-Not-Allow-Because-Existing-Row-Has-Same-Primary-Key

(continued)
8/27/2007 5:46 PM | JJ from Eugene
Gravatar

# re: Composite Primary Keys

(second part of previous post)

It is true that this could be done another way. You could do the following: a) keep track of whether the user is inserting a row for the first time or editing an existing row. b) if user is editing an existing row, then keep track of original key data compared to what the user changed. Then, you could write queries that exclude rows based on the original key value of the record rather than what the user changed the data to. Or perhaps even better, you could craft code to only check the database for duplicate errors IF the user is changing the primary key fields. My approach has been: do not to worry about which data the user changed. Just make very sure the user is not creating a duplicate key. By having an identity field that can not be changed by the user and which uniquely identifies each row, I can write a single query that works for all situations.

Then again, now that I've written this, I can see how limiting database check to be done only when strictly required would appeal to a lot of people. Maybe keeping track of which fields the user changes would be the better approach. I haven't given it much thought because my current strategy has worked so well for years. I will probably think about this the next time I have a similar situation.

Anyway, did that help explain my point? I am happy to believe I am missing something since I've never seen my issue posted anywhere when the subjet of composite keys is written about.

Thanks for replying to my earlier comment and trying to make sense of it.
- JJ
Staff advocate for processing bliss and time savings.
8/27/2007 5:47 PM | JJ from Eugene
Gravatar

# re: Composite Primary Keys

Ah! I see what you are saying now, got it.

I can answer to that concern in two ways:

To me, your user interface should not let you alter the customer/product relation once established, other than editing attributes of that relation. If you want to edit the OrderLimit, fine. But if you want to *change* the Customer/Product assignment, to me that is not something that you would allow -- you could remove that assignment, and then add another one, but not change. This is debatable, I understand that, but to me it is two completely separate operations. It is like creating a customer and assiging a customerID (PK) to that customer, versus editing a customer's name -- the same concept.

However, if it makes sense for a particular application to allow for changing not only attributes but also the relation itself, then you can simply use a DateStamp column or something like that to handle the situation you described. I would still recommend against allowing this via your UI, though, because it just doesn't seem to logically make sense to me.

Also -- keep in mind that while you can query ahead of time to try to predict if an update will succeed or not, you could instead just catch the error if it fails and report the result back to your application which can then display a friendly message via the UI. So, if it fails because a field is null or if it violates a PK constraint, you simply use the error number returned to display the appropriate message.
8/28/2007 9:52 AM | Jeff
Gravatar

# re: Composite Primary Keys

Jeff,

I'm not sure if this is getting off topic or not, but I thought I would respond to your comments.

I don't see how a DateStamp column would be superior to an identity column. Couldn't a date/time value technically be duplicated if two users happened to submit a record at the same time??? (Maybe not if the DateStamp column had an index? But then wouldn't you run the risk of an honest insert failing because two users submitted at the same time?) Unlikely, but possible? With an uniquely indexed identity column, I have a very simple integer column that is guaranteed never to duplicate. By simple, I mean that an identity column is both a simple data type and simple to work with. I don't see why a DateStamp or date/time column would be more attractive from a technical standpoint. Maybe just personal preference?

Your other idea is to let the error happen, capture the error and handle it as desired. I can see the attractiveness of this idea. Users are not going to make errors all the time. So, worrying about it up-front (querying the database as a check) every time the user makes a change could be seen as wasteful. However, most of the time, my applications use 'bound forms' for letting the user interact with screens of data. I do not believe there is a way to capture the database error up-front using a bound form.

The easy reply would be that the problem is my whole user-interface and using bound forms. As I've read before: bad, bad bound forms! However, I get lots of benefits using bound forms. If not being able to catch these kinds of errors is seen as a draw-back, it is sure a very minor draw-back compared to all the benefits. Checking for duplicates up-front has really worked well and so far has not shown any draw backs in terms of database performance, speed of my development, maintenance issues, or user unhappiness. (Then again, I do not develop for thousands of users a second, etc....)

As for simply not allowing the user to change values related to a primary key, I get your point on an intellectual level. However, when I think of applying it to specific situations in my own applications, the idea doesn't appeal. But I think that is really getting off-topic, so I won't go into all my thoughts.

Thanks for the discussion! I much appreciate getting another opinion on how I have been using surrogate keys and helping users understand the data. I also appreciate you taking the time to understand my earlier technical point.

- JJ
Staff advocate for processing bliss and time savings
8/28/2007 11:47 AM | JJ from Eugene
Gravatar

# re: Composite Primary Keys

JJ - I agree about the datestamp, you are right. But I think that even if you want your UI to allow for updating both the attributes *and* the relation itself at the same time, if you do things optimally and efficiently it is *still* the exact same process, with or without an identity PK column:

Step 1: if customerID or ProductID changes from previous values, make sure that no row already exists with the new CustomerID/ProductID values.

Step 2: update of row

In other words, for optimal performance and clarity, in either scenario you should only check PK or unique constraint keys when the PK or unique constraint values change; otherwise, it is simply not necessary to perform that check.

Of course in any scenario you still should check for and catch a duplicate PK or unique constraint error if it occurs when actually making that change.
8/30/2007 2:36 PM | Jeff
Gravatar

# re: Composite Primary Keys

>>"Maybe the order limit has a chronological component, or a change is made to have different versions of a product underneath a single product id"

>Why? Why would this happen? Why would you have a product in a table and then completely change it. "So product id 10 is currently a guitar but I think I'm going to make it a rhinoceros now." I don't understand the logic in that.

I think the first example is more compelling, as the 2nd can be avoided by good design. Say the order limit now have a chronological component, where you are allowed to order 200 in January, 250 in February, 250. Without an autogenerated PK, now the PK on your table has to change, as well as the FKs in any table referencing it. This doesn't strike me as good design.
8/31/2007 1:27 AM | Mike H
Gravatar

# re: Composite Primary Keys

>>Say the order limit now have a chronological component, where you are allowed to order 200 in January, 250 in February, 250. Without an autogenerated PK, now the PK on your table has to change, as well as the FKs in any table referencing it. This doesn't strike me as good design.

Again, this is the weakest and move overused argument about identity PK's. "If your logical model changes, your PK doesn't need to change!"

If your logical model changes, then everything had *better* change! If you have code and SQL statemetns and relations that expect that table CustomerProducts will always have 1 row per CustomerID/PRoductID, and then suddenly you decide to allow that table to have multiple rows per combination with an effective date or something like that, then *everything* changes! Your reports, your data entry forms, your SQL statements, your foreign key constraints, your lookups, your validations. EVERYTHING. If you think that you can just add an "effectiveDate" column to this table because you have an identity and then you are "good to go" then you are completely wrong, and thinking like this leads to some of the worst data models out there that have grown organically and don't even come close to accurately or efficiently modeling what they are supposed to.

8/31/2007 9:18 AM | Jeff
Gravatar

# re: Composite Primary Keys

Regarding the topic, but specially the last comment, I believe everything changes exactly because it was modeled that way. Of course with an identity column, the model, the application has to be validated, and possibly changed (including migration) to acommodate the change. But the change doesn't ripple throughout the model.
I honestly can't see the big difference between entity tables and relation tables. I could argue (and many DB purists would) that the PK in an entity table should be a natural key (probably composite). And it appears you can relate to the problems it introduces:
. very difficult to change (I can only imagine what must have been the migration of a DB where ISBN was used as a PK, or using the bank filial as part of the PKs of bank cards - imagine changing the account);
. no significant advantadge over the more flexible surrogate model.

To me the same argument holds with relation tables. It eases future changes. The disadvantages are extra complexity, but the end result is more flexible.
I haven't seen a solid example of the supposed advantages (in practice) of the composite key approach. The extra space isn't so much as stated if considering that other indexes include the PK. So the wider is the PK, bigger becomes the index.
I haven't seen any examples or papers that demonstrate in practice that it's more performant to use composite keys (in general).
Regarding the purity or correctness of the model, I can only say, that it's a model... It can be abused if one tries to be to flexible, but sometimes it's better to be flexible and pragmatic than to be rigid and theoretic.
9/10/2007 9:31 PM | André Cardoso
Gravatar

# re: Composite Primary Keys

Just some articles that I think illustrate better my opinion:
http://www.4guysfromrolla.com/webtech/110806-1.shtml
http://searchsqlserver.techtarget.com/general/0,295582,sid87_gci1156604,00.html
http://r937.com/20020620.html
http://www.bcarter.com/intsurr1.htm
http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html

And an article that defends natural keys (looking through the comments, it appears many people don't agree):
http://www.sqlteam.com/article/identity-and-primary-keys
9/12/2007 9:03 AM | André Cardoso
Gravatar

# re: Composite Primary Keys

Andre -- I appreciate your comments .... however ... Every point you make is already addressed in my article and/or in my comments.

Until any of this is refuted, we've clearly established that, for relation tables:

* using a natural PK uses less space, less resources, since if you use an identity, you MUST also introduce a unique constraint on your true key columns
* using a natural PK clarifies your data model and makes it easier to read, understand and work with
* using a natural PK simplifies your SQL statements
* using a natural PK enforces data integrity without the need for an additional unique constraint
* using a natural PK results in true normalization, since your attributes are fully dependent on your primary key and not just a meaningless, random number
* using an identity PK actually complicates things like editing a relation, if you are relying only on the PK, since knowing that you are editing or viewing "CustomerProductID 3922" means nothing, while knowing that you are editing or vewing the relation between "CustomerID 23 and ProductID 4" tells you everything, simplifying your code and your logic.

I'd write more, but again, I feel as though I am just repeating the same points over and over and no one is specifically addressing them ....

Let's talk actuals, here, not theory ... can you please give me a simple example where using an identity as a PK for a relation table, instead of a composite PK, makes future changes "easier"?
9/12/2007 10:04 AM | Jeff
Gravatar

# re: Composite Primary Keys

I just can't run away from a good discussion. :)

Suppose that the client comes in and wants to introduce some “minor” changes/improvements.
First, he wants to keep the history of changes in the CustomerProducts table. So you create a CustomerProductsHistory table with the timestamp of the operation.
But he also reports that querys to find associations within an OrderLimit range are running very slowly. So you add an index to the OrderLimit.

These “changes” compound and the table becomes larger and wider with more indexes. Child tables are added and because we shouldn't use a surrogate as PK, we keep the parent PK and add some other attribute. Using Ids as the PKs from Customer and Products, the effect is lessened (the size of the PK is just double of a surrogate key, but with natural keys the effect is clear), but eventually your indexes start to suffer because the PK is larger, as well as all child tables PKs and indexes.

Future changes are easier because you get more flexibility in maintenance and with the model.

* In maintenance: if a mistake was detected and you MUST correct an association (the CustomerId or ProductId), you don’t have to mess with the PK. Because you didn’t expose the PK of the table to the customer (the surrogate key), you can change the data easily.
* In evolutions: if there’s a need to change the CustomerProducts table (you want to turn it into a temporal table, adding a start date and end date), it’s much easier if there’s a surrogate key, especially if the table has child tables.

I think most pragmatic people, having been through similar situations, can see the benefits and flexibility of the surrogate key model. Besides that, there’s also the other advantages mentioned like:
* Consistency in the model (easing mapping tools and development),
* Comprehension and usage of the model (this table PK is what 4 columns? And joins this other table with what 3 columns?) because the PK is just TableNameId and joins with the FK named TableNameId

The last point I don’t understand. You say CustomerProductID 3922 means nothing. Yet CustomerId 23 has meaning? Why? Only if you look it up in the Customers table… I just hope you don’t show CustomerId 23 in your UI, because then you just exposed the PK, and you can’t ever change it easily.

I’ll concede to the argument that it’s not completely 3NF. But if you go this way, why not make all the tables 5NF mandatory?

Just my 0.02 €

9/13/2007 2:50 PM | André Cardoso
Gravatar

# re: Composite Primary Keys

Andre -- let me ask you this, because you still haven't addressed this:

When you create your tables, and you are adding your identity primary keys, do you also create additional unique constraints to all of your tables to ensure you have data integrity?
9/13/2007 3:12 PM | Jeff
Gravatar

# re: Composite Primary Keys

Yes, of course.

But maybe I wasn't explicit in my last comment (or there's some ambiguity in my terms). To me TableNameId is the PK of the table. It's "private" and not exposed to the business. If the natural key (defined by the business) is just a number, I normally use TableNameNumber (eg: CustomerNumber, ProductNumber). This column is visible to the business, and follows the business rules (no gaps, prefixes, suffixes, etc).

This way the model is very consistent, predictable, flexible and usable. It may have some overhead (not much), and not be completely normalized, but to me it's a good compromise.
9/13/2007 7:09 PM | André Cardoso
Gravatar

# re: Composite Primary Keys

I tend to side with Andre on this topic. One thing that I will add when using an identity field as the PK is the reduction in columns in child tables. Especially when you have tables that are 5 or 6 levels down, and the parent tables have composite keys, and the parents parents tables have composite keys, etc. Eliminates a decent amount of columns, IMO is easier to read/understand the schema, and less SQL to write in the Joins.
9/17/2007 11:57 AM | Bryan
Gravatar

# re: Composite Primary Keys

Well, good discussions. It looks people look at the thing from different angles, such as from data modleing, from application development tool (Hibernate, EJB 3.0 or any O-R mapping tools) and maintenance standponit etc.

Even from just database design standpoint, I have 2 cents to share that the storage waste is not always true by adding auto id column as PK. Please see my table samples I was finghting with before.

There are the tables below (omit the FK referential, Unique, Not Null etc. constraint codes below to save space, but they are there, PK1,PK2,.. stand for composite key columns, PK stands for single column key, FK stands for Foreign key):

1. Subsystem(Subsystem_ID integer, --PK
Subsystem_Name varchar(50), --UNIQUE
configuration varchar(20)
);
2. Devcie_Name(Device_name_id integer, --PK
Device_name varchar(30), --UNIQUE
department varchar(30)
);
3. Device_Application(Device_name_ID integer, --PK1, FK
Application varchar(20), --PK2
Usage_Type Charchar(1)
);

4. Subsystem_Device(Subsystem_ID integer, --PK3, FK
Device_Name_ID integer, --PK1
Application varchar(20), --PK2
Optional_Device smallint
);
5. Subsystem_Device_Circuit(Subsystem_ID integer, --PK3, FK
Device_Name_ID integer, --PK1
Application varchar(20), --PK2
Circuit_ID integer, --PK4, FK
Ckt_suffix varchar(20) --PK5
);

I was fighting heavily with the kinds of design but no avail. Even though the table 3,4,5 are relating entities (The Circuit table is omitted here), but apparently the storage was wasted a lot compared to the single column ID key design and the join conditions are lot in each sql statement.

So I would not go straight all the time with auto ID key, but composite keys should be carefully considered regardless of from data modeling or application development or support standpoints.

John
10/3/2007 12:56 PM | john
Gravatar

# re: Composite Primary Keys

Keep in mind: Hibernate does not have any problems with composite keys, except
- You the developer become responsible for setting the components of the keys. No IDENTITY, AUTO_INCREMENT and the like for you. Sequence numbers etc. need to be extracted before inserting the data.
- The developer also becomes responsible for keeping track of whether an object is new or has been inserted, since unsaved-value is out of the question.
- The composite key should be implemented as a class with the usual equals() and hashCode() logic to ensure uniqueness in a session or collection.
- On the other hand: It could be argued that if you use composite keys, using Hibernate and frameworks like it becomes easier since you leave to Hibernate to write the complex SQL joins for you. :)

RDBMSes as a rule do not allow changing primary keys, so if they carry information that must be information that is GUARANTEED never to change. In the lifetime of the database.

Also: If you use composite keys just to get the unique/index side-effect, you are better off with explicit constraints and indexes you can disable/change later on as needed. Even when designing databases you should keep in mind the rule that an application design is never static, and you should strive for a design that is open for changes later on in the life cycle.
10/18/2007 6:20 AM | JadedGamer
Gravatar

# re: Composite Primary Keys

Imho, composite primary keys belong in the 1980s. Back then there was no technology for anything such as automatically generated IDs, so there was a need to make primary keys. Today there is no good reason to use composite keys are primary keys.



10/23/2007 11:58 PM | Jack
Gravatar

# re: Composite Primary Keys

No matter what advantages composite keys appear to have, I still think it's bad practice (unless absolutely necessary). You should never allow the inner-workings of a information system to become slaved to external influences.
10/25/2007 2:10 AM | jeremy
Gravatar

# composite Primary Key of defination

Composite Primary Keys, Tuples and Attributes OF DEFINATION.
11/9/2007 3:46 AM | Zar Zar San
Gravatar

# re: Composite Primary Keys

Hi,

Composite primary keys referring 2 columns is an excellent concept. Have u tried to map the same concept in hibernate? If so, and get succeeded in it, please revert back to me. I will be thankful to you.

Regards,
Brindha
12/13/2007 2:40 AM | Brindha
Gravatar

# re: Composite Primary Keys & foreign Keys

Hi,

I could find a solution to map a composite primary key and it's corresponding composite foreign key. I was also able to save data into the table holding the composite primary key. Now, I encounter another issue. I have to query the columns individually in the table holding composite foreign key. I have stuck here. May, any one of you helps me to resolve this issue. To explain more,here are my tables.

Composite Foreign Key Table is TS

TS_Id int - PK1(Primary Key)
Project_Task_Id
User_Id int - The above 2 fields comprises the composite foreign key.
some more fields too.......

Composite Primary Key Table is PTT

Project_Task_Id - int - FK1(Project_Tasks table)
User - int - FK2(Users table) - The above 2 fields comprises the composite primary key

I have added data into PTT. Now, before adding data into TS, I wish to check whether there exists any record for the given Project-Task_Id and User_Id. I couldn’t check this in hibernate as I have mapped this as composite foreign key object. I don’t know how to query the individual columns in this composite mapping.

i.e., I wish to query like this in Hibernate

select * from TS
where project_task_id=1
and user_id=2

In hibernate, I have mapped these two fields in TS.hbm.xml file as
<many-to-one name=”compositeptt” class=”com.ts.PTT.hbm.xml”
<column name=”project_task_id” />
<column name=”user_id” />
</ many-to-one>

Please, anyone of you who knows how to do, help me.

Thanks in advance

Cheers,
Brindha




1/23/2008 7:04 AM | Brindha
Gravatar

# re: Composite Primary Keys

not using pk + unique index will come back to bite you in the ass 100% of the time. However...windows folk do some strange shit to make thier life complicated...

Your design will break whenever item_id switches from numeric to alphanumeric which is a classic example now why would it? WHO KNOWS management is like windows dbas and programmers they do strange stupid crap for different reasons undoublty learned from cutting teeth on access tables. I've been on projects where they wanted item_ids to represent something, that something inevitably changes over time. for example 1211110
1=vendor,2=month,11110=item code

why would they do this?????? BECAUSE they might have to shelf products and to do it quickly they use some 'meaningful code'

now they want 1F11110 because $5/hr labor can't translate numbers to months

because now management is going to say "we want codes to mean something" so you have 2 choices:
1) add items.code in which your entire app will likely need a rewrite because its based around presenting items.id and maybe items.name in all the fields, edits,etc

if you had coded a pk as you should have originally done, your app would need nothing except maybe change items.id,item_id from int to varchar because the entire time it was using the pk which references the row which is what its supposed to do

your way is left over from the 80s before oop, autoinc, unique indexes,etc

without a pk how would you code a select drop down form option? option value=array?????? NO you'd use option value='1' with 1 being the PK , the row record




3/11/2008 3:16 PM | shawn
Gravatar

# re: Composite Primary Keys

well there is a 3rd option, you could change your items.id to varchar, but you would need to do that for all forgeign key tables also
3/11/2008 3:19 PM | shawn
Gravatar

# re: Composite Primary Keys

Shawn -- do me a quick favor: in a sentence or two, using proper English and grammar as best you can, would you mind recapping what the article that you are responding to is about? What database design advice is the article recommending?
3/11/2008 3:31 PM | Jeff
Gravatar

# re: Composite Primary Keys

The simple answer to the question, "How would you define what a primary key of a table is?" would be take the canonical cover of your relationship and determine the functional dependencies. That was part of the reason of having a Database Design I course in college.
3/11/2008 4:57 PM | James
Gravatar

# re: Composite Primary Keys

tnx for the information!. don't mind them, they're just insecure. hahaha
4/8/2008 5:57 AM | monica
Gravatar

# re: Composite Primary Keys

Wow! Am I missing something? All the article needed to say was its generally a good idea to use composite keys when the table relates entities and a unique generated key when it defines entities.

I agree that using composite keys requires more typing but your shorter SQL statements have no readability! Good design is about knowing where to stop before you abstract your database and application into generic goop for the sake of not needing to make any "complicated" changes.

I might be wrong here but I think the reason you are finding so many databases with indentity columns on all the tables is to easily support SQL Server replication.
4/30/2008 9:59 AM | Alan
Gravatar

# re: Composite Primary Keys

Hi, I am having primary key field for three tables, I used those three fields as composite primary key in fourth table. And
while mapping in hibernate, say FOURTH_Table.hbm.xml file, would any one suggest me that how to map these three composite primary keys?
6/2/2008 3:23 AM | Sam
Gravatar

# re: Composite Primary Keys

A good article. But I feel pretty outdated and taken from a book on (R)DBMS concepts .. In my past 12 yrs of programming i have learnt that concepts are good to read, but really hard to implement on a live application. Take the example of all those recommendation of 3 tiered architecture wchich used to recommend a n-tiered architecture that has a data layer .. this data layer is supposed to be able to connect to any database just by changing the connectionstring (microsoft world) .. all this is good to read, but not practical .. i have designed and developed application with composite keys and have slowly moved to designs with identity columns .. what you say is correct, identity column are NOT primary keys, but that does not justify not to use them .. Refering to your example, consider the table CustomerProducts has to have a relation to another table - for eg, a table which contains records on how much a customer actually ordered... you could now have a table called ItemOrdered with
customer id, product id, orderedQty, OrderedDate as fields .. right?
so where are the rules that define that the orderedqty is within the qty defined in CustomerProducts? I do not know whether you will argue that this needs to be build on the UI .. but if we get to that, we will discuss it later .. So to build that rule, you will need to have a relation between CustomerProducts and ItemOrdered .. so in this case what will be the foreign keys?

Lot of people above have been talking about changes .. changes are inevitable and people who have really worked on developing application that are actually to be used by users know how changes creep in and how illogical some of the requests are .. so there is no point in discussing about changes and how they will affect .. changes are impossible to predict .. just design the best you can hoping that your deisgn can support the changes .. learn from your mistakes and do not repeat that in the next design ..

anyway, again refering to your example, lets talk about a change .. lets say that CustomerProducts needs to have couple of attributes called EffectiveDate and ExpiryDate .. ie when will this quantity be effective from and when they will expire .. So now you unique constraint can no longer be CustomerId and productId because they can now be repeated for different dates .. So now you have to add to your compisite keys ... right? Now you may ask why this requirement came up? This is the kind of changes that happen in real live aplication and which users tend to miss during the requirement gathering phase .. Now with this change and if I had the ItemOrdered table, will I not have to change the keys on both these tables? There are many such practical changes that I can think of that can affect the database design .. There is no point in been rigid and holding tight onto to concepts that you read and learn during your college days .. you have to flow with the changes and learn to adpat to succeed .. if such changes happen, you cannot argue to the client and ask for more money to make those changes because your design will not adapt very easily to these changes .. they might agree the first time, but if you keep asking for money for all the changes, again experience will tell you that you will soon loose the client ...

Do note, I am not against composite keys or writign lengthier SQL statement .. i have been there and learnt from my mistakes and learned to adapt to new database requirements and capabilities ..

Just my 2 cents worth ..
6/3/2008 2:48 PM | RJ
Gravatar

# re: Composite Primary Keys

RJ -- thanks for your comment, but I already responded to the "identities and surrogate keys means that it is easier to make changes down the road" argument in my comment on 8/24/2007 9:20 AM.
6/3/2008 6:40 PM | Jeff
Gravatar

# re: Composite Primary Keys

I love a good argument. Use composite primary keys as their more logical IMO.
11/5/2008 5:46 AM | Andy Wyatt
Gravatar

# re: Composite Primary Keys

Surrogates are definately my favourite form of primary key. I would actually argue your definition of a primary key but I see the point that you are trying to get across. I loath composite keys being used loosely in a database because I don't like writting sql that involves remembering to join on multiple columns (who does). In the case of tables that form the intermediary between tables that have many to many relationships, I agree on the composite but this point is really 101 and the way it should be done.
11/5/2008 4:34 PM | Fabian
Gravatar

# re: Composite Primary Keys

When the number of tables is small, it doesn't make a huge difference as in the example. Lets assume there is a model that is 10 tables deep, with the 9th table join to the 10th table using a 9 key composite key. When writing queries with these 10 tables, the queries would need 45 lines to express the join. While with a dedicated table key, it involves 9 lines. I think the 9 lines approach would be more suitable for developers.

Let's suppose we've developed 100 queries with these 10 tables and wrote 4,500 lines of join. And we made a stupid mistake in the 4th to 5th table join. In that case we need to go through 4,500 lines of code to correct all the joins from the 4th tables onwards as they are all involved. You can simplify the correction by using find and replace if the joins are written in the exact order following the table but, it has no guarantee. With dedicated table key, we might be lucky to escape from the join change or at worst we need only to correct 100 joins and easier with find and replace.

I think the composite key approach is more pure but it put more pressure on the developers.
12/1/2008 10:44 PM | Maksiu
Gravatar

# re: Composite Primary Keys

I think the problem here is forgetting that a database is used for storing data to support an application and not as an application itself. I don't see any scenario where the end user executing a query directly to see the data. Only in this scenario will a "meaningful composite key" be required so that the end user sees directly the "meaningful" customer id and product id (as JJ and Andre already pointed out how more meaningful is a "customerid" to know the customer). The database should be designed to store all required data (none missing) and represent their relations as it is.

The need for knowing which column relates to what is basically upto the developer and the developer is more concerned about simplicity than meaningful column names. If a row can be identified by a single column that is much help for the developer than combining many meaningful columns. Here the idea of a surrogate key is laughed at because that key doesnt represent anything (from the article - "We also do not need an additional identity column that really serves no purpose"), wait - the identity column represents the row itself and there is nothing else that can so efficiently represent the row because every other column should be "updatable" (in a developer's terms "editable").

In the comments an advocate of the composite key said to Ramon Leon and schallstrom this --- "But back to the point, you guys shouldn't be allowed near computers let alone databases :( Go take some basic courses on computers, their architecture and how things work." -- I guess this guy thinks having learned some DBA stuff he knows everything about computer architecture. May be he dont know that when joining two tables comparing to integers will be faster than every other comparison. Shame on that guy (Muppets In IT O.U.T).

The surrogate key solves many unwanted problems for the developers and basically it is the application's user interface and not the database that the end user is going to see, it is the best to design the database for making it easy for developing applications. With good documentation the database structure can be made easy to understand for the developer, and it is the developer who has to work with the database and it is better that everyone keep it in mind and not make the developer's life miserable by providing meaningful columns which no one else cares.
3/5/2009 5:03 AM | Prajeesh
Gravatar

# re: Composite Primary Keys

Very good reading, thank you all of you, I will keep my eyes on this thread as I find it very informative.

4/23/2009 12:02 PM | justin mead
Gravatar

# re: Composite Primary Keys

I have a relational DB I am in the midst of designing that is a composite look of many different DB's.

Each DB contains the same structure but contains a different dataset within the structure. Each DB, however, may contain parts of the same data with different identifiers for that set of data as well. The application I wrote takes the data given by each DB, finds the similarities in the data, and puts it all together to present to the user. It was a design requirement for me to have the user not know the data comes from more than one place and that each DB does not know of the other DB as well.

To give an example, because each DB uses a unique ID for each row, object A might have ID#5 in DB#1 but might be ID#13 in DB#2. I call it in my application objectA (complex data-type stored in dictionary as key with a IComparer used to make sure data-integrity is kept) and store it as such in memory where the value of the objA are the attributes and their location. Since the initial design of the back end DB was done years ago by someone that wanted to describe the core of objectA as a GUID-like field, I am stuck relating back to that DB using their ridiculous ID fields that correlate to nothing unless you study the DB.

Moreover, my entire DB that I'm designing is relational in nature as it not only describes what something is, but where it came from. While some data presented to me from one DB will be unique, it is not unique across the board when multiple DB's are in play. The unique instance of data is actually the Location + ID.

I use the DB here only for storage purposes in the event I take a power hit and can't reconnect to the back-end DBs due to a network outage, and need to present the same data to the end-user who needs to be unaware of my network issues. That was another design requirement from my management.

For all who dislike composite keys, how would you go about describing data like this? When I tried to go back and redesign my DB based off the idea of not having composite keys, the DB grew from 10 tables, to about 14 tables, that's a 40% increase. My queries grew more complex as well because now I had more complex joins to get the data I needed to rebuild my application tables. I was having to do integrity checks that SQL would have done for me if I had left composite keys in place when storing my data when I get it from the back-end DBs. Also, my composite DB representing numerous DB's is smaller than the DB I communicate with due to my composite key design.

I'm no expert DB guy here, but how would an expert keep the complexity of table design and data integrity checks down while only using single-row identity keys? I don't see a way... please let me know if you do...

5/7/2009 4:04 PM | WhoKnows
Gravatar

# re: Composite Primary Keys

This is the best article I have read on this little-discussed but thorny subject. I incline to using compound and composite keys (see Wikipedia for the difference) because
a) when you are supporting a large number of applications and you need to understand them fast, it really helps if you have a database schematic diagram that reflects the logical primary keys and foreign keys relationships (rather than simply based on identities for each table)
b) in 'real' applications, lots of reports and data entry screens are short of space so using user friendly short codes (e.g. BA001 for British Airways rather than 0129457049) is often preferable (or at least acceptable) for experienced users than long numbers or descriptions.
c) inspecting and interpreting test results is often easier if text codes are used instead of numeric codes.
Having said that, when working on applications with very complex database schemas, I have found myself compromising on more than one occassion. I usually put it down to weaknesses in my normalisation skills...and maybe so..but there is a distinct lack of guidance out there on how to overcome some of the practical issues without resorting to substituting an identity for the true compound or composite primary key.
7/6/2009 5:03 AM | Ian McKay
Gravatar

# re: Composite Primary Keys

This was an excellent read. Excallent grammar as well. Not often I see that.
7/18/2009 12:47 PM | skhan
Gravatar

# re: Composite Primary Keys

what is composite key in database
7/20/2009 12:36 PM | yogesh modha
Gravatar

# re: Composite Primary Keys

yogesh modha-

A composite primary key is a primary key defined by more than one attribute. If you run this code on a SQL Server you will see a empty table called Silly_Teacher_Table. If you use a GUI like SQL Server Managment Studio, you can examine the properties of the table and see that the TeachID and CoursePrefix ID both have a key.

CREATE TABLE [dbo].[Silly_Teacher_Table](

[TeachID] [varchar](50) NOT NULL,

[CoursePrefix] [varchar](50) NOT NULL,

[LName] [varchar](255) NULL,

[FName] [varchar](255) NULL,

[MName] [varchar](255) NULL,

[Title] [varchar](255) NULL,

[Position] [varchar](20) NULL,

CONSTRAINT [PK_Silly_Teacher_Table] PRIMARY KEY CLUSTERED

(

[TeachID] ASC,

[CoursePrefix] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

Great Article by the way. I frequently use composite keys in my database design.
7/29/2009 5:41 PM | kd
Gravatar

# re: Composite Primary Keys

I'm sorry, but this is an absolute shocker posting this mis-guided article on the net where it could influence the naieve.

Mixing a database defined primary key with business data is just wrong for so many reasons - just so you can save some space in your database?

What sort of school-boy systems have you actually worked on? What would happen if eBay used someones email address as their primary key? Why don't you forward this article to them and see what they say about it. What about in a motor vehicle system, why don't we use the VIN or the registration plate to identify a vehicle?

What if you have an enterprise system with say 500 tables in the database with billions of rows split across these tables, all referencing each other and the natural key changes? How are you going to clean that mess up?

You also miss the point that if you have a billion rows in a table with primary keys that are based on character data, then you are wasting untold amounts of diskspace as opposed to a 64 bit integer, and instantly have blown your one point about having one less index out of the water. Duh!

8/1/2009 2:17 PM | ants
Gravatar

# re: Composite Primary Keys

ants -- you seem to be commenting on a different blog post than what I wrote. Where did I ever suggest that we use an email address as a PK, or even use business data in a PK at all?
8/1/2009 8:24 PM | Jeff S
Gravatar

# re: Composite Primary Keys

Great article and discussions. One thing is sure. I am aware of several new issues and appreciate it. Thanks!
8/14/2009 4:19 PM | mitch
Gravatar

# re: Composite Primary Keys

Good article Jeff, thanks. Looks like you struck a nerve or two out there. Who'd ever have imagined we'd see angry, opinionated comments on the internet??!
9/15/2009 2:22 PM | Paul
Gravatar

# re: Composite Primary Keys

i want the explaination examples of composite key
11/26/2009 4:55 AM | okike
Gravatar

# re: Composite Primary Keys

how can i used this Composite Primary Keys in anther relation ??
12/6/2009 5:48 PM | Ebrahim
Gravatar

# re: Composite Primary Keys

Jeff,

To make a very basic simplification from a fictional "much broader" financial app where there can be more than one customer per account, and a customer can have more than one account - lets say we have 4 tables: Customer, Account, CustomerAccount junction table and Transactions. If Transactions is by far the table with the most expected rows, isn't a surrogate PK in CustomerAccounts the most efficient inclusion in the Transactions table, rather than having to store both CustomerID and AccountID there for each transaction?
1/20/2010 10:37 PM | DM
Gravatar

# re: Composite Primary Keys

Wow. This was an awesome post and list of comments! There are many good points on both sides. I googled my way here trying to find out when a composite primay key is worth the hassle and when it isn't. I want to do the "right" thing and I always start out with a very concise design, but I have run into problems when mapping objects to the tables, so I'm totally on the fence! I wish there was a compromise that wouldn't also compromise the consistancy of the design standards. I'm still relatively new, and I am concerned about learning "best practices" that will help me be prepared as my projects get bigger.

What I'm taking from all of this is that this isn't a black & white issue. There is only a "right" answer for you personally after much trial and error. You should be prepared for people to disagree with you. If you are working with someone else on a project, be considerate of each other's needs, and try to settle on the best standards for the project that will get the job done and help y'all get along while doing it. Is that a fair thing to think?

Thanks everybody for your 2 cents.
2/18/2010 1:59 PM | Robb
Gravatar

# re: Composite Primary Keys

Ok, as I re-read my post, it seems pretty obvious so I feel a little silly posting it.

I'd really like to see some opinions on what you all do on an ORDER_ITEM table. It seems like the obvious choice for storage considerations would be a composite PK consisting of (ORDER_ID, ITEM_ID), where ITEM_ID starts at 1 again for each new order...

But what would you guys using the ORM development approach do in this case? Would you make life easy and just use a single PK for the order items and just hope that the total amount of items ordered never excedes the numeric type you choose for the column? I'm not trying to provoke more argument, I am just curious how serious the ORM guys are about the single identifier column for my own education. Because I'm looking into being an ORM guy. But I wish I could design tables that would make the DB guys think I was an ok guy too :)
2/18/2010 3:08 PM | Robb
Gravatar

# re: Composite Primary Keys

It's fairly easy to tell by each post whether or not the author of the post has had any training in data modeling. Having been both a developer and a data modeler, I understand the different perspectives. Most developers have not had data modeling training or did not grasp the concepts and fail to see the value of designing a database as a physical implementation of a logical model.

For the most part, I agree with the philosophy of implementing surrogate keys for core entities and composite keys for relationships. In those instances in which relationships take on a life of their own, with not only attributes, but foreign key relationships to other tables, it is likely I would use a surrogate key as well as a unique constraint on the 'true' composite key.

As others have stated, when the logical model changes - the meaning of the entity or relationship changes - there have to be changes to the database design, which will sometimes have a ripple effect.

I've seen way too many systems where database design was bastardized by faulty applications development that did not take into account the requirements of a good data model and led to all kinds of data anomalies because the proper constraints were left out of the physical database and left up to the application developers to enforce (i.e., enforcement never happened!).

My advice to application developers is to gain some understanding of data modeling principles. Not only will they help you design better databases (if that falls under your responsibility), but it will also help you design better applications if you properly understand the cardinality of relationships within the data model. Poor understanding of cardinality leads to some very poor and inflexible application development.
3/2/2010 4:51 PM | David
Gravatar

# re: Composite Primary Keys

I think it is important to ask the question behind the question, which is: how did the data anomaly happen in the first place? The blog post started off with an assumption that there can be abnormal data in the database in the first place, where the order limit for a customer and a product can be either 25 or 30. That begs the question: how did that happen? Obviously, the application that wrote the data to the table did not have robust enough logic to handle the situation. So the question ultimately boils down to whether you want the logic in the database, the application, or both. I don't have a ton of experience with Ruby on Rails, but it seems that Rails tries to take over all of the logic, including the SQL that is written. This seems to be much too extreme on one end of the spectrum. On the other hand, no amount of solid database design, constraints, etc. is going to make up for a developer that doesn't know what they're doing.
4/1/2010 7:17 PM | Joe Ehat
Gravatar

# re: Composite Primary Keys

It is amazing how many people think they "know" databases, just because they know how to play with Access or PhpMyAdmin... without having a clue of what they are actually doing. It's like people with a pirated copy of Photoshop and call themselves graphic designers.

Your articles just shows how many people don't actually know a thing about databases; otherwise you wouldn't need to write this. I recently had an argument with a colleague that didn't even know (had never heard of) things like Natural Keys, Alternate Keys, Surrogate Keys, etc.
4/19/2010 7:47 AM | JJ
Gravatar

# re: Composite Primary Keys

Having been an enterprise database designer for close to 15 years now, I would be shocked if a developer in a large corporation actually asked me to change a composite key such as the Customer to Product association to an Identity column. The major benefit of a good logical database design is it's understandability to business users, developers, business analysts and the like. And yes, I do review all of my logical database designs with all of the above to where they have a fairly good idea of what the entities represent and why I have the relationships that I have in the model. If meaningfulness is King, then data integrity is certainly Prime Minister. An Identity column for Associative tables allows for a data integrity violation, which is the first thing you want to avoid in designing a database and most especially if it's an analytic database. Reports need to be correct and consistent. It's the "tightness" of the model referred to earlier which distinguishes good from bad design.

All of the arguments for and against composite primary keys have been stated at least once in this thread already. I don't need to restate them here. The comment that composite primary keys makes life difficult for developers just doesn't hold water. I would chalk that up to laziness in coding more than anything else. Yes, it's challenging to work with composite keys, but really how challenging is it. It's difficult to create a good data model too. So what...it's part of the job, man! The example of the 10 cascading tables is very extreme. Even in data models that I've seen with 27,000+ tables, I haven't seen cascading to that level. In that case, Identity columns as primary keys are far worse for developers since with Identity columns you have to join all the way back up to the parent ancestor in order to get to the natural key to display. Imagine joining the 10th child down to the 1st parent up just to find out; for example, the true Customer Identification Number for a person or organization. Instead by using the natural keys and composites, the parent is already present in the lowest level child so that multiple joins back up the chain are unnecessary. Now this is an extreme example, since you really don't want a 10 column composite key. If you've got that, the database designer should rethink the model design anyway so this example is just for illustrative purposes.

The argument for identity columns in every table is akin to the one big table philosophy; that is, just put everything in one big table and that will take care of everything! Hey, that makes it the easiest for developers, right. Absolutely not and don't even think about it. Having an identity column for every table is just plain ridiculous. No good database designer would do that, ever! In every large company I've worked for, they'd likely be fired for even trying to do so. Hey, developers, our job is not intended to make life difficult for you...it's to do the right thing for the organization. Well designed databases are flexible, scalable, have great data integrity and perform well. Ultimately, a good database and application serves the business customer in trying to make their job easier. That's what all IT applications and databases should strive for - not to make life easy for the technology people, but to make life better for the business. In the long run, the ultimate customer in the business process is really what's most important. It doesn't matter how hard our job is to make that happen. It's not impossible to work with composite keys, so go out there and do it. If nothing else, consider it a challenge and just another skill added to your resume.
4/22/2010 3:27 AM | BeHappy
Gravatar

# re: Composite Primary Keys

It seems the main difference between those that 'get it' and those that don't are that the ones who 'get it' are DBAs or DB devs. The ones who don't are the application programmers. That actually makes a lot of sense, because professionals who live, eat, and breathe databases obviously are going to know a lot more about best practice than the app devs. Same can be said in reverse. I don't tell our app devs how to properly code C#. hah!

In my experience, composite keys have a greater impact on an analysis DB (read) rather than a transactional (write) one. My main gains in performance switching from identity PKs to composite PKs have been on warehouses hooked up to analytical tools. The performance gain on a 50m row table where the table is written in order of the natural identifier (Clustered composite PK) is awesome (Or on a fact table, time and subject [PK on the granularity]).
5/24/2010 1:52 PM | jon
Gravatar

# re: Composite Primary Keys

Jeff, thank you for your initial article and starting this amazing forum for discussion. I never realised that so many people felt so passionately against a composite primary key. It's clear that there is a big difference between someone who understands the value of a well designed database and one who doesn't. A datamodel exists to represent a company's data in the truest possible form and the quality of a company's data contributes to its strength and success in a competitive market. A good datamodel requires a lot of planning and thought up front - a process which should include key people from the business and the development team. A data model should be so transparent that it serves as the most important piece of documentation for a system. If it manages to achieve these things then everybody's job becomes easier.
7/31/2010 8:34 AM | Sue
Gravatar

# re: Composite Primary Keys

I'm a fan of "natural" keys as the PK mainly because the natural key is what the business knows, so, when I want answer a question, if natural keys are involved, I usually don't have to join a pile of tables, but instead the data is on the table I'm going after.

But, Object Oriented is here. I'm willing to sacrifice some performance and add some indexes to my model if it allows the OO world to map to the database I'm designing. For that reason, I always use surrogate keys in my models, but I couple those surrogates with Alternate Keys on the natural key of the entity. The OO world is happy because they have one high-level class that defines a database "row" class, and that class has a single key, which then, all of the implementing classes also have. So, then, their OO world maps easily to the DB world, and both sides are happy.

Another reason I've come to like surrogates, especially for the history tables is because of the Kimball dimensional design. How I use these history surrogate keys are that each SCD is really an explosion of historic tables, so I first build a loading table of historic surrogate keys, then I insert into the dimensions by joining back those tables to the history tables and explode out the attributes. For that reason, I always fgive a historic record a surrogate key.
8/24/2010 4:50 PM | Galen Boyer
Gravatar

# re: Composite Primary Keys

I don't like who write "only DBA is right" or "you are not DBA so you are not able do design a god DB".

If I design a UI and the customer tel me "this UI is inusable" i don't tel him "I'm the programmer, i'm right, you are only a stupid customer!!". Rather I think i did something wrong in the UI design.

The same il with the DB design.
The DBA produce the DB but the programmer have to interface and work a lot with it.
So the DBA can't stop up his ears!

The DBA take a second to decide to use 5 field for the PK... for the programmer this mean some hour and some headache every day for month.

What about changes... for the DBA adding another field in the PK is question of a second.
for the programmer thie wil translate in some wek of work to recode all the sql queryes.
9/9/2010 6:09 AM | Marco Staffoli
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET