I want some Moore

Blog about stuff and things and stuff. Mostly about SQL server and .Net
posts - 218, comments - 2281, trackbacks - 33

My Links

Advertisement

News

Hi! My name is 
Mladen Prajdić  I'm from Slovenia and I'm currently working as a .Net (C#) and SQL Server developer.

I also speak at local user group meetings and conferences like SQLBits and NT Conference
Welcome to my blog.
SQL Server MVP

My Books

SQL Server MVP Deep Dives 2
The Red Gate Guide to SQL Server Team based Development Free e-book

My Blog Feed via Email
Follow MladenPrajdic on Twitter


Users Online: who's online

Article Categories

Archives

Post Categories

Cool software

Other Blogs

Other stuff

SQL stuff

Why I prefer surrogate keys instead of natural keys in database design

Simply put:

I prefer using surrogate keys because natural keys are by default a subject to change which is a bad behavior for a row identifier.

But let’s dig a bit deeper into each key type to see why this is. Here’s a little table with column names that tell us what kind of a key each column is.

image

Surrogate keys

A surrogate key is a row identifier that has no connection to the data attributes in the row but simply makes the whole row unique. And that property is also the downside of it. Because it has no connection to the data attributes we can have two rows with the exact same data in all columns except the key column. This is usually handled at the application side and is an acceptable downside.

An example of a surrogate key is an integer identity or a GIUD unique identifier. I’ve never seen another data type being used as a surrogate key successfully. Both have their pros and cons though.

GUID unique identifier

GUID is globally unique 16 byte long data type that can have 2128 different values. This makes it ideal for scenarios with multiple server moving data from one to another like replication.

However for a key 16 bytes is really a lot. This causes less data to be available on a single data page which in turn causes extra IO activity because it has to retrieve more data pages. Another issue about it is that is causes perfect page splits in a clustered index because it has random 100% selectivity in it’s entire data type range.

Integer identity

Integer identity is either 4 byte INT with range from -2,147,483,648 to 2,147,483,647 or 8 byte BIGINT with range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. In 99.9% of cases this range is split in half because the default setting in SQL Server is to start any integer data type from 0. As this is a surrogate key this makes no sense and there’s no reason it shouldn’t start from the min value.

It is a small data type which gives it the advantage of having more data in the data pages thus needing less IO for the same amount of data. Unlike the GIUD unique identifier the integer identity has ever increasing 100% selectivity in it’s entire data type range. This makes it a perfect candidate for a clustered because it doesn’t cause page splits. If it actually is an appropriate candidate for a clustered index is a different matter.

Its downside is that it is not ideal for multi server scenarios although it can be done by using another tinyint column identifying a location and making it a covering row identifier over ID and LocationId columns.

And remember: Never tie any business logic to the surrogate key other than simple CRUD operations. 

Natural Keys

A natural key is a row identifier composed of data that uniquely describes data using its own attributes. An example of a natural key is social security number or other government issued number.

However this presents a huge problem from the physical database implementation point of view. In most databases a row identifier is usually also the basis for the clustered index and non-clustered indexes. But natural keys are by definition a subject to change. When the clustered index key is changed ALL indexes have to be rebuilt because non-clustered indexes contain the full key of the clustered index. So every time the natural key, which is also a clustered index changes, all indexes have to be rebuilt. And this is not including changing the actual data type or it’s size, jut the key value.

At this point someone might say: Yes Mladen you’re right about the theory of this but how many times have you seen the Natural key really change? Well so far I’ve seen it 2 times both with heavy consequences. It was 2 times too many.

Natural Key Fail Case 1:

It was a standard customer, product, order type of application. The key in this case was the 7 char long customer ID. It was a mix of first 3 letters of the customer name plus 4 numbers that also had some business meaning. The company got acquired by another company and a new customer numbering was introduced. Every key in that database had to be changed. Due to fully breaking changes to the database the whole application had to be modified and the store went offline for 3 months loosing the company a lot of profit. All this wouldn’t have happened if they had used surrogate keys.

Natural Key Fail Case 2:

This one was even more far reaching. In Slovenia (my home country) we have something called a Tax ID. This is an ID that is unique for companies and individuals so every person and every company has one for tax purposes. Many systems in Slovenia used it as the natural never changing key which sounded like a reasonable thing at the time. And it was so for over 30 years. Applications came and went. But in 2004 Slovenia entered into the European Union. So we had to modify the TaxId to European standards which means that every application using it had to be changed. I know of at least one company that went out of business because of this change. Again had they used a surrogate key the only change would be the length of the TaxId column.

Because of all this I’ve come to the prefer the surrogate keys in majority of cases.

Hopefully this gives you some insight why surrogates are in my opinion better suited as row identifiers. Although whichever you choose is still a matter of common sense and your business problem. The answer is always “It depends”.

 

kick it on DotNetKicks.com

Print | posted on Tuesday, October 06, 2009 1:17 PM | Filed Under [ SQL Server Back to Basics ]

Feedback

Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Great post.

One reason to use natural keys is when you've got large datasets like data warehouses. Sometimes their large size means you need to keep the tables as small as possible for performance purposes. Sometimes you need to do selects of ranges of data across the natural keys, and it can be more efficient to organize the table with a clustered index on those natural keys.

If you choose to use natural keys, don't believe the users who tell you that the natural keys will always be unique. They're lying. If I had a dollar for every time someone came back after the product went live and said, "We do have this one strange case where we need multiple customers with the same ID and multiple invoices with the same ID...."
10/6/2009 1:29 PM | Brent Ozar
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

I only use natural keys when it's something that is 100% guaranteed to be immutable. Thus far, I have never used a natural key. Plus, there are many reasons, outside of database design, not to use natural keys. Here in the US there are regulations around using Social Security Numbers/Tax IDs as keys. I know that people do it, but they shouldn't be doing so.

The recent change in ISBN numbers is another great example. Who ever thought that the 10 digit universal identifier for books would change? It's now a 13 digit identifier. How long before that changes again?

A case can be made for using CASCADE UPDATE foreign keys. This works. But it only works as long as you're able to keep all data on the same physical server. Beyond that point, you lose the ability to cascade those updates and you start having to build application logic to maintain the relationship. And, again, you're dealing with a lot of work caused by a natural key that should have been immutable - that's why we chose it in the first place!

While I despise the fact that we have to use surrogate keys in the database, they make the most sense. There are too many "immutable" things that can change in this world. Standards come and go, but I will always know that I'm #1 because I used IDENTITY INSERT before you did :P
10/6/2009 1:33 PM | Jeremiah Peschka
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

> tables as small as possible for performance purposes
as i understand this you mean you should use surroagets in that case too. so why put a CI on the natural key?
10/6/2009 1:33 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

> tables as small as possible for performance purposes

Let's say you have a data warehouse with sales data, and your company has item numbers that include the vendor number, like this:
- NI3957 - a shoe by Nike
- AD1573 - a shoe by Adidas
- PU3153 - a shoe by Puma
I'm simplifying this here, but you get the idea. They frequently run reports that ask:
- Give me all sales for the Adidas 1573
- Give me all sales for Adidas

With a natural key using that item number, you can use a single column, single index to accomplish that. It's a niche case - not the majority - but I've seen it work every now and then.
10/6/2009 1:41 PM | Brent Ozar
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Yeah, really great post.

On GUIDs, someone once said to me "Who cares if an ID is globally unique", and he was absolutely right. If the data member gets moved into another system its probably going to get assigned a different surrogate key anyway - so globally unique is irrelevant in that instance. A surrogate key only needs to be unique for that system.

I completely agree with what Brent said: "don't believe the users who tell you that the natural keys will always be unique"


On integer surogate keys.... you're right that there is no need to start from zero but I have to admit...I do it anyway, for no reason othr than convention (i.e. its what everyone else does).
And is it worth mentioning that one disadvantage of integer SKs is that they are finite? :)

good stuff Mladen.
10/6/2009 1:42 PM | Jamie Thomson
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@BrentO:
Ah i get it! Nice!

@Jamie:
well if you want to nitpick everything is finite :)))
seriously though, i don't consider that a downside at all.
as for GUIDs i find them excellent for distributed systems. especially if you have 1:N relationship on identites and you have to move both tables to another server. GUIDs shine in that case.
10/6/2009 1:55 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

MP: "I’ve never seen another data type being used as a surrogate key successfully. "

MS: You can use MAX+1 value used in instead of insert trigger. Beside you don't need to manage key on cliane side.
10/6/2009 1:57 PM | Miha Sedej
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Miha:
Max+1 can be problematic for highly concurrent scenarios. so i really don't like that solution.
10/6/2009 1:58 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

I use surrogate keys, normally integer identity starting at 1, probably should start at the minimum though. The only place I'm not using surrogates is on tables like countries where I use the ISO standard codes, but even those codes can and have changed.
10/6/2009 2:22 PM | Jack Corbett (@unclebiguns)
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Malden
Highly concurrent ??
Triger must run in transaction off cource.
10/6/2009 2:30 PM | Miha Sedej
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Miha:
unless your transaction isolation level is set to serializable you're bound to get duplicate values.
downside fo serializable is that it easily overlocks things in long transactions.
10/6/2009 2:37 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Mladen:

U dont need to have higher isolation level. REad commited is enough. As i said transaction is in Instead of insert trigger on the table you wont to have surrogate.

Do u need an example ?
10/6/2009 2:40 PM | Miha Sedej
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Miha:
Instead of triggers you say? but that would mean you have to have them on all tables. hmm.. don't think i like that... and yes do post an example for completeness sake.
10/6/2009 2:43 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Miha - using MAX+1 as an insert value can be troublesome when you're trying to insert more than one record at a time, like a big insert from your ETL tables to your regular tables.

Anytime you write insert/update/delete code against a table, try to think about how it will react when a user wants to insert/update/delete more than one record at a time. I blogged about that here:

http://www.brentozar.com/archive/2009/01/triggers-need-to-handle-multiple-records/
10/6/2009 2:55 PM | Brent Ozar
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Surrogates are a useful part of database design, but I have learned to stay far away from identity (what most people seem to infer as the choice for implementing a surrogate) when it comes to reference databases - like a parts database. It is better to have a generated key that is not system owned IMO (in absence of an actual natural key). You then can store the key in other databases. Since identities are system handled - storing them in other databases can be problematic - so you end up storing some alternate key, and then every lookup of a part requires an additional seek. Of course when to use what is the reason we get paid. These application/company controlled keys are really surrogates, but end up being the way to identify the item in the company. Its a surronatural key.
10/6/2009 3:34 PM | WIDBA
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Re GUID: "...Another issue about it is that is causes perfect page splits in a clustered index because it has random 100% selectivity in it’s entire data type range." What about NewSequentialID()?


You put forward a strong arguement but you have failed to highlight a *very* key piece of information:

If a natural key exists, ensure you apply a UNIQUE constraint to it! There's no harm in having a surrogate key as long as you still enforce the validity of the natural key.
10/6/2009 3:42 PM | George
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

I use surrogate keys when it's appropriate to do so, and natural keys when it's appropriate to do so. Sometimes I use IDENTITY columns. Sometimes I use GUIDs. It all depends. And quite often the correct answer is not clear.

Database design is all about balance and the worst thing you can do is become attached to one idea to the exclusion of all others.
10/6/2009 3:45 PM | Adam Machanic
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

>> As this is a surrogate key this makes no sense and there’s no reason it shouldn’t start from the min value.

In SQL Server 2008, using data compression, there's a good reason. Negative numbers will not be compressible as they fill all the bytes. This would be especially wasteful with bigints.

It's a minor point but worth considering for large data warehouse stores.
10/6/2009 3:46 PM | Rob Volk
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@George:
you know i don't really see a point of NewSequentialID()? if you need a sequential guid you might as well use an identity.

@Adam:
you're absolutely right. as i said at the end, the answer is always "it depends" and you have to weigh all the options. but i'm a surrogate guy most of the time :)

@Rob:
you know i haven't even thought about data compression. that a very very good point!
Thanx!
10/6/2009 3:53 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Horses for courses. For OLTP purposes, surrogates are often desirable. For DSS, natural keys are overlooked too often in my opinion. A complex query that returns vast quantities of data but elicits an execution plan that is packed full of range scans and merge joins but not a single table scan or hash join is a lovely sight and runs like the wind.

Also, you imply that using surrogates will result in duplicates (and these are acceptable??!?) - fair enough clustering on a surrogate, but surely you enforce uniqueness of the natural key too? The surrogate is just an added column for optimising the physical design - it does not remove the need for other unique constraints.
10/6/2009 3:56 PM | Dan Halliday
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@DanHalliday:
i haven't even touched on check and unique constraints. that'd be way to much for a simple blog post. I'm all for them!
10/6/2009 3:59 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

A couple of additional thoughts:

A) it is NOT true that natural keys are necessarily subject to change. Right now I'm building a datamart off of a financial back office system. It generates its own internal transaction identifiers. Due to the nature of finance data and auditing requirements, these identifiers are not allowed to change. I'm using them as part of the key for the table in the datamart that deals with transactions. That is, in effect, a natural key. Creating a surrogate in this case would introduce a layer of indirection with no benefit. It would only create more work for the system, as I would have to go through a mapping process when bringing in new data.

B) Natural Key Fail #1, the key needs to get changed because the acquiring company uses a different scheme for its customer IDs. Okay, how about Surrogate Key Fail #1: Both companies use integer surrogate keys starting at 1. Weak argument on either side. Had both companies used natural keys perhaps they could have simply merged them with the understanding that not all customer IDs would be uniform.
10/6/2009 3:59 PM | Adam Machanic
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Jeremiah: "it only works as long as you're able to keep all data on the same physical server"
A *valid* foreign key relationship can only be enforced within the same database, so I'm afraid I fail to see your point here?

@Jamie
In the past I have had to care about an ID being globally unique, on two occasions in fact! When the company I work for bought/merged with others, we had to combine our payroll and HR information in to a single system. Duplicate employee numbers were one of the biggest pains we had to overcome.

@Mladen
I think it might also be worth noting in your post that just assigning a field the identity property does not make it a key.
10/6/2009 4:01 PM | George
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

I generally don't model with surrogate keys in mind. I do my logical model based on natural keys period.

GUID's have a very narrow application and large implications for performance in the relational model. Since they are random by nature if you index any data with the GUID included in the index you effectively randomize your data. In SQL Server 2005 they did introduce a way to get a sequential GUID on a per table basis but it is a band-aid on a sucking chest wound.

IDENTITY (int, bigint) also have their problems. If your system is alphabetic doing data modification can be problematic if it has to be included in the index structure and it is sorted on that identity.

It all boils down to SQL Server indexing things in a balanced tree and when you use surrogate keys you are forcing your data to align with that key some where some way.

You must understand your data before applying a one solution fits all like always use surrogate keys, or any other data rule for that matter.

You need to understand that something like social security number is a surrogate key, It is just made up by the Government, and is considered a standard but doesn't 100% identify a person.
ISBN's are an industry standard but still a surrogate key for a book, which has immutable attributes that can be modeled.
When these "keys" were invented the relational model was ether not known or in its infancy at best.
We have carried these practices over to the relational model because it is ingrained in our daily lives.

People tend to use surrogate keys without understanding the implications of maintaining data integrity, performance or sustainability.

I usually only use surrogate keys when I'm forced to by previous designs that I've now inherited or I hit a limitation of the data engine I'm working in.
If you are going to use surrogate key structures set them as alternate internal candidate keys that aren't exposed to end users.
Put constraints on the natural key to keep data inconsistencies from creeping into your system like duplicate data inserts.

If you are that aversed to relational design and natural keys I suggest you move to a KV engine or something simmilar to hold you data since flexability is important and relational design isn't 100% required.
I'm not saying you are wrong, I would simply recommend using the right tool for the right job instead of hammering a relational engine into a non-relational application or need.

I am a SQL Server guy by trade but love dealing with all the different data engines out there right now KV and column stores are both interesting and solve different problems over the relational model.

-wes
10/6/2009 4:09 PM | Wes Brown
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Adam:
A) interesting. however laws change, and who knows... maybe a dictator comes to power and decides to turn things around. you might think this is far fetched but stuff like that happened not long ago (20-30 years) in this part of the world.
but again i can't really comment on your case without actually being involved, can i? :)

B) here the merger would be easier if both used surrogates even if they did start from 1. this way they wouldn't have to change the app but with naturals they had to. i'm always looking at these things from both app and db level of change.

@George
actually it makes it a candidate key. i've omitted talk about Primary keys and Foreign keys for a reason.
10/6/2009 4:10 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Wes Brown:
Prefer doens't mean "never use another option" :)
Key value stores are something i've not even touched yet. but they do show potential.
10/6/2009 4:14 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Mladen, what about the merger would be easier if they used surrogate integers starting with 1? From an app perspective it would be MUCH easier if both used strings. Let's say one company used a string of the form "CST0000000", and the other company used a random string of eight alphanumeric characters. From an app perspective, BOTH are simply strings, and from the database perspective there is 100% uniqueness across both of the sets of keys, so they can be merged without an issue. As a matter of fact, a healthcare company I did some work for a couple of years ago had almost exactly this scenario, and went through three mergers/acquisitions, retaining the original keys in each case. An interesting side benefit is that you can quickly look at any customer ID and know right away what its original source was. This was invaluable when we had to research various data anomalies.
10/6/2009 4:25 PM | Adam Machanic
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Wes
Quite agree re the logical model. We often find (on DBForums) that people think primary keys and IDs are synonymous and so shove surrogates in to every table *at the logical level*. Second normal form is a bitch if you use surrogates in your logical model. So yes - logically design with natural keys always; surrogate keys are an implementation decision.

BTW - NI numbers, ISBNs etc are not surrogate keys - they are semi intelligent, natural keys.
10/6/2009 4:25 PM | Dan Halliday
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

As an aside, if in my data warehouse I have a table called MyTable and I am using SKs then the table *will* have the following two fields:

MyTableId (surrogate key field)
MyTableNId (natural key field)

This can be useful for code generation because your code-generating code always knows which columns contain the surrogate/natural key - so generating JOIN predicates is a lot easier.

Its more difficult on the odd occasion when your natural key is composite - but not impossible to handle.

Consistency rules!
-Jamie
10/6/2009 4:35 PM | Jamie Thomson
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Adam:
yes that would be much easier and i completely agree with you in your case.
in my case the thing was that the product numbering itself changed.
because the actual natural key changed much had to be reworked (app wasn't state of art either :)). if they had used anykind of surrogate they wouldn't have to change any kind of app logic. just update the product number everywhere and rebuild NCIs
10/6/2009 4:40 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Surrogate keys seem to be the method of choice for star schema scenarios (don't hit me). There are the reasons cited, plus let's say you want to capture history. Product XYZ1000 might have changed over time. Now you can tell why. You still include the business key for reporting reasons, of course.

As to the concurrency problems and so forth, typically you'd want to setup a surrogate key pipeline which takes these potential problems into consideration. You can search around and find some examples.

Of course all of this is from more of a reporting that OLTP perspective. I still think the surrogate keys are essential due to the cited potential change of business key problem. Nice write-up, I put your blog in my list of RSS feeds.

SDC
10/6/2009 4:58 PM | SDC
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Mladen, Your title seems to imply that you are making an either/or choice between two types of key. It's important to remember that a table can and should have as many candidate keys as are required to make it effective - a point which doesn't get emphasised often enough in my opinion. I expect you didn't mean to imply that the two options are mutually exclusive but unfortunately many people may jump to that conclusion.
10/6/2009 6:30 PM | David
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

where is JOE ?
:-)
10/6/2009 7:04 PM | noeldr
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Mladen and ALL about MAX +1

Here we go with example: (create table and instead of insert trigger which generates MAX plus 1 surrogate key )
The beauty is that you can insert one row or large datase and works just fine. Try to play with data. (insert dataset, delete some rows and again insert some data.



CREATE TABLE [dbo].[sometable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[surogate_ID] [int] NULL,
[Name] [nvarchar](50) NULL,
CONSTRAINT [PK_sometable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TRIGGER dbo.sometable_instead_ins
ON dbo.sometable
INSTEAD OF INSERT
AS
SET LANGUAGE Slovenian
SET NOCOUNT ON
SET XACT_ABORT ON

BEGIN TRANSACTION tran_sometable_instead_ins
DECLARE @last int, @id int
select @last = ISNULL(MAX(surogate_ID), 0) from dbo.sometable
select * into #inserted from inserted
update #inserted SET @last = @last + 1, surogate_ID = @last

insert into dbo.sometable (surogate_ID, Name)
select surogate_ID, Name
from #inserted

COMMIT TRANSACTION



10/6/2009 9:30 PM | Miha Sedej
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Miha, That code will cause all inserts to be serialized in read-committed mode - inserts will block each other. It's therefore unsuitable for most multi-user scenarios.
10/6/2009 9:41 PM | David
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Dan Halliday
You are technically correct sir, which is the best kind of correct :)
Semi intelligent, I think that is the problem but you have to have some kind of standard externally verifiable at some point.

@Mladen
Prefer isn't always, but as you see this is heating up to an "us or them" kind of discussion.

Your assumption that natural keys are always mutable and subject to change is the problem with your premise.
Poorly defined natural keys are subject to change i.e. first name last name to identify a person.
This is usually due to not having a complete understanding of the data or what is being modeled.

Even basing your primary key on say ISBN it should be understood that since it is defined by an outside entity it may be subject to change. There was a time before ISBN and as we are seeing it is changing as it grows.

It is difficult to build relational data models that are resilient to change on something as fundamental as the primary key without relying on something like a generated surrogate key.

Again,
That's why I say relational isn't always the best path for all things, but does pretty well for most.
Heck, if it was easy to deal with I wouldn't have a job!

-wes
10/6/2009 10:15 PM | Wes Brown
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@Wes:
Trust me it won't turn into us vs them kind of debate. i won't let it.... hopefully :))
however i don't agree with your immutability concept. the basic nature of natural key, no matter how correctly defined, is that it's mutable. it could be for resons we can't even think of right now. read my answer to Adam a bit above.
Relational has it's place as well as KeyValue. They both solve it's own problems. IMO they are complementary technologies and not competitive ones.

@Miha:
you should be aware that that UPDATE thing you're using isn't reliableif you care about order in any way. if you don't then ok...
also in read commited isolation level you can still get duplicates for concurrent inserts between the
select from dbo.sometable
and
insert into dbo.sometable

@David:
My title does not imply anything. It says what i prefer, nothing else. you can have only one primary key, but you can also have as many candidate keys as you wish. so yes they can live together just fine. i do that too :)
10/6/2009 11:32 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Mladen: "My title does not imply anything. It says what i prefer, nothing else." But surrogate and natural keys have different purposes and achieve different things. Just saying you "prefer" one seems a bit like saying "I prefer addition instead of division" - both are necessary and useful so what's the meaning of having a preference at all?
10/7/2009 7:57 AM | David
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@David:
IMO both serve the exact same purpose any key does. to uniquely identify the row. that's it. they're just doing it with different data.
10/7/2009 2:57 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

A natural key implements a business rule. For example that an invoice number or user name cannot be duplicated. A surrogate key does not. If you have two hypothetical tables, one with a natural key and one without, then they are clearly not logically equivalent and certainly don't achieve the same purpose - one allows duplicates which the other does not. So it's a fallacy to think you are choosing between two suitable designs. You need to decide what your business requirement is and then the choice is made for you.
10/7/2009 3:12 PM | David
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@David:
true. but that's a different issue than just a row identifier. not any less important though. good points, man!
10/7/2009 3:44 PM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Mladen,

Your next article should be on merits of using/not using NULLs...


-ec
10/7/2009 9:54 PM | eyechart
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

@eyechart:
hahahaha! i think i've already done that a while back. it also sparked some interesting discussions.
10/8/2009 11:17 AM | Mladen
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

It may be worth knowing that in the US, social security numbers are not perfectly unique. They are only unique for the range of time between being issued and the individual's death, and after that they're open for reuse. Using SSN alone as a key is 100% guaranteed to bite you... years later.
10/9/2009 9:29 PM | Anonymous
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

actually, the SSA has stated that SSNs do not get re-used upon death. http://www.ssa.gov/history/hfaq.html


-ec
10/12/2009 7:39 AM | eyechart
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

> SELECT @bar = COALESCE(MAX(bar), 0) + 1
> FROM dbo.mort WITH (UPDLOCK);
This breaks if you don't use HOLDLOCK. See http://weblogs.sqlteam.com/dang/archive/2007/10/28/Conditional-INSERTUPDATE-Race-Condition.aspx

As for alleviating concurrency issues with the Max() + 1 method, how about the following solution? This is (I believe) foolproof in never delivering a duplicate value no matter what your setup, at the small cost of an extra "bookkeeping" table. You could even combine many table's key dispensers into one table, but that may have a concurrency/locking cost.

CREATE TABLE MyDataKey (
LastID int NOT NULL,
OnlyOneRow tinyint CONSTRAINT CK_MyDataKey_OnlyOneRow CHECK (OnlyOneRow = 0)
)
GO
INSERT MyDataKey VALUES (0, 0)
GO
CREATE TABLE MyData (
MyDataID int NOT NULL CONSTRAINT PK_MyData PRIMARY KEY CLUSTERED,
TheData varchar(100)
)
GO
CREATE PROCEDURE MyDataNextID(@NextID int OUT, @Quantity tinyint = 1)
AS
UPDATE MyDataKey
SET @NextID = LastID, LastID = LastID + @Quantity
RETURN @NextID
GO
DECLARE @NextID int
EXEC MyDataNextID @NextID OUT
INSERT MyData VALUES (@NextID, 'The first row')
SELECT * FROM MyData
10/24/2009 2:19 AM | Emtucifor
Gravatar

# re: Why I prefer surrogate keys instead of natural keys in database design

Please excuse the stray "RETURN @NextID" in my procedure above. I was sure it wouldn't work, but I tried things as a function first...

The proper way for returning a value from a Stored Procedure is using an OUTPUT variable.
10/24/2009 2:46 AM | Emtucifor
Gravatar

# If you do not understand RDB theory use surrogate keys

Case 1Natural Key Fail Case 1:

..The company got acquired by another company .....All this wouldn’t have happened if they had used surrogate keys.

If two companies are merging they have to merge there CRMdata. They have to decide what is a that make customer unique. That should be the PK. If you use surrogate keys you are going to think that one customer is actually two customers. You are going to make to sell calls to them.
>the whole application had to be modified
But that is just bad design in an other layer.

Natural Key Fail Case 2:
..So we had to modify the TaxId to European standards
yep you had to new data that had to be stored with each company. Databases handle that well. Most db handlad cascading changes of pk.
...I know of at least one company that went out of business because of this change
I think that had to do with that they did not invest any money in a good dba. I mean there is 1:1 relationship between new and old key. They have several years in planning the change. How hard can it be?

I understand that surrogate keys have to be used in starscheme, extrem large databases with to small servers. I also think its fair to use them when you are not using the RDB as a RDB justa as storage heap or as an OODB wich is very common.

But I think it is overused now. Its like: I do not understand the data. We do not have unique rows in the RDB. Lets solve with an identity column.
I came from DW and OO-programming where no one heard of natural keys. But after going old school and reading Celkos article
http://intelligent-enterprise.informationweek.com/showArticle.jhtml;jsessionid=2PDPOJUL30WEBQE1GHPCKH4ATMY32JVN?articleID=201806814
I have started to use natural keys as often I can.
1/1/2010 4:30 PM | Adversus
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET