Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

An INT primary key …. yet not an Identity?

Ah, this is not an anti-identity rant, don't worry!

Though, in a round-a-bout sort of way, it is yet another argument against always blindly using them – but not in the way you might expect.

There is a simple rule I'd like to propose, let me know what you think:

"If your client code, SQL code, or configuration files reference the primary key column of a table to determine any application logic, that primary key column should not be an identity."

Now, I am not saying that primary key column can't be a meaningless integer.  I am just saying it should not be an auto-generated meaningless integer.

Working with an inherited web application (in "classic" ASP …. yuck), there were lots of tables like this:

groupID, groupName
1, Administrator
2, User
3, Finance User
5, Account Manager
8, ReadOnly

and there was lots of code written like this:

if groupID = 1 then show delete button
if groupID =3 or groupID=5 and groupID != 8 then show the create new project button

... repeated everywhere, throughout the application and in the SQL code.   Yet, the groupID column was an identity.   I discussed a similar issue here, but this is a slightly different approach.   In this case, let us assume that we really want to -- or need to -- embed that "groupID" value everywhere in our code.  And we want that table to have an integer primary key to keep things short and simple and precise in our database code.  That's all fine, it happens, and it certainly is common.

But why does that groupID need to be an identity?

For static, non-changing system tables, why assign an identity primary key? Why not just make it an integer?  And once a year or so when someone adds and entry to this table (usually manually, behind the scenes with an INSERT command or opening the table directly) what is so bad about having them manually "figure out" the next ID to use and manually insert it?

The database will always have that primary key constraint on the column -- we don't gain any integrity by using an identity.  We just gain ... well, what exactly do we gain when the table is static?

The advantage of not using an identity is this: if we wish to confer meaning on these ID values, we have complete control over them.  Moving from one system to the next means we can just copy them over and we don't need to worry about setting IDENTITY INSERT ON and things like that.   We don't have to worry about gaps in the sequence or re-seeding the tables.  We can guarantee that we have complete control over these values, and they never will be created anywhere unless we explicitly create them. 

If we do use an identity, that's great the next "ID" will pop up magically, but what good does that really do for us in this scenario?  I understand that when an order is generated, or a customer is added, or for things like that, an IDENTITY is the way to go.  But for static tables, I really honestly don't bother with an identity.  Just put an INT primary key on there and you are good to go.  

The reason I bring this up is as the result of a mini-debate with another developer.  If a system table such as "Status" will only ever have 3 values, and we wish to assign each of those values an integer "ID" as a primary key, why bother with an identity?  Why not just use a standard, non-auto-generating integer column as the primary key?

Do the pros of an identity in this situation outweigh the cons?

Then again, I am a bit of a control freak, I admit ... But I think that key values in your data tables that your applications and systems rely on are a pretty good thing to have control over.

(Another minor point oft-overlooked:  Don't forget about considering the use of tinyint or smallint data types for these columns as well. )

Legacy Comments


Marco Mariani
2008-02-21
re: An INT primary key .... yet not an Identity?
I don't understand the obsession with always serial, or always integer, primary keys when there are obvious simpler ways to work.
Does it have to do with premature optimization? "an integer is smaller, thus is faster"..

if group_code == 'admin'...

if group_code in ('finance','account') and group_code != 'readonly'...

It's far more readable code and more readable table contents, too.

But... most people like simple rules they can apply everywhere.
Unfortunately, simple rules (as opposed to rules of thumb and common sense) generate complex software.

André Cardoso
2008-02-22
re: An INT primary key .... yet not an Identity?
I agree with the non use of an identity column in this situation. Besides the problems with the gaps, to maitain consistency across multiple environments, one has to effectively turn off identity (identity_insert on), in order to get the same Ids.
The problem of using the ints, and assuming it's a modern development language, is solved using constants, enums, or whatever and not the hard-coded value.

As to using string constants, they only generate trouble. If they are the constant and description ('finance'), it's really hard when the client wants another description shown on the application. If it's an abbreviation 'P' pending, after some time, unless there is a table that maps the abbreviation to the description, it starts getting confusing: was 'P' pending or processing? Does 'E' mean end or error? And if it's not documented and used with differente meanings, it's a nightmare...

Marco Mariani
2008-02-22
re: An INT primary key .... yet not an Identity?
> The problem of using the ints, and assuming it's a modern development language, is solved using constants, enums, or whatever and not the hard-coded value.

That would unnecessarily couple the enumerated values with the data in the database.

> As to using string constants, they only generate trouble.

I don't see how. They are the constants, and might serve as a localization key. I also don't see the need for one or two letters abbreviations, if the language has a decent support for strings (no strcmp thanks).

The only liability I can see with strings as keys is if there's a typo in the source, i.e. 'amdin' instead of 'admin' (*). But then, I unittest and usually read the code I write more than once. My customs may not be enterprisey enough, but they served me (and my customer) well :-)

(*) Overriding the equality comparison to trigger an error at runtime can take care of that.


Sheeri K. Cabral
2008-02-23
re: An INT primary key .... yet not an Identity?
Well, there is the fact that strings are bigger than INTs. I'll also note that sometimes you want non-sequential numbers, perhaps for security purposes. If you have an ID field and it's used in a GET request, you don't want people being able to find random invoices/user profiles/whatever.

Also sometimes it's silly to use an identity for something like a group when you always want the text, or if the text changes it means the identity itself has changed. I end up using foreign keys to a static table that has only text, ie, foreign key from a employee table, column "department" to the department table, and the column is text. That way you never have to join to figure out the department, you just have the write take a bit longer (because it has to look up a foreign key that's a string, not a number).

Note that something like "username", from a user perspective, is unique, and if you change your handle/username on the internet, you're a different person. It's useful from a corporate perspective to know that "sheeri" changed her handle to "awfief", but from a community perspective they're different people unless there's information that says otherwise.

Marco Mariani
2008-02-23
re: An INT primary key .... yet not an Identity?
> Well, there is the fact that strings are bigger than INTs.

We're talking about tables related to entities like user groups, company departments, and the like, in an OLTP context. I'd love to see some benchmark showing what can be gained on a modern DB by using integer pkeys versus short strings. It's not like those table have 1 million rows.

> I'll also note that sometimes you want non-sequential numbers, perhaps for security purposes. If you have an ID field and it's used in a GET request, you don't want people being able to find random invoices/user profiles/whatever.

If changing the URL bypasses security checks, we have bigger problems. We might as well use &admin=true like I've seen on thedailywtf.

> I end up using foreign keys to a static table that has only text, ie, foreign key from a employee table, column "department" to the department table, and the column is text.

That's my point, readabilty counts. Association tables with id_this, id_that, id_other and id_somemore make as much sense as VM opcodes.

That said, the average business application is quite different from a social network. Volatile user groups, tag clouds and a million users add another kind of complexity that I'm not familiar with.

G. Milner
2008-02-24
re: An INT primary key .... yet not an Identity?
My own personal law of primary keys or unique values in tables is this:

"Any unique value (used for a primary key) will always drift toward meaninglessness."

You may think you want to exert control over it now or in the future but the ability to do so will not last. For example, you may want the unique column to hold a number that specifies Customer-Division-Region in the column so it is something like this:

10010414

Customer = 1001
Region = 02
Diviision = 14

Eventually, the organization will change the setup of something by putting Region 02 into, say, division 13 and your unique IDs will become randomized.

This is just one example and I know that you would instead, in a situation like this, want to just have separate Division and Region columns and perhaps set your PK to the three columns, but the rule still holds: Single unique identifiers must drift toward (or start out having the condition of) meaninglessness.

There for there is no reason for your PK field not to be random or for it to maintain sequence.

Also, of course, integers are smaller than strings and index better....

So use and IDENTITY.

Cheers,

G. Milner


Marco Mariani
2008-02-25
re: An INT primary key .... yet not an Identity?

> "Any unique value (used for a primary key) will always drift toward meaninglessness."

You see, I happen to like data that has a meaning.

For instance, if I decide my application will stick a number to identify users (as opposed to, say, a mail address), the (pseudorandom, auto-incrementing or whatever) number becomes part of the process, and is a natural key.
A personal number in a tennis club's card, is without doubt a natural key even though it might have been generated by a database.

> This is just one example and I know that you would instead, in a situation like this, want to just have separate Division and Region columns and perhaps set your PK to the three columns,

Most probably, yes. I would have got two meaningful columns for free. And they probably would come in handy with other relations, or there would not be a hierarchy at all.

> Also, of course, integers are smaller than strings and index better....

Except, to maintain integrity and avoid duplications, you need a lot more unique indexes, and also more joins in your average query. Or simply leave data integrity at the mercy of the application(s). Which might even be fine when the application cardinality is exactly one.

> So use and IDENTITY.

As I said above, people like simple rules. "One entity, one table"


Jeff
2008-02-25
re: An INT primary key .... yet not an Identity?
G. Milner -- well, I appreciate your feedback but I've gotta say I literally disagree with almost every word and scenario that you wrote. You give a really nice "rule" and a nice conclusion with absolutely no evidence or examples or anything to back it up. What you wrote can be summarized like this: "If you concatenate existing columns together to form a primary key, that is a bad design, therefore always use meaningless primary keys". Not the strongest argument. Feel free to explain further.

Jeff
2008-02-25
re: An INT primary key .... yet not an Identity?
Marco -- great stuff, thanks for your feedback!

Marco Mariani
2008-02-25
re: An INT primary key .... yet not an Identity?
By the way, to point out the can of worms we're into.

There is a nice article + comment thread Here.

It's gone on for 6 years, with references to alien abduction and death threats to Jeff Smith for proposing natural keys ;-)


Jeff
2008-02-25
re: An INT primary key .... yet not an Identity?
ha, good link, Marco, that's "old school".

Warning to anyone who follows that link: Only read if you have a LOT of time on your hands!

Marco Mariani
2008-02-25
re: An INT primary key .... yet not an Identity?
> Marco -- great stuff, thanks for your feedback!

I'm flattered..

One more thing, since the performance aspect is brought up quite often.

Apart from the huge amount of blog and forum posts that often repeat the same arguments on both sides (*), I've found in The Art Of SQL (**) there are several optimizations that the planner cannot find in case of surrogate keys. Although I cannot point out the exact pages right now, YMMV, and the whole talk about planner makes little sense in context of MySQL. MySQL is a peculiar DBMS, they'd rather build ten more back-end engines than writing a good planner.


(*) see for instance
http://blogs.ittoolbox.com/database/soup/archives/primary-keyvil-part-i-7327
http://blogs.ittoolbox.com/database/soup/archives/primary-keyvil-part-ii-7345
http://blogs.ittoolbox.com/database/soup/archives/primary-keyvil-part-iii-7365

but don't google for Joe Celko in the groups if you are averagely touchy and don't like name-calling.