Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

Sometimes the problem isn't the code. It's the specs.

I thought I'd take a few minutes to discuss a general situation we see quite often in the programming world, using a simple T-SQL example.

Let's say we have created a stored procedure that returns Customers from a database which allows you to optionally specify a particular Region to filter by.  Typically, you would create an optional @Region parameter with a default value of NULL to accommodate this.  Thus, in your stored procedure, if the @Region parameter is null, you do not filter the results, but if not, you use the value passed in to filter the results where the Region column equals the @Region passed in. Now, this is pretty common, and it usually works well, and it is a pretty good convention to follow. 

However, consider this:

What if your Region column is nullable, and therefore NULL values are in your table?


This seems to come up quite a bit in the SQLTeam forums (great example here), and like many of the questions raised there, people look for a programming solution for their problem.  They think: there must be a way to fix my code to make it work, because right now it's broken! The Null values are never returned in my results!  How do I fix this? My code is fine, is there a service pack that will fix this bug?

What makes this issue particularly difficult for many programmers to solve is this: It is not a coding error.  It's a logic error in the specifications. Something isn't making sense if you are allowing NULL values in these columns and also allowing a value of NULL in a filtering parameter to indicate something other than "return NULL values only." 

When you have NULLs in the Region column , consider the filtering possibilities:

  • If you want all customers for region X, you pass in @Region='X'.
  • If you want all customers regardless of region, you pass in @Region = NULL.
  • If you want all customers with a NULL region, you pass in .... ???
Putting aside the code for a minute, we can now see the issue:  How do you indicate that we want just customers with a NULL region? You can't do it based on your specifications, because there is no way to do it!  Your specifications do not work in your environment.

So, the solution does not involve programming, it involves examining the specification and the environment to ensure that they are logically compatible.  If you have nullable data in a column like this, you must come up with another convention to use for these optional filtering parameters; not because you wrote the code wrong, or SQL isn't "working", but because very simply the specifications you are trying to implement will not work with the data you have to return the results you want.  It's as simple as that.

To solve this, you need to first recognize that the code is not the problem, and then either:

(a) come up with an alternate filtering specification; i.e., change the specification to fit your environment

or

(b) edit your schema to disallow NULLs in that column; i.e., change your environment to fit the specification
 
Which you choose is up to you, there are of course pros and cons to both approaches, but the key is this: One of those two must change, and you can't write a single line of code to "fix" anything until it does! 

This happens very often as programmers sometimes forget that before they can write the code to accomplish task X in a particular manner, they must first determine if the task is something that they can logically accomplish in your current environment.  That first step has nothing to do with writing code, it has everything to do with thinking about what you are trying to accomplish, coming up with an algorithm, maybe drawing a quick diagram or flowchart, and doing what you can to ensure that what you are about to code up will work.  Because if you can't write it out in English (or whatever your native language is) you certainly won't be able to write it in T-SQL !

We also commonly see this when people want to return DISTINCT results in SQL:

"I want to return only distinct States, but if I do, SQL won't let me return the City column as well.  What is the SQL code to accomplish this?"

Once again, we have a logical error. If we want to return 1 row per State, then our result will have 50 rows, one per state.  But how many cities are there?  Well, definitely more than 50.  So, how can we return only 50 rows but still return all of the cities as well?  The answer is -- you can't!  It cannot be done as specified, unless you summarize the City column in some way -- either taking the city for each State with the largest population, the first alphabetically, the capital, or some other factor.  Or, if you really want to show them all, you could concatenate them all together into a long list in one large column, but again, the key is something needs to be specifically stated and this problem needs to be solved in english or on paper before you can write the code to do it.  The specification to "display distinct states but also all cities" simply does not logically make sense in the environment (a relational database) without making some changes.  We can change the specs ("return only the capital of each state")  or the environment (use a hierarchical database like XML instead) but we can't change the code before changing one of those two things.

So, before you begin coding up that new T-SQL stored procedure or C# method, be sure to stop and think to yourself:  Can I write down the logical steps that need to done, in plain English, before I write the code * ? And, if I do that, will it all make logical sense?  Will it work?  What can go wrong?   Finding these things out after you attempt to write the code just leads to frustration and confusion as you wonder: which is wrong, my code, my environment, or my specifications?  It could be any of the three, but always remember that if your specifications don't match your environment, it doesn't matter how well you can write code, it's simply not going to work.

* Of course, we all know that we actually should do this, but I do recognize that for simple algorithms just knowing that you could do it is often enough


see also:

Legacy Comments


Mladen
2007-03-14
re: It is the Code, the Environment, or the Specs ?
well put jeff, well put!

Chris Reeder
2007-03-14
re: It is the Code, the Environment, or the Specs ?
On the Null reference. I usually handle it like so:

WHERE RegionId = CASE WHERE @RegionId IS NULL THEN RegionId ELSE @RegionId END

Also on Identity values like this.

WHERE RegionId = CASE WHERE @RegionId < 1 THEN RegionId ELSE @RegionId END

I would like to know your thoughts on this...

Jeff
2007-03-14
re: It is the Code, the Environment, or the Specs ?
Chris -- you haven't addressed the issue discussed in the article; it's not about how to code it, it's that if you have NULL values in your Region column, there is no way to indicate "return NULL regions" using parameters if a NULL parameters indicates "don't filter by region". The entire point of the article isn't about writing code, it's about ensuring that the specifications you are using make sense in your environment. And, in this case, they don't. No amount of coding will circumvent this.

If we do focus on the coding and ignore the problem with the specs (i.e., by assuming that Region is never null), I always prefer to use boolean expressions rather than CASE expressions in WHERE clauses whenever possible. Performance is usually better and it is usually easier to read. In this case, it is simply:

WHERE (@Region is Null OR Region = @Region)

Jeff
2007-03-14
re: Is it the Code, the Environment, or the Specs?
Chris -- also, your code doesn't return any rows where Region is null, regardless of the parameters passed in. Region is never equal to itself if it is null.

Jon
2007-03-15
re: Is it the Code, the Environment, or the Specs?
Good post, I love seeing the

"SELECT SUM(SomeField), Blah1, Blah2, Blah3, City, Address1, Address2, Address3, Address4...FROM...
GROUP BY Blah1"

But why doesn't it work...O well I'll just end up groupping by Blah2, Blah3, City...all of them...ahh there we go now I get the right results...
well I think I do!

FYI Jeff, your blog doesn't show up correctly here..using IE6. Looks as though the menu comes up and its only after a scroll down that I actually see content?

Just an FYI.

Jeff
2007-03-15
re: Is it the Code, the Environment, or the Specs?
Thanks for pointing that out, Jon, I never noticed it in Firefox ... I finally figured out the problem ... I had been playing around with different skins here for a while to try to get things looking fine, I finally decided on a nice plain and simple one.

Jon
2007-03-15
re: Is it the Code, the Environment, or the Specs?
Yep plain and simple the way to go. Looks really good and clean now.

Jon

Jeff
2007-06-07
re: Is it the Code, the Environment, or the Specs?
At the risk of creating a circular reference and blowing up the internet, here's another example:

http://sqlteam.com/forums/topic.asp?topic_id=82182

Notice how many people are trying to provide code to make things work, but the requirements mean it will NEVER work because they don't make logical sense.

Pau
2008-10-29
re: Sometimes the problem isn't the code. It's the specs.
Hello Jeff,

first of all sorry about my english, i'm agree with you, but there's a solution for the nullable fields, is the following (of course there's an implicit logic):
I took the region example:

SELECT * FROM CUSTOMERS
WHERE COALESCE(Region,0) = COALESCE(@Region,Region,0) . I think theorically this solves the problem.

Pau

jeff
2008-10-29
re: Sometimes the problem isn't the code. It's the specs.
Pau -- no, it doesn't solve the problem in any way. The problem is a logic error, not a coding error. You are trying to write code that makes 1=2, which is not possible.

Pau
2008-10-30
re: Sometimes the problem isn't the code. It's the specs.
Hello Jeff,

Have you test it? with this code if @Region is null will return ALL the records (even those with null values), if not it will return only the records that matches @Region. This is the implicit logic and is the tipical wanted behavior, because with
"@Region is Null OR Region = @Region" you always get the Regions with null values which usually don't want when the filter @Region is not null.

Excuse me to insist in something that looks so clear to you.

Thanks.
Pau.

jeff
2008-10-30
re: Sometimes the problem isn't the code. It's the specs.
Pau -- you are missing quite a few things here.

First, the expression:

@Region is Null OR Region = @Region

does *not* return NULL regions if @Region is not null. It only returns NULL regions, and in fact ALL regions, if @Region *is* null. Did you try it? Do you follow the logic?

If Region is null and @region is '1', then the first part of the OR is FALSE, so all we have left is the second part where is "Where Region=@Region".

But, again, I don't know how many more times I can repeat this: THE PROBLEM IS NOT CODE -- IT IS THE SPEC. I will suggest simply reading the article again more carefully if you haven't already, but to make it easier I will copy and paste the primary issue here:

>>When you have NULLs in the Region column , consider the filtering possibilities:
>>
>> * If you want all customers for region X, you pass in @Region='X'.
>> * If you want all customers regardless of region, you pass in @Region = NULL.
>> * If you want all customers with a NULL region, you pass in .... ???
>>
>>Putting aside the code for a minute, we can now see the issue:
>>How do you indicate that we want just customers with a NULL region?

That is the issue, and the question that must be answered; the problem is that the SPECIFICATION doesn't handle all of the possibilities, the SPECIFICATION must change in order to write code that works if you want all 3 of those options to work.

I hope this makes sense.


Pau
2008-10-31
re: Sometimes the problem isn't the code. It's the specs.
Jeff,

i must say you are right, i took the first @Region for Region without at.

Bye.