I thought I'd take a few minutes to discuss a general situation we see quite often in the programming world, using a simple T-SQL example.
Let's say we have created a stored procedure that returns Customers from a database which allows you to optionally specify a particular Region to filter by. Typically, you would create an optional @Region parameter with a default value of NULL to accommodate this. Thus, in your stored procedure, if the @Region parameter is null, you do not filter the results, but if not, you use the value passed in to filter the results where the Region column equals the @Region passed in. Now, this is pretty common, and it usually works well, and it is a pretty good convention to follow.
However, consider this:
What if your Region column is nullable, and therefore NULL values are in your table?
This seems to come up quite a bit in the SQLTeam forums (great example
here), and like many of the questions raised there, people look for a programming solution for their problem. They think:
there must be a way to fix my code to make it work, because right now it's broken! The Null values are never returned in my results! How do I fix this? My code is fine, is there a service pack that will fix this bug?
What makes this issue particularly difficult for many programmers to solve is this:
It is not a coding error. It's a logic error in the
specifications. Something isn't making sense if you are allowing NULL values in these columns and also allowing a value of NULL in a filtering parameter to indicate something
other than "return NULL values only."
When you have NULLs in the Region column , consider the filtering possibilities:
- If you want all customers for region X, you pass in @Region='X'.
- If you want all customers regardless of region, you pass in @Region = NULL.
- If you want all customers with a NULL region, you pass in .... ???
Putting aside the code for a minute, we can now see the issue: How do you indicate that we want just customers with a NULL region? You can't do it based on your specifications, because there is no way to do it! Your specifications do not work in your environment.
So, the
solution does not involve programming, it involves examining the specification and the environment to ensure that they are logically compatible. If you have nullable data in a column like this, you must come up with another convention to use for these optional filtering parameters; not because you wrote the code wrong, or SQL isn't "working", but because very simply the specifications you are trying to implement will not work with the data you have to return the results you want. It's as simple as that.
To solve this, you need to first recognize that the code is not the problem, and then either:
(a) come up with an alternate filtering specification; i.e., change the specification to fit your environment
or
(b) edit your schema to disallow NULLs in that column; i.e., change your environment to fit the specification
Which you choose is up to you, there are of course pros and cons to both approaches, but the key is this: One of those two
must change, and you can't write a single line of code to "fix" anything until it does!
This happens very often as programmers sometimes forget that before they can write the code to accomplish task X in a particular manner, they must first determine if the task is something that they can logically accomplish in your current environment. That first step has nothing to do with writing code, it has everything to do with thinking about what you are trying to accomplish, coming up with an algorithm, maybe drawing a quick diagram or flowchart, and doing what you can to ensure that what you are about to code up will work. Because if you can't write it out in English (or whatever your native language is) you certainly won't be able to write it in T-SQL !
We also commonly see this when people want to return DISTINCT results in SQL:
"I want to return only distinct States, but if I do, SQL won't let me return the City column as well. What is the SQL code to accomplish this?"
Once again, we have a logical error. If we want to return 1 row per State, then our result will have 50 rows, one per state. But how many cities are there? Well, definitely more than 50. So, how can we return only 50 rows but still return all of the cities as well? The answer is -- you can't! It cannot be done as specified, unless you summarize the City column in some way -- either taking the city for each State with the largest population, the first alphabetically, the capital, or some other factor. Or, if you really want to show them all, you could concatenate them all together into a long list in one large column, but again, the key is something needs to be specifically stated and this problem needs to be solved
in english or
on paper before you can write the code to do it. The specification to "display distinct states but also all cities" simply does not logically make sense in the environment (a relational database) without making some changes. We can change the specs ("return only the capital of each state") or the environment (use a hierarchical database like XML instead) but we can't change the code before changing one of those two things.
So, before you begin coding up that new T-SQL stored procedure or C# method, be sure to stop and think to yourself: Can I write down the logical steps that need to done, in plain English, before I write the code * ? And, if I do that, will it all make logical sense? Will it work? What can go wrong? Finding these things out
after you attempt to write the code just leads to frustration and confusion as you wonder: which is wrong, my code, my environment, or my specifications? It could be any of the three, but always remember that if your specifications don't match your environment, it doesn't matter how well you can write code, it's simply not going to work.
* Of course, we all know that we actually should do this, but I do recognize that for simple algorithms just knowing that you could do it is often enough
see also: