Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server
posts - 157, comments - 2686, trackbacks - 64

My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Subscribe





Archives

Post Categories

Programming

Sports

SQL

Some SELECTs will never return 0 rows -- regardless of the criteria

In SQL, the general rule of thumb is that the number of rows returned from a SELECT will be zero if your criteria did not match any data.  However, there is an important exception to this rule: it does not apply when asking for aggregate calculations such as SUM(), MIN() or MAX(), without any grouping.  This is rather interesting and important to know and look out for, as it can cause some confusion and recently some of my ASP.NET code failed due to this.  Let's take a look.

We'll be using the following simple table for our examples:

create table tmp (f1 int, f2 int)

insert into tmp
select 1,1 union all
select 1,2 union all
select 2,3

Now, if we SELECT from this table without any criteria, we will get 3 rows returned:

select f1,f2 from tmp

f1          f2
----------- -----------
1           1
1           2
2           3

(3 row(s) affected)


If we put criteria on our SELECT that doesn't match any rows -- for example, where f2 > 3 -- we will get back zero rows:

select f1,f2 from tmp where f2 > 3

f1          f2
----------- -----------

(0 row(s) affected)

So far, so good.  Let's try adding a GROUP BY clause to this SELECT.  If we want just the the MAX(f2) per f1, we get back two rows:

select f1, max(f2) as MaxF2 from tmp group by f1

f1          MaxF2
----------- -----------
1           2
2           3

(2 row(s) affected)

Once again, let's constrain our results so that no data matches:

select f1, max(f2) as MaxF2 from tmp where f2 > 3 group by f1

f1          MaxF2
----------- -----------

(0 row(s) affected)


... and once again, no rows are returned.  This should all be expected, and when writing our client code, we can do simple checks to verify if any data matched our SELECT statement's criteria simply by checking the row count.  In the case of a SqlDataReader, the Read() method will return False immediately and we know that no rows were returned, so no data matched our criteria.

So, it seems that we have simply verified the following:  In a SELECT statement, if the WHERE clause results in no matching rows in the underlying table(s), zero rows are returned in the result set.  

However, there is an exception to this rule!   Let's get the SUM() of the f1 column from the entire table, without any criteria or grouping.  As expected, this will result in a single row returned with the total:

select sum(f1) as TotalF1 from tmp

TotalF1
-----------
4

(1 row(s) affected)

Next, let's filter the result, just as before, so that no rows match our criteria, and again ask for the SUM(f1).  Since no rows match our criteria, we expect to once again get 0 results returned.  However, notice what happens:

select sum(f1) as TotalF1 from tmp where f1 > 4

TotalF1
-----------
NULL

(1 row(s) affected)

A row is returned, with a Null value! Yet, there are *no* rows that match our criteria!  This is very interesting, and rather important to know.   Please note that there is a big difference between getting a single row back with a NULL value as opposed to getting no rows back at all, even though we might logically interpret the results the same way.   In fact, if you just are selecting only aggregate calculations from a table, without any grouping, one row will always be returned -- never more, never less  (note that this does not apply if you add a HAVING clause -- see below). 

As an example, say that you have a simple stored procedure like this:

create procedure GetPreviousWeek @CurrentWeek DateTime
as
select Max(Week) as PreviousWeek
from Calendar
where Week < @CurrentWeek

Written as is, it is important to remember that you will always get a single row back, even if there is no previous week before the @CurrentWeek parameter!  I had a similar stored procedure, and incorrectly wrote my client logic like this:

SqlDataReader r = cmd.ExecuteReader()

if (r.Read())
{
  // there is a previous week
}
else
{
  // there is not a previous week
}

That was based on the assumption that the WHERE clause would result in no rows being returned if there was no previous week and therefore no matching data.  However, since the stored procedure was only calculating and returning an aggregate total, a single row is always returned, which meant that my logic was flawed.  What I really needed was something like this:

SqlDataReader r = cmd.ExecuteReader()

if (r.Read())
{
  if (r["PreviousWeek"] == dbNull.Value)
  {
    // there is no previous week
  }
  else
  {
    // there IS a previous week value
  }
}
else
{
  // this should never occur!
}

Notice the difference in the logic.

In short, I think it makes sense:  If I ask for only an aggregate value for some data, even if that data has no rows, I should always get a value back -- though, of course, it may be NULL.  However, if I specifically ask for a set of rows (by not aggregating or by providing a GROUP BY clause) then I should expect that I might not receive any rows back at all if no data matches my criteria.

Consider a "real world" example: If I asked you for the total population of all U.S. states that start with "Z", you would be able to reply with a single value (NULL, or perhaps 0).  But, if I ask you to list all of the states that start with "Z" along with their individual population, the answer would be an empty set; you would not be able to give me a value or set of values, you would reply that "no states start with Z so there is no data to return."

Please note that adding a HAVING clause to filter your SELECT may indeed result in zero rows being returned, since the HAVING is applied after the aggregate is calculated.  For example:

select sum(f1) as TotalF1 from tmp having sum(f1) > 10

TotalF1
-----------

(0 row(s) affected)

In that case, the single value is calculated, but it is not returned in the final results because the HAVING clause prevented it.

So, keep this in mind when returning single, aggregate values in a SELECT:  You will always get a single row returned, and never an "empty set", unless you also provide a HAVING clause filter.  If you have any columns in your GROUP clause, however, this does not apply and you may indeed get an empty set returned if your criteria does not match any data.

see also:

Print | posted on Tuesday, November 13, 2007 11:11 AM | Filed Under [ T-SQL GROUP BY ]

Feedback

Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

I think the NULL result when you select SUM(field) over zero rows is a BUG, and the result ought to be 0.

In addition to the logic behind 0, there is precedent in computer languages that preceed SQL, most notably Lisp:

(+ 2 3) => 5

(+ 2) => 2

(+ ) => 0
11/13/2007 12:30 PM | Bob Bane
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

>> I think the NULL result when you select SUM(field) over zero rows is a BUG, and the result ought to be 0.

Absolutely not a bug, the sum() of nothing is NULL; only the sum of known values that explicitly add up to 0 should be zero. If you want to explicitly cast NULL as 0, you are certainly free to do so, but 0 should not be returned.
11/13/2007 1:48 PM | Jeff
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

btw i use SqlDataReader.HasRows to check for rows... just a pointer :)
11/13/2007 3:18 PM | Mladen
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

Mladen -- Don't you still have to call Read() to get each row? What's the advantage of using both HasRows and Read(), as opposed to just using Read() ?
11/13/2007 3:28 PM | Jeff
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

HasRows gets set on the TDS level and it doesn't do Read(); so it also doesn't consume a row.
it's usefull if you want to have some logic before you acctually start reading data depending on if the data exist.
something like:
SqlDataReader rdr = cmd.ExecuteReader();
if (!rdr.HasRows)
{
// do some stuff here
return
}
else
{
while (rdr.Read())
}
11/13/2007 4:19 PM | Mladen
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

select sum(f1) as TotalF1 from tmp where f1 > 4 AND sum(f1) IS NOT NULL
11/13/2007 10:37 PM | Mark My Words
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

>> select sum(f1) as TotalF1 from tmp where f1 > 4 AND sum(f1) IS NOT NULL

Not only is that syntax not valid, but it completely misses the point of every word in the entire blog post. 0-for-2 ! :)
11/14/2007 10:40 AM | Jeff
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

It makes sense from a mathematical perspective. When you add an unknown number of values, you have to get a value. Such things as Sum, Max, Min imply a value. You can't have a Sum(any number of rows) not equal to anything. It doesn't make sense but still does make sense. I guess it takes a db logic to understand?
11/16/2007 2:38 AM | Matt G
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

This is a bug encountered in my production environment with nested selects used for filtering purposes.

Select distinct
( Select * from Addresses where name like '%a%')
As Name


This is a very simple example (not practical but other examples exist where it is...) that shows an empty set returning as a null row. Needless to say this is very annoying and causes extra code (and headaches) to be created. I still haven't found a solution to this.
11/19/2007 3:06 PM | pete
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

Pete -- Why don't you post an actual example that compiles and executes and we can help you with a solution. What you've posted isn't valid, since you are returning multiple columns in a correlated subquery and also potentially multiple rows. Also, remember that

select (select ... from ...) as value

is always going to return a single row with a single value, even if that inner select returns no rows -- it will just result in NULL. this is not a bug, but exactly how SELECT statements and correlated subqueries are supposed to work. If you want to get no rows back if there is no matching values from your subquery call, use either a JOIN (which is 99% of the time better SQL anyway) or add a simple WHERE clause to filter outer NULL values.
11/30/2007 5:48 PM | Jeff
Gravatar

# re: Some SELECTs will never return 0 rows -- regardless of the criteria

I make use of the null return to simplify decode functions. Normal coding would be to select a description for a code. If an invalid code is passed to the decode function an error is raised an must be handled. I simply select the max(description), which never raises an error. If needed I can also return a custom message for invalid codes by selecting isnull(max(description),'code not found').
12/19/2008 2:00 PM | Ken Kibbe
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET