Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions

Often, we need to create a flexible stored procedure that returns data that is optionally filtered by some parameters. If you wish to apply a filter, you set the parameter to the necessary value; if not, you leave it NULL.  This is pretty standard stuff, of course, that we can write fairly easily, without the need for dynamic SQL, like this:

create procedure GetData
    @MinDate datetime = null,
    @MaxDate datetime = null,
    @MinAmount money = null,
    @MaxAmount money = null,
    @ProductCode varchar(200) = null,
    @CustomerID int = null
as

    select *
    from Data
    where
        (@MinDate is null OR @MinDate <= Date) and
        (@MaxDate is null OR @MaxDate >= Date) and
        (@MinAmount is null OR @MinAmount <= Amount) and
        (@MaxAmount is null OR @MaxAmount >= Amount) and
        (@ProductCode is null OR ProductCode = @ProductCode) and
        (@CustomerID is null OR CustomerID = @CustomerID)

Note that we are using good boolean algebra as discussed here, and not using inefficient CASE or COALESCE() expressions around the columns in our WHERE clause.  It is pretty easy to read and quite logical, and it is certainly easy to edit and to extend with more filtering options as necessary.  We also made sure to use parentheses in our WHERE clause to organize it and to enforce a clear and correct order of operations between our AND and OR conditions.
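
For contrast, here is a rough sketch of the kind of CASE/COALESCE-around-the-column pattern we are avoiding.  Because the table columns themselves are buried inside the expressions, the optimizer generally cannot seek on any indexes covering those columns (and rows where the column itself is NULL quietly fall out of the results):

    -- For illustration only -- this is the pattern to avoid.  Wrapping the
    -- columns in COALESCE() hides them from index seeks, and any row where
    -- the column itself is NULL is silently filtered out.
    select *
    from Data
    where
        Date >= coalesce(@MinDate, Date) and
        Date <= coalesce(@MaxDate, Date) and
        Amount >= coalesce(@MinAmount, Amount) and
        Amount <= coalesce(@MaxAmount, Amount) and
        ProductCode = coalesce(@ProductCode, ProductCode) and
        CustomerID = coalesce(@CustomerID, CustomerID)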

However, if we manipulate our parameters a little and re-organize our criteria, we can potentially write the OR-based version above much more efficiently.

First, note that we are allowing the user to specify an optional date range via the @MinDate and @MaxDate parameters.  If either is NULL, we simply do not use a minimum or maximum date in our filter.  However, we can simplify things by writing our criteria like this:

    select *
    from Data
    where
        Date between coalesce(@MinDate, '1/1/1900') and coalesce(@MaxDate, '12/31/2999') and
        ... etc

All we need to do is use COALESCE() to transform our NULL values into dates that will encompass the entire range of values, thus "eliminating" that filter from the results. This greatly simplifies our WHERE clause, and since the Date column itself is not wrapped in any expression, indexes on it can still be used.

We can do the exact same thing for our Amount range as well:

    select *
    from Data
    where
        Date between coalesce(@MinDate, '1/1/1900') and coalesce(@MaxDate, '12/31/2999') and
        Amount between coalesce(@MinAmount,-99999999) and coalesce(@MaxAmount,99999999) and ...

For numeric values, this can be trickier because you really want to make sure that you have your entire range of values covered.  When in doubt, just use the smallest and largest possible values that your data type will allow.
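
For example, if the Amount column happens to be declared as money, a sketch using that data type's actual limits might look like this (substitute the limits of whatever type you are actually using):

    -- Sketch: use the extreme values of the money data type as the
    -- "no filter" sentinels, so that any real Amount falls inside the range.
    select *
    from Data
    where
        Amount between coalesce(@MinAmount, cast(-922337203685477.5808 as money))
                   and coalesce(@MaxAmount, cast(922337203685477.5807 as money))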

Next, we have a string comparison based on product code.  This, too, can be rewritten very efficiently by using LIKE.  Normally, LIKE is much slower than using equals, but if you do not use any wildcards and it allows you to eliminate an "OR" in your WHERE clause, it may actually be more efficient:

    select *
    from Data
    where
        Date between coalesce(@MinDate, '1/1/1900') and coalesce(@MaxDate, '12/31/2999') and
        Amount between coalesce(@MinAmount,-99999999) and coalesce(@MaxAmount,99999999) and
        ProductCode like coalesce(@ProductCode,'%') and ...

Note that if your product codes contain symbols like % or _, this will not work for you.  But in general, using LIKE in this case can be really handy.  I also sometimes see this:

where (@Name is null OR Name like '%' + @Name + '%')

... which can of course be written much more simply and more efficiently, since we are using LIKE already, as:

where (Name like '%' + coalesce(@Name,'') + '%')

The idea is to eliminate those OR operators and simplify your WHERE condition wherever possible.
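
Going back to the caveat above about product codes that contain % or _ (or [): one possible workaround -- just a sketch, and not something I have benchmarked -- is to escape those characters inside the parameter before applying the LIKE, so that the comparison behaves like a plain equality test:

    -- Sketch: escape the LIKE wildcard characters inside the parameter itself.
    -- '[' must be escaped first, then '%' and '_'.  A NULL parameter stays NULL
    -- and still falls through to the '%' catch-all in the WHERE clause.
    set @ProductCode =
        replace(replace(replace(@ProductCode, '[', '[[]'), '%', '[%]'), '_', '[_]')

    select *
    from Data
    where
        ProductCode like coalesce(@ProductCode, '%')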

Finally, we have an optional filter for a specific @CustomerID integer parameter.  We don't want to use a LIKE wildcard here, since that would require all of the integer values in the table to be converted to strings for the comparison -- and thus no indexes would be used.  What we can do, even though it might not seem intuitive, is use a range!  For example:

    select *
    from Data
    where
        Date between coalesce(@MinDate, '1/1/1900') and coalesce(@MaxDate, '12/31/2999') and
        Amount between coalesce(@MinAmount,-99999999) and coalesce(@MaxAmount,99999999) and
        ProductCode like coalesce(@ProductCode,'%') and
        CustomerID between coalesce(@CustomerID,-99999999) and coalesce(@CustomerID,99999999)

Now, we have no ORs in our WHERE clause at all, and we have not wrapped any columns in our table in any expressions, so all indexes can potentially be used.  Our SQL is still relatively clear and easy to read and work with, and we can easily add more conditions or criteria as necessary.

You may also find that it is more efficient and/or easier to maintain if you first set your parameter values as necessary before the SELECT, like this:

    declare @MinID int, @MaxID int


    set @MinDate = coalesce(@MinDate, '1/1/1900')
    set @MaxDate = coalesce(@MaxDate,'12/31/2999')
    set @MinAmount = coalesce(@MinAmount,-99999999)
    set @MaxAmount = coalesce(@MaxAmount,99999999)
    set @ProductCode = coalesce(@ProductCode,'%')
    set @MinID = coalesce(@CustomerID,-99999999)
    set @MaxID = coalesce(@CustomerID,99999999)

    select *
    from Data
    where
        Date between @MinDate and @MaxDate and
        Amount between @MinAmount and @MaxAmount and
        ProductCode like @ProductCode and
        CustomerID between @MinID and @MaxID

This will ensure that the coalesce() expressions are evaluated only once, and it does make things a little easier to read.
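
Putting all of the pieces together, the rewritten procedure might end up looking something like this (a sketch, using the same boundary values as above):

    create procedure GetData
        @MinDate datetime = null,
        @MaxDate datetime = null,
        @MinAmount money = null,
        @MaxAmount money = null,
        @ProductCode varchar(200) = null,
        @CustomerID int = null
    as

        declare @MinID int, @MaxID int

        -- normalize the optional parameters up front
        set @MinDate = coalesce(@MinDate, '1/1/1900')
        set @MaxDate = coalesce(@MaxDate, '12/31/2999')
        set @MinAmount = coalesce(@MinAmount, -99999999)
        set @MaxAmount = coalesce(@MaxAmount, 99999999)
        set @ProductCode = coalesce(@ProductCode, '%')
        set @MinID = coalesce(@CustomerID, -99999999)
        set @MaxID = coalesce(@CustomerID, 99999999)

        -- no ORs, and no expressions wrapped around the columns
        select *
        from Data
        where
            Date between @MinDate and @MaxDate and
            Amount between @MinAmount and @MaxAmount and
            ProductCode like @ProductCode and
            CustomerID between @MinID and @MaxID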

The overall trick is to think to yourself: Can I take a parameter value and either alter it or use it to create new values, and then efficiently use those new values in my WHERE clause instead of simply using that raw parameter value?  Sometimes, it takes some "outside-of-the-box" thinking, but it often can be done and the resulting performance gains can be tremendous.

Ultimately, you should always test the different ways of writing your WHERE clauses and double-check the indexing on your tables to get the maximum efficiency.  Writing things a particular way may work great in one situation, but applying the same approach to another may not work quite as well.
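
For example, one quick way to compare two variations (just a sketch -- plug in your own table, columns, and sample values) is to run them back to back with statistics enabled, or to compare their actual execution plans:

    -- Sketch: compare logical reads and CPU time for the two forms of the filter.
    declare @MinDate datetime, @MaxDate datetime
    set @MinDate = '1/1/2007'   -- sample values; try both NULL and non-NULL
    set @MaxDate = null

    set statistics io on
    set statistics time on

    -- version 1: OR-based optional filters
    select *
    from Data
    where
        (@MinDate is null OR @MinDate <= Date) and
        (@MaxDate is null OR @MaxDate >= Date)

    -- version 2: range-based filters
    select *
    from Data
    where
        Date between coalesce(@MinDate, '1/1/1900') and coalesce(@MaxDate, '12/31/2999')

    set statistics io off
    set statistics time off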


Legacy Comments


Mladen
2007-09-18
re: Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
nice!

i usually just set the default values to parameters in the sproc, so i don't have to even use coalesce

Jeff
2007-09-18
re: Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
An even better idea! I do that sometimes as well.

Dave
2007-09-18
re: Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
I think this:

set @MinID = coalesce(@MinID,-99999999)
set @MaxID = coalesce(@MaxID,99999999)

Should be this:

set @MinID = coalesce(@CustomerID,-99999999)
set @MaxID = coalesce(@CustomerID,99999999)

Jeff
2007-09-18
re: Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Fixed, thanks Dave!

gbn
2007-09-19
re: Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
I use some of these techniques already, but this is good stuff.

To the posters above: Beware default parameters: parameter sniffing may occur.

Regina
2007-09-19
re: Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
I've always been torn between using COALESCE vs. ISNULL. I like COALESCE because it's ANSI compliant and can support more arguments, but I like ISNULL also because for this kind of thing it's a bit clearer what is going on.

Also ISNULL has the apparent behavior of truncating the result to the size of the variable placeholder, for good or bad.

You know if there is any difference in speed between the 2?

Mladen
2007-09-19
re: Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
yes isnull is faster.
google up coalesce vs isnull speed

Chris Pietschmann
2007-09-19
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Great idea! Thanks!

Rashmi
2007-09-20
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Yup! All this works well if the parameter has a single value. But what if I have comma-separated multiple values in my parameter? Something like this:

select g.name from geography g
where (p_Country is null or g.id in (p_country))

p_country has multiple values :1,2,3,4,5

For such multiple-value cases, even the above code doesn't work; the IN clause actually fails here. Does anyone have a better idea to resolve this?

Thanks,
Rashmi

Mark Routbort
2007-09-20
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Ummm, I suggest trying this with some real-world data to see exactly how bad an idea this is.

In most cases, and as the procedure is written, procedures allow null input values such as @MinDate because the caller doesn't _care_ about the value of the Date column in the table, not because they are trying to retrieve rows with a null Date value. In other words, we don't want to seek against this predicate _at all_.

By using COALESCE or ISNULL to coerce the null input parameter to a fake real parameter, you force the parameter to be used in refining the rowset, when in fact you want it to be ignored completely (which is what the query engine will do - again, I suggest trying this against some real data).

In addition, coercing parameters to constants is inelegant, decreases readability, and promotes bugs. In this case, your code will break starting 12/31/2999, because SQL Server datetime can actually go up to 12/31/9999. :-)

Jeff
2007-09-20
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Hi Mark -- thanks for your feedback.

First off, of course you don't use this when your dates contain nulls! I hope you didn't think I was presenting a "rule" that states "always write your WHERE clause like this"! My entire point was to analyze your SQL to see if there might be a more efficient way to code it, depending on your needs. That's all. If you have nulls and want them returned in your data, of course you wouldn't use a date range! And, of course, you wouldn't use a NULL value as a default parameter, either -- otherwise, how would you express whether or not to return NULL dates? You can't.

As for testing, did you test it? Care to give an example that demonstrates how "bad this idea is"? All of my testing shows that it is at worst the same, and more often more efficient than using lots of OR's, especially as other conditions are added or included in your WHERE clause. I could not come up with a scenario where it performed worse -- can you?

Finally, I will just assume that you are joking about the code breaking in year 2999 ... ... kind of hard to tell for sure based on the tone in the rest of your comments.

Jeff
2007-09-20
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Mark -- by the way, just for the record -- I prefer to write it the logically correct way with the OR conditions; it is just that it often ends up not generating a good execution plan, especially prior to SQL 2005. Sometimes, though, if you make little substitutions as I suggested, you may find that you are able to tweak things to get better performance.

Mark Routbort
2007-09-20
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Jeff -

Yes, I did test this against a relatively cost-intensive stored procedure we have in a production system (SQL 2000, btw) where we use a similar approach to your initial example to allow optional parameters, which, if unspecified, do not constrain the result set. This procedure is nearly always exercised with the optional parameters set to null. For this particular example, converting a fragment of my WHERE predicate from

( @start_date IS NULL or
acc2.created_date >= @start_date ) AND
( @end_date IS NULL or
acc2.created_date <= @end_date )

to

acc2.created_date >= ISNULL(@start_date,'1/1/1900')
AND
acc2.created_date <= ISNULL(@end_date,'12/31/9999')

resulted in more than doubling the Reads in SQL Profiler when executing the stored procedure. The deck is stacked, or so I thought, against the latter expression since, as it turns out, there is no index on the column [created_date] (as I said, in production this is really only run with unconstrained dates). This led me to think that the query engine was smart enough to optimize out redundant OR conditions generally, and that by using ISNULL or COALESCE you are forcing an unnecessary index seek or table scan to occur.

I also got horrific performance when I switched the order in my original predicate to

( acc2.created_date >= @start_date
OR @start_date IS NULL) AND
( acc2.created_date <= @end_date
OR @end_date IS NULL)

which again led me to believe that the query engine was performing some form of logic short-circuiting.

However, I was unable to get a working example to show you with a non-complex underlying table & procedure architecture, so I think things are more complex than I assumed.

I withdraw my objection to your suggestion on the basis of performance. I guess we would probably agree on starting with code that is elegant and readable, and then going on to optimization from there. I was in fact joking about needing to worry that your code will break in a thousand years, but depending on boundary literals will probably burn you sometime, and in my opinion, should only be used if you can demonstrate through tracing or execution plan analysis that it will help your performance.

Jeff
2007-09-21
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
>>I guess we would probably agree on starting with code that is elegant and readable, and then going on to optimization from there.

Absolutely, Mark! I definitely agree with that. I don't know if you follow it or saw it, but in this article I even linked to a much older post of mine that recommends the simple boolean method for implementing this kind of thing.

It is frustrating sometimes, because it is very hard to predict exactly what the optimizer will do. Sometimes switching the order of things in your criteria makes a difference, other times it doesn't, and other times you need to write your criteria in a way that you *think* is actually less elegant or efficient, but then performance improves dramatically. Then, you try that same technique again elsewhere, and performance suffers ... Overall, I hope I was able to communicate that these are only suggestions and ideas; the point is to sometimes step back and think about different ways you can use your parameters to construct your WHERE clause, tweaking things if necessary, which may help improve performance.

Thanks for your follow up and your feedback!

AP
2007-09-23
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions

Just be careful when wrapping a function around an indexed column.... it will nullify the use of any index.

Jeff
2007-09-23
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
AP -- uh, thanks! That's kind of the entire point of my post ....

Richard
2007-09-24
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
There's a bug in the range filters in your original query. For the date and amount, you have:

(@Min is null OR @Min >= value) and
(@Max is null OR @Max <= value)

This will only return records where the value is less than the minimum and greater than the maximum. This should, of course, read:

(@Min is null OR @Min <= value) and
(@Max is null OR @Max >= value)

Jeff
2007-09-24
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Thanks, Richard! Fixed!

Ken Lee
2007-10-01
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Setting a variable to a value derived from a function that returns a constant value is very much preferable to putting the function in the where clause because the function will be evaluated for every record in the table. Putting the function in the where clause is OK when you know the table is very small or when the function uses data in the table to determine the value.

( acc2.created_date >= @start_date
OR @start_date IS NULL) AND...
gets bad performance because the evaluation goes from left to right, so it always touches the table column and then finds out the variable is null.

You can also have customized select statements based on the date:
IF (@start_date IS NOT NULL) SET @end_date = ISNULL(@end_date, '99991231')
ELSE IF (@end_date IS NOT NULL) SET @start_date = ISNULL(@start_date, '19000101')
IF (@start_date IS NOT NULL) -- Use SELECT statement where the date is between the two values
ELSE -- Use SELECT statement where the date isn't used in the where clause

As stated in the article, that isn't as elegant or as easy to maintain when you have multiple fields optionally passed.

Chandy
2007-10-13
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Great post Jeff...
This is regarding Ken Lee's response. Is it good to use IF ELSE statements in order to have customized SELECT statements? Would it not reduce the performance? Would it not affect the precompiled plan if used in a stored procedure? I have faced this problem a lot while using IF ELSEs to customize queries like that. What is your take on that?

Doug
2007-10-15
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Jeff,

Why not build your SQL dynamically and Execute the string? You have a minimal SQL operation even with the Execute penalty.


Jeff
2007-10-15
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Doug -- if you absolutely need to, sure, use dynamic sql. But standard static sql provides you with precompiled, prevalidated, preoptimized SQL statements, unlike building strings dynamically.

Jeff
2007-10-15
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Also, Doug: be sure to read

The Curse and Blessings of Dynamic SQL, here: http://www.sommarskog.se/dynamic_sql.html

It's really the definitive guide to dynamic sql, and outlines many of the pros and cons to be aware of when employing it.

P
2008-12-11
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
Re the following

Note that if your product codes contain symbols like % or _, this will not work for you. But in general, using LIKE in this case can be really handy. I also sometimes see this:


where (@Name is null OR Name like '%' + @Name + '%')


... which can of course be written much more simply and more efficiently, since we are using LIKE already, as:


where (Name like '%' + coalesce(@Name,'') + '%')

I would point out that if you have NULL values in the Name field, those rows will not be included in the second version of the statement. If nulls are allowed, then you will have to stick with the first version; otherwise they will not be included in the result set, which I have just found out a little too late.

Rakesh Kumar
2009-07-01
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions

Hey Jeff !

Gr8 Post

Really, it did solve many of the problems where I used CASE and such OR conditions.

simply excellent

:)

Rakesh

Bertie Fraser
2010-05-07
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
This might be too late for Rashmi, but for anyone else having a similar problem with something like:
select g.name from geography g
where (p_Country is null or g.id in (p_country))

I suggest:
select g.name from geography g
where ((p_Country is not null) and g.id in (p_country))
or
(p_Country is null)

the reason is obvious from the solution, I hope.

David Reed
2010-10-07
re: Optimizing Conditional WHERE Clauses: Avoiding ORs and CASE Expressions
@Bertie: That makes no difference. The problem is the same with your code too. Your where clause prevents an index seek and forces an index scan.

I can't believe that the SQL Server team haven't sorted this out in over a decade. How hard can it be to parse the where clause for conditions that can be evaluated up front allowing the elimination of other conditions without referring to table data? It is my biggest gripe with SQL Server and makes the engine look incredibly naive.

I've appended a very simple example which best illustrates the problem. The query should have a cost of 0.003 and use an index seek no matter what but you can see that SQL Server gets stupid and increases it by a factor of 300 when you add the or clause, or by a factor of 100 if you use Jeff's workaround which allows the index seek to kick in but with a performance hit, or by a factor of 30 if you change Jeff's coalesces to isnulls.

2010 and this problem still hasn't been solved. We shouldn't be having to come up with workarounds for this it should be fixed, God knows they've had enough time. My guess is that people will still be posting the same comments on this blog in 2020.

declare @id int
set @id = 373776
select document_series_id from document where document_series_id = @id

/*
TotalSubtreeCost: 0.003
|--Index Seek(OBJECT:([Union2].[dbo].[Document].[IX_CEL_Document_Series_ID]), SEEK:([Union2].[dbo].[Document].[Document_Series_ID]=[@id]) ORDERED FORWARD)
*/

declare @id int
set @id = 373776
select document_series_id from document where (@id is null) or (@id is not null and document_series_id = @id)

/*
TotalSubtreeCost: 0.982
|--Index Scan(OBJECT:([Union2].[dbo].[Document].[IX_CEL_Document_Series_ID]), WHERE:([@id] IS NULL OR [@id] IS NOT NULL AND [Union2].[dbo].[Document].[Document_Series_ID]=[@id]))
*/

declare @id int
set @id = 373776
select document_series_id from document where document_series_id between coalesce(@id, -2147483648) and coalesce(@id, 2147483647)

/*
TotalSubtreeCost: 0.290
|--Index Seek(OBJECT:([Union2].[dbo].[Document].[IX_CEL_Document_Series_ID]), SEEK:([Union2].[dbo].[Document].[Document_Series_ID] >= isnull([@id],(-2147483648)) AND [Union2].[dbo].[Document].[Document_Series_ID] <= isnull([@id],(2147483647))) ORDERED FORWARD)
*/


declare @id int
set @id = 373776
select document_series_id from document where document_series_id between isnull(@id, -2147483648) and isnull(@id, 2147483647)

/*
TotalSubtreeCost: 0.090
|--Index Seek(OBJECT:([Union2].[dbo].[Document].[IX_CEL_Document_Series_ID]), SEEK:([Union2].[dbo].[Document].[Document_Series_ID] >= isnull([@id],(-2147483648)) AND [Union2].[dbo].[Document].[Document_Series_ID] <= isnull([@id],(2147483647))) ORDERED FORWARD)
*/