As with any programming language, it is important in SQL to keep your code short, clear and concise. Here are two quick tips that I find are very helpful in obtaining this goal.
Tip 1: For any relatively complicated constant expression, always declare a variable
Consider the following:
select *
from Transactions
where TranDate >= {some long, complicated expression to determine the start date} and
TranDate < {some long, complicated expression to determine the end date}
This is one of the most common things that I see in the
SQLTeam forums, from both the people asking questions and the people giving answers. If the starting date and ending date range are
constants for the entire SELECT (i.e., they don't vary by row, they are based on the current date or some parameters passed to the stored procedure), then simply declare them as variables, set the values once, and reference those variables in your WHERE clause:
declare @start datetime, @end datetime
set @start = {some long, complicated expression to determine the start date}
set @end = {some long, complicated expression to determine the end date}
select * from Transactions
where TranDate >= @start and TranDate < @end
Now, you could argue that we just technically made the code longer. But, we have made our code clearer and much easier to read and debug! We can now print and/or analyze the @start and @end dates to make sure that our formulas are working, without repeatedly running the entire SELECT over and over and glancing at the results and trying to determine if they "look right".
We can also potentially use these variables to make these formulas (or others) much simpler. For example, if the end date is always one day later than the start date, we can write:
declare @start datetime, @end datetime
set @start = {some long, complicated expression to determine the start date}
set @end = dateadd(day, 1, @start)
select * from Transactions
where TranDate >= @start and TranDate < @end
That is much, much easier to read, write and debug then if we repeated the long complicated expression twice, and it makes it very clear what the relationship is between the start date and the ending date. If we need to tweak our starting date formula, we can do it in one place and again it is easier to debug since we can just print out the value of the variable. We also have the ability to name our variables intelligently to help with the readability of our code.
Tip 2: To avoid repeating non-constant expressions, use a Derived Table or a Common Table Expression (CTE)
Another common thing found in lots of SQL code is to repeat the non-constant expressions over and over. For example, if we have a DateTime column called TranDate, this expression will round that date to the first day of the month:
dateadd(m, datediff(m,0, TranDate),0)
Using that knowledge, if we need to summarize transactions by month, the result often ends up looking something like this:
select
dateadd(m, datediff(m,0, TranDate),0) as [Month], sum(value)
from
table
where
dateadd(m, datediff(m,0, TranDate),0) >= @min and
dateadd(m, datediff(m,0, TranDate),0) <= @max
group by
dateadd(m, datediff(m,0, TranDate),0)
order by
dateadd(m, datediff(m,0, TranDate),0)
Notice that the date expression is repeated
five times in the SQL statement! Looks kind of silly, but I see it over and over again from both beginner and experienced programmers alike. The solution to simplifying this is to calculate the formula
once and assign it a meaningful alias
using a derived table:
select
[Month], sum(value)
from
(
select
dateadd(m, datediff(m,0, TranDate),0) as [Month], value
from
table
) x
where
x.[Month] >= @min and x.[Month] <= @max
group by
x.[Month]
order by
x.[Month]
Now the SELECT is much easier to read and maintain, and we can access the result of the calculation in the outer SELECT as many times as we need by referencing the "month" column returned by the derived table. Also, if our "month" formula is wrong, or needs tweaking or testing, it can all be done in one place and it doesn't need to be repeated over and over.
If you are using SQL Server version 2005 or greater, you can also use a Common Table Expression (CTE) in the same way. In fact, in many ways CTEs are even more readable and easier to work with than derived tables since the code doesn't require as much nesting and is more linear. Here's the previous example as a CTE:
with MonthSummary as
(
select
dateadd(m, datediff(m,0, TranDate),0) as [Month], value
from
table
)
select
[Month], sum(value)
from
MonthSummary
where
[Month] >= @min and [Month] <= @max
group by
[Month]
order by
[Month]
Repeating expressions is common for mathematical formulas as well, such as using CASE to avoid dreaded "divide by zero" errors when the denominator is an expression:
select
c as Numerator, a+b as Denominator,
case when a+b = 0 then null else c / (a+b) end as Result
from
tbl
Notice that a+b is repeated
three times in the previous SELECT statement. Again, using a derived table, we can easily change this to:
select
Numerator,
Denominator,
case when Denominator = 0 the null else Numerator / Denominator end as Result
from
(
select
c as Numerator, a+b as Denominator
from
tbl
) x
... and once again, we can now clearly label our expression and see exactly what it is and our code is much clearer and easier to maintain. This example is quite simple and the difference doesn't seem like much, but if that expression is long and complicated and used in other places as well, the improvement in the code can be tremendous.
Conclusion
It is not just readability that we are concerned with, it is also eliminating and finding bugs and optimizing code. Repeating expressions in more than one place, or embedding them deep into SQL statements where we can never really be sure they are working, can result in bugs that are tough to track down. It is also can force the optimizer to calculate these expressions over and over as well, when they should optimally only be calculated once.
So, remember: for constant expressions that don't change, just declare and set variables, and examine the variables' contents before using them anywhere to ensure that your expressions are working as you intend. For repeated expressions that are based on values in the tables you are querying, consider calculating them once within a Derived Table or a Common Table Expression. In addition, be sure to use meaningful variable names and aliases which will help to greatly improve the clarity and intentions of your code.
It all comes back to the "golden rule" of programming: If you need to cut and paste, you're probably doing it wrong!