Generic T-SQL tips and tricks. Some of this applies to other dialects of SQL as well.
-- clean up any messes left over from before:
if OBJECT_ID('AllTeams') is not null
drop view AllTeams
if OBJECT_ID('Teams') is not null
drop table Teams
-- sample table:
create table Teams
(
    id int primary key,
    City varchar(20),     -- column types assumed; City and TeamName appear in the INSERT below
    TeamName varchar(20)
)
-- sample data:
insert into Teams (id, City, TeamName )
select 1,'Boston','Red Sox' union all
select 2,'New York','Yankees'
go
create view AllTeams
as
    select * from Teams
go
select * from AllTeams
--id          City                 TeamName
------------- -------------------- --------------------
--1           Boston               Red Sox
--2           New York             Yankees
-- Now, add a new column to the Teams table:
alter table Teams
add League varchar(10)
-- put some data in there:
-- run it again
select * from AllTeams
--id          City                 TeamName
------------- -------------------- --------------------
--1           Boston               Red Sox
--2           New York             ...
Nothing earth-shattering here; I was just helping out a colleague with this, so I thought I'd post up the example I gave him.
-- sample table:
create table People
(
    Person varchar(1) primary key,
    City varchar(20),     -- column types assumed; City and Age appear in the INSERT below
    Age int
)
-- with some sample data:
insert into People
select 'A','Boston',23 union all -- odd #
select 'B','Boston',43 union all
select 'C','Boston',29 union all
select 'D','Chicago',15 union all -- single #
select 'E','NY',12 union all -- even #
select 'F','NY',55 union all
select 'G','NY',57
-- here's our query, showing median age per city:
select City, AVG(Age) as MedianAge
from (
    select City, Person, Age,
        ROW_NUMBER() over (partition by City order by Age...
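The excerpt above is cut off; a complete sketch of the idea, using the sample People table, might look like this (SQL 2005+ for ROW_NUMBER() and COUNT(*) OVER):

```sql
-- median per city: average the middle value (odd counts) or middle two values (even counts)
select City, AVG(1.0 * Age) as MedianAge    -- 1.0 * avoids integer averaging
from (
    select City, Age,
           ROW_NUMBER() over (partition by City order by Age) as rn,
           COUNT(*) over (partition by City) as cnt
    from People
) x
where rn in ((cnt + 1) / 2, (cnt + 2) / 2)
group by City
```

With the sample data, Boston (ages 23, 29, 43) returns 29, and Chicago's single row returns 15; for a city with an even number of people, the middle two ages are averaged.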
I promise to get back to writing articles on a more regular basis soon, but in the meantime, here's a comment from Nathan A. on using DISTINCT and ORDER BY:
This is actually a problem I have been puzzling over for quite a while now. I actually need to do that sort. I wonder if I may have to create another column that has the list of ordering values in it in increasing order so for my example above assuming a letter table and a number table that contains the numbers for letters, the number table would not change but I...
Here's a simple SQL Server stored procedure that you can call to process an OLAP cube using T-SQL. The parameters should be self-explanatory. To me, this is a little easier and more flexible than processing using DTS packages.
create procedure ProcessCube
    @Partition varchar(100) = null, -- If NULL, process the entire Cube
    @Server varchar(100) = 'localhost'
as
-- Author: Jeff Smith
/* variables used to store object handles */
declare @o_svr int, @o_db int, @o_cube int, @o_part int, @o_mds int
declare @hr int
/* different cube processing options. This SP uses "default" */
declare @PROCESS_DEFAULT int
declare @PROCESS_FULL int
If the start date and end date are both week days, then the total number of week days in between is simply:
(total difference in days) - (total difference in weeks) * 2
DateDiff(dd, @start, @end) - DateDiff(ww, @start, @end)*2
... since the DateDiff() function with weeks returns the number of week "boundaries" that are crossed; i.e., the number of weekends.
If you have a table of holidays, then you can simply subtract them out as well:
DateDiff(dd, @start, @end) -
DateDiff(ww, @start, @end)*2 -
(select count(*) from holidays where holiday_date between @start and @end)
Now, what if the start day or the end day is...
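A minimal sketch of the basic formula, with dates chosen so that both endpoints are week days:

```sql
declare @start datetime, @end datetime
set @start = '20080107'  -- a Monday
set @end   = '20080118'  -- the Friday of the following week
-- 11 calendar days, minus 1 week boundary crossed * 2:
select DateDiff(dd, @start, @end) - DateDiff(ww, @start, @end) * 2 as WeekDays  -- 9
```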
A common mistake beginners make when working with SQL is trying to format their output at the database layer, as opposed to simply doing this at the presentation layer (i.e., client application, reporting tool, web page, etc). I've covered that quite a bit in various blog posts, but I've only touched upon another similar issue which I feel is equally as important and also commonly mishandled.
In the SqlTeam forums, I often see code that accepts DateTime input in the form of a string value (say, from a TextBox on a web form) and uploads that value to the database written...
Let's say you are called in to troubleshoot a stored procedure that is performing poorly.
You dive in to investigate and this is what you find:
create procedure ProcessProducts
as
declare @Products cursor, @ProductID int
set @Products = cursor for select ProductID from Products order by ProductID
open @Products
fetch next from @Products into @ProductID
while @@fetch_status = 0
begin
    exec DoSomething @ProductID
    fetch next from @Products into @ProductID
end
Ah ha! A cursor! It seems we have identified the bottleneck: Clearly, the performance problems are because the code is not doing things in a set-based manner, but rather...
Here's an obscure piece of SQL you may not be aware of: The "ALL" option when using a GROUP BY.
Consider the following table:
Create table Sales
(
    SaleID int identity not null primary key,
    CustomerID int,
    ProductID int,
    SaleDate datetime,
    Qty int,
    Amount money      -- column types assumed from the sample output below
)
insert into Sales (CustomerID, ProductID, SaleDate, Qty, Amount)
select 1,1,'2008-01-01',12,400 union all
select 1,2,'2008-02-25',6,2300 union all
select 1,1,'2008-03-02',23,610 union all
select 2,4,'2008-01-04',1,75 union all
select 2,2,'2008-02-18',52,5200 union all
select 3,2,'2008-03-09',99,2300 union all
select 3,1,'2008-04-19',3,4890
SaleID      CustomerID  ProductID   SaleDate                Qty         Amount
----------- ----------- ----------- ----------------------- ----------- ---------------------
9           1           1           2008-01-01 00:00:00.000 12          400.00
10          1           2           2008-02-25 00:00:00.000 6           2300.00
11          1           1           2008-03-02 00:00:00.000 23          610.00
12          2           ...
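With that data in place, here is what the ALL option does: the query returns a group for every CustomerID in the table, even customers eliminated by the WHERE clause, whose aggregates come back as NULL. A quick sketch:

```sql
-- total March sales per customer, but keep ALL customers in the output:
select CustomerID, SUM(Amount) as TotalMarch
from Sales
where SaleDate >= '20080301' and SaleDate < '20080401'
group by all CustomerID
-- customers with no March sales still appear, with TotalMarch = NULL
```

Note that later versions of SQL Server deprecate the ALL option, so treat it as a curiosity more than a building block.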
Everyone seems to want to "pivot" or "cross tab" data, but knowing how to do the opposite is equally important. In fact, I would argue that the skill of "unpivoting" data is more useful, more important, and more relevant to a SQL programmer, since pivoting results in denormalized data, while unpivoting can transform non-normalized data into a normalized result set. We all know that there are lots of bad database designs out there, so this can be a handy technique to know.
Of course, even a well designed, fully normalized database can still benefit from "unpivoting" from time to time,...
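As a sketch of what "unpivoting" means, suppose a non-normalized table stores one quarterly amount per column (the table and column names here are hypothetical). A UNION ALL turns the columns back into rows, and works on any version of SQL Server:

```sql
select CustomerID, 'Q1' as Quarter, Q1 as Amount from QuarterlySales
union all
select CustomerID, 'Q2', Q2 from QuarterlySales
union all
select CustomerID, 'Q3', Q3 from QuarterlySales
union all
select CustomerID, 'Q4', Q4 from QuarterlySales
```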
Microsoft Access has a pretty handy boolean property that you can set for any "text" column in your tables:
True means that empty strings ('') are allowed in the column, False means that they are not.
This is actually quite nice, because by using this along with the Required (i.e., disallow NULLS) property, you can ensure that your column has an actual, non-Null, non-Empty String value without the need for any additional constraints.
SQL Server does not have this property, but we can easily achieve the same effect by using a CHECK constraint:
create table foo
(
    column1 varchar(100) not null check (column1 <> '')
)
I got an email recently regarding one of my early blog posts from the olden days:
Steve Kass wrote about your post: "there is no guarantee that WHERE <filter 1> OR <filter 2> will be optimized so that the filters are evaluated in the order typed".
I am not certain that optimization changes the priority of the expressions, but I do...
As with any programming language, it is important in SQL to keep your code short, clear and concise. Here are two quick tips that I find are very helpful in obtaining this goal.
Let's take a look at another one of those stupid, arbitrary SQL Server error messages that Bill Gates clearly only created because Micro$oft is evil and incompetent and they want to annoy us (and probably kill baby squirrels, too):
Msg 145, Level 15, State 1, Line 4
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
This message pops up when you ask for DISTINCT rows for one set of columns, but you'd like to have the results ordered by one or more columns not specified in your distinct set. For some reason, SQL Server will not allow...
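One way around it, assuming the People table from earlier, is to rewrite the DISTINCT as a GROUP BY and order by an aggregate of the column that isn't in the distinct set:

```sql
-- select distinct City from People order by Age   -- raises Msg 145
select City
from People
group by City            -- same result as SELECT DISTINCT City
order by MIN(Age)        -- cities ordered by their youngest person
```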
In SQL, the general rule of thumb is that the number of rows returned from a SELECT will be zero if your criteria did not match any data. However, there is an important exception to this rule: it does not apply when asking for aggregate calculations such as SUM(), MIN() or MAX(), without any grouping. read more...
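A quick sketch of that exception (any table will do; the WHERE clause guarantees no matches):

```sql
-- a plain SELECT returns zero rows:
select Amount from Sales where 1 = 0

-- but an ungrouped aggregate always returns exactly one row:
select COUNT(*) as Cnt, SUM(Amount) as Total from Sales where 1 = 0
-- Cnt = 0, Total = NULL
```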
Applying a Sub-Query, Joining a Derived Table ...
I think the easiest way to think of CROSS APPLY is that it is like doing a CROSS JOIN with a correlated sub-query instead of a derived table. Let's see if I can explain that .... A derived table is "self-contained", in that all tables and columns in the parent SELECT are not accessible (though variables and parameters can be referenced). For example, consider:
select A.*, b.X
from A
cross join (select B.X from B where B.Val = A.Val) b
That is not legal because A.Val is out of scope within the derived table; this is because the derived...
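Rewritten with CROSS APPLY, the same query becomes legal, because the applied sub-query is allowed to reference columns from A:

```sql
select A.*, b.X
from A
cross apply (select B.X from B where B.Val = A.Val) b
```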
If you read this blog on a regular basis, you probably know that a frequently covered topic is dates and times and the use of the DateTime data type. Well, along those same lines, my latest article has just been published over at SQL Team: Working with Time Spans and Durations in SQL Server.
From the Article:
What is the best way to return the "duration" of an event in SQL, given the start and end datetime values? How can we add up these durations to return grand totals? What data types should be used to return this data to our clients?...
Have you ever seen (or written) code like this:
select distinct(employeeID), salary
from Employees   -- (table name for illustration)
That compiles and executes without returning any errors. I've seen that attempted many times over the years, and of course people think DISTINCT is "broken" and "not working" because they see multiple rows for each employeeID. "But I asked for only distinct employeeIDs!" they say.
Well, the DISTINCT has nothing to do with the EmployeeID column; it is not a function that accepts arguments! It is just a tag that you can put after the word SELECT to indicate that you want only distinct combinations of all columns in the...
Be careful when mixing OUTER and INNER JOINS together in a single SELECT statement. The end result doesn't always seem to "work", and it can be tricky to understand exactly why and how to fix it without incurring additional unintended side effects. Here's an example that demonstrates some of the issues. read more...
A question I see very often in the SQLTeam forums is how to return data in a summarized form by concatenating a list of values into a single CSV column. This can be done fairly easily in T-SQL, but as the formatting and concatenation requirements become more elaborate, be sure to ask yourself: Am I forcing presentation code into the database layer? read more...
A web application I have inherited uses data from a stored procedure that returns two columns: Description and Override. Basically, for each item returned, the web page should display the Description unless an Override is provided, in which case the Override is displayed. This appears to be a simple case of using ISNULL() or COALESCE():
select coalesce(Override, Description) as Display
Unfortunately, it turns out that there is a mixture of NULL values and empty strings ('') in the Override column, so this doesn't work -- that expression will return '' instead of using the override because the coalesce() check for NULL...
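A common fix is to normalize the empty strings to NULL with NULLIF before coalescing, shown here against the two columns the procedure returns:

```sql
select coalesce(nullif(Override, ''), Description) as Display
```

NULLIF(Override, '') returns NULL when Override is either NULL or an empty string, so COALESCE falls through to Description in both cases.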
When you need to summarize transactional data by Month, there are several ways to do it, some better than others. What to ultimately choose depends on your needs, but remember: Keep it short and simple in T-SQL, and always do all of your formatting at your presentation layer where it belongs. read more...
Every now and then I see T-SQL code written like this:
select somecolumn as 'columnname'
Notice that 'columnname', despite its appearance, is not a string literal or a string expression, it is the alias that we are assigning to the column somecolumn. Putting the name of an alias in single quotes in T-SQL is completely valid syntax, and it will work fine, but I feel that it is a really bad practice. It makes the distinction between string literals and object names very blurry, and it can lead to confusion when examining and maintaining your code.
For example, to me this is...
The next time you are working with dates and times, please remember: how would you handle things if you were working with integers and decimals? The same logic and reasoning applies. Be smart, let SQL do the work for you and use the right data types for the job, even if things don't always "look" right. read more...
I've written a two part article on using SQL GROUP BY clauses over at SQLTeam.com. read more...
An occasional question seen in the forums, which was just recently asked today, is:
"I know I can use
to get all of the columns from a table, but is there a way to write
SELECT * minus columnX
to get all of the columns except for certain ones?"
Now, my goal isn't to debate whether or not "SELECT *" is bad or good, or whether it should be used. The fact of the matter is, people use it all the time because it is quicker and shorter than typing out all of the column names. People are lazy, right? ...
A quick follow up to the "Nested WHERE-IN" anti-pattern post from yesterday ... If you didn't get a chance, be sure to read the comments from that post as well, there are some great points in there so far. read more...
There's a fascinating technique that I see many beginners use when writing SQL statements, and I call it the "Nested WHERE-IN" anti-pattern. It is, unfortunately, a common SQL technique used to avoid JOINS at all costs. read more...
You're INSERTing multiple rows from an un-normalized import table. Each row is assigned an identity primary key. You know you can use scope_identity() to retrieve one identity at a time, but how do you retrieve a whole set of them? Are cursors the only answer? As usual, it depends on having logical specifications and a good design. read more...
SQL is a set-based language, and often we wish to pass sets of data as a parameter to a stored procedure. For example, you might wish to pass a set of Customers to an invoice generating stored procedure, or a set of employees for which you’d like to calculate a particular bonus. SQL Server already has everything you need to do this, and you don't need CSV or XML strings. read more...
Sometimes, it appears that a necessary solution to common SQL problems is to join a table to itself. While self-joins do indeed have their place, and can be very powerful and useful, often times there is a much easier and more efficient way to get the results you need when querying a single table.
That's right boys and girls, it's what you've been waiting for all weekend: Another edition of the mailbag!
Did you know that a new feature in SQL Server 2005 allows you to specify an OVER partition for aggregate functions in your SELECT statements?
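For example, using the Sales table from earlier, an aggregate with an OVER clause repeats the group total on every detail row, with no GROUP BY required:

```sql
select SaleID, CustomerID, Amount,
       SUM(Amount) over (partition by CustomerID) as CustomerTotal
from Sales
```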
Here's my response to a SQL Challenge, regarding how to find consecutive free time slots in a schedule. This can sometimes be tricky to solve in SQL, but using either of the two techniques shown here, it is actually pretty easy. (Updated to show 2 possible solutions)
When using an OUTER JOIN, you should put criteria on the outer table in the join condition, not in the WHERE clause. However, I often see a "workaround" to avoid this simple and solid rule, which might seem to work but actually doesn't. Since it is hard to explain why over and over in forum posts, I thought it might be helpful to address that here once and for all with an example.
Here's a simple User-Defined Function implementation of the Hamming Distance Algorithm for SQL Server, useful for tracking down transposition errors in your data.
A Google search for the phrase sql "case statement" returns 127,000 results. Meanwhile, if we do a search for the phrase sql "case expression" we get back only 43,900 results. What does this mean? Most people don't understand what SQL's CASE feature really is, how to use it, or how it works!
Previously, I had written that UNION ALL (combined with a GROUP BY) is a really quick and easy way to compare two tables. Well, now in SQL 2005, we have another option: using EXCEPT and INTERSECT. And these are even easier!
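A minimal sketch, assuming two tables with identical column lists (the table names are hypothetical):

```sql
select * from TableA except   select * from TableB   -- rows in A but not in B
select * from TableB except   select * from TableA   -- rows in B but not in A
select * from TableA intersect select * from TableB  -- rows present in both
```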
So, I hear you're a "set-based SQL master" ! As Yoda once said, you've "unlearned what you have learned". You've trained yourself to attack your database code not from a procedural, step-by-step angle, but rather from the set-based "do it all at once" approach. However, don't completely forget the most important skill that you learned in the procedural world!
Be sure to visit SQLTeam.com to check out my latest article, Custom Auto-Generated Sequences in SQL Server. It addresses a common question we see in the SQL Team forums.
As many of you know, I strongly recommend that you avoid using RIGHT OUTER JOINS, since they make your SQL code less readable and are easily rewritten as LEFT OUTER JOINs. In addition, I have yet to find a situation where a FULL OUTER JOIN makes sense or is necessary -- I have found that in just about every case other techniques work better.
Everything you ever wanted to know about how to use SQL Server's advanced features to format Dates into any format that you need! Here, for the first time, all in one place, is the secret that no one wants you know about how easy it is to format data in T-SQL!
It's been a while, so let's open up the old mailbag. Today's question is about writing a character-based check constraint. The answer: Simple ... do the *opposite* !
Sometimes, when writing SELECTs, we come across situations in which we need to write a join containing a condition of some sort. You might think to do this with either a CASE expression or with some OR boolean logic in your join expression. There's a much better way to approach the problem.
Lots of questions come up in the SQL Team forums about conversions between Access and T-SQL and some of the differences between the two SQL dialects. Here's a nice list of handy things to help you out with converting your projects.
I wrote about a few of the new features in SQL 2005 and how they can be used to solve some old "classic" SQL problems very easily, and I thought I'd briefly discuss a few more. Also, if you enjoy baseball, read on!
The RANK and PARTITION features in 2005 are simply amazing. They make so many "classic" SQL problems very easy to solve.
I thought I'd take a few minutes to discuss something we see quite often in the programming world, using a T-SQL example of a stored procedure that accepts a list of optional parameters allowing you to determine some basic filters on the results.
I posted this one a long time ago and needed to use it today, so I thought I'd post it up here as well in case anyone finds it useful. This simply attempts to capitalize the first letter of each word for the string passed in. Use it to help clean up some pre-existing data, but don't use it as a way of presenting your data that is stored improperly since it isn't exact.
I've posted some variations of these before, but here they all are in 1 easy package: The essential date and time functions that every SQL Server database should have to ensure that you can easily manipulate dates and times without the need for any formatting considerations at all.
Let's assume that for some reason you are not using Stored Procedures. While I can respect your choice in that regard, that doesn't mean that you cannot still use parameters when constructing your SQL statements at the client.
A common difficulty beginning SQL programmers encounter is joining two or more transactional tables all in one SELECT statement. Missing data, duplicates, time-out errors, and other unexpected results often arise from trying to directly write JOINS between two transaction tables.
One of the things that troubles me most about SQL Injection is that it seems it is still very misunderstood.
For example, is the following pseudo-code vulnerable to SQL Injection?
There have been quite a few posts out there with SQL implementations of Sudoku puzzle solvers.
Most of these are really basic. Some are my take on established ideas and standards, and may be controversial. Agree or disagree? All feedback is welcome! (well ... mostly the "I agree, you're a genius" feedback is welcome ... but I'll accept all of it, I suppose)
Here’s my advice of the day: Try your best to store all of the data relating to your database in the tables.
Sounds pretty obvious, I know, but you might be surprised just how much data in your application is not in your tables, where it belongs, but rather nestled away in your code.
One aspect of the versatile SELECT statement that seems to confuse many people is the GROUP BY clause. It is very important to group your rows in the proper place.
Many SQL books recommend that you “avoid cross joins” or “beware of Cartesian products” when writing your SELECT statements, which occur when you don't express joins between your tables. It’s true that you need to ensure that your join conditions are adequately stated so that you don’t accidentally produce this effect, but it is not true that you should avoid these types of joins in every situation.
As promised in my last post, here is some performance testing to help you determine the performance benefits (if any) of performing your crosstabs at the presentation or code layer, as opposed to forcing SQL Server to do this.
In my last post, I spoke briefly about how I felt that in general crosstabbing data is something that a presentation layer should do and not the database. Consider the result of a crosstab operation -- the columns returned will vary depending on the data. There is pretty much no further manipulation you can do with that result in T-SQL; in the relational database world, the column names of our database objects should be constants and not continually changing as the data changes. Also, in T-SQL there is no easy way to dynamically pivot data, and even doing it with...
First off, before going any further make sure you have read the hall of fame SQLTeam article by Rob Volk on generating crosstab results using a flexible, dynamic stored procedure that has been viewed over 100,000 times!
I talked about separating dates from times for certain columns in your database. One thing I have never really used before in SQL Server 2000 is user-defined datatypes and rules. Rules are like CHECK constraints, but from what I understand they are very non-standard.
I don't like storing dates along with the time portion in my database tables. Sure, I use the DateTime datatype, but I wish SQL provided a “Date” datatype and a “Time” datatype that were separate. It really sometimes requires too much work in T-SQL to separate the date portion from the time portion, and often in a WHERE clause you don't care about the time, you just want data for a single day.
As it is implemented, the RAND() function in SQL Server doesn't let you return a different random number per row in your SELECT statement. For example ...
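The usual workaround is to derive a pseudo-random value from NEWID(), which, unlike RAND(), is evaluated once per row:

```sql
-- a random integer from 0 to 99 for each row:
select SaleID, ABS(CHECKSUM(NEWID())) % 100 as RandomNum
from Sales
```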
Just a quick post of some benchmarking code for our good friend Guayo at TheDailyWTF, in regards to this discussion.
When you have two tables (or resultsets from SELECT statements) that you wish to compare, and you want to see any changes in ANY columns, as well as to see which rows exist in 1 table but not the other (in either direction) I have found that the UNION operator works quite well.
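A sketch of the technique (table and column names hypothetical): tag each source, UNION ALL the two tables together, and group on all of the data columns. Any group with a count of 1 either exists in only one table or differs between the two:

```sql
select min(Src) as Src, ID, Col1, Col2
from (
    select 'A' as Src, ID, Col1, Col2 from TableA
    union all
    select 'B', ID, Col1, Col2 from TableB
) x
group by ID, Col1, Col2
having count(*) = 1      -- identical rows pair up and are filtered out
```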
Here's one way to implement a search, similar to Google's, if you don't want to (or can't) use full text indexing. This allows you to pass a string of words, and indicate that either ALL words must match, ANY must match, or the exact string must match.
Here's a scenario many of us have encountered: You inherit a database. TableX has an identity column "ID" set up as the primary key, and contains many duplicates.
I see many people have the need to pass in multiple pieces of information to stored procedures in a parameter, often as a CSV string. For example, you might have a stored procedure that will return customer info for all customer ID's passed in a CSV string, called something like this:
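One common approach, which works fine in SQL 2000, is to split the string with a Numbers table; this sketch assumes an auxiliary table Numbers(n) holding the integers 1 through 8000:

```sql
declare @csv varchar(8000)
set @csv = ',' + '12,17,93' + ','    -- pad both ends with the delimiter
select cast(substring(@csv, n + 1,
            charindex(',', @csv, n + 1) - n - 1) as int) as CustomerID
from Numbers
where n < len(@csv)
  and substring(@csv, n, 1) = ','    -- each comma marks the start of a value
```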
I just wanted to take a minute to highlight this thread in the sqlteam forums.
Despite the cool things you can do with writing your own custom aggregate functions as shown in my last article, in many cases you don't need custom functions to get what you need. This should answer some of the questions raised in the comments from my article.
If the topic intrigues you, please check out my latest article:
A little too long for a blog unfortunately. Before reading the article, keep in mind it's purely an academic exercise. But the ideas presented are hopefully somewhat original, and might even have some practical applications, who knows?
Here is a completely dynamic version of my paging technique in the form of a stored procedure. Just pass in a short SQL statement, the Order By clause, and the start row and end row you'd like to return in the resultset.
This may be common knowledge to those among us that are handy with XML (I'm not), but since I just uncovered this nice little trick I thought I'd pass it along either way. It is a very easy way to process a CSV string and turn it into multiple rows:
declare @idoc int            -- the handle for the XML document
declare @csv varchar(8000)   -- the CSV we are being passed
set @csv = '1,2,3,7,4,8,10'  -- our list of CSV values
-- With a simple edit of the CSV string, we turn it into an XML document
-- (the element and attribute names here are illustrative):
set @csv = '<r v="' + replace(@csv, ',', '"/><r v="') + '"/>'
-- We can now do this:
EXEC sp_xml_preparedocument @idoc OUTPUT, @csv
-- And then:
SELECT * FROM OPENXML(@idoc, '/r', 1) WITH (v int)
EXEC sp_xml_removedocument @idoc
Here's the most efficient way that I can think of to return rows @a through @b from a table or query, when each row is ordered and can be uniquely identified by a set of columns, in SQL 2000.
Another way to create dates easily, w/o using CONVERT or using a UDF (i.e., if you have SQL 7.0), if you are starting out with integers representing the year, month and day of the date you need to create:
To me, this is a pretty handy function to have in your toolbox ...
Anyone know what this will return? Unfortunately, the argument must be less than or equal to 32 due to the number of levels of recursion SQL can handle. It's probably not very efficient, either. But, it's pretty cool!
As some of you may know, I recommend to avoid using CASE expressions in the WHERE clause of a query to express conditional logic. I prefer to have everything translated to simple ANDs, ORs and NOTs to keep things a) portable, b) easier to read and c) efficient.
I recently helped someone with performing some simple cross-tabs in SQL ... read more...
I thought I'd post a blog showing a technique I haven't seen elsewhere. This is from a recent post of mine at the SqlTeam forums.
The problem: you have two tables, each containing multiple rows of data per “SetID”. You need to compare the two tables to see which sets have complete matches.
First, the DDL and the data:
create table #t1
(SetID int not null,
Attribute varchar(10) not null,
Value int not null,
constraint t1_pk primary key (SetID, Attribute))
create table #t2
(SetID int not null,
Attribute varchar(10) not null,
Value int not null,
constraint t2_pk primary key (SetID, Attribute))
insert into #t1
select 1,'a',1 union
select 1,'b',2 union
select 1,'c',3 union
select 2,'a',4
Handy reminder: The PARSENAME() function can be useful for parsing small strings. It returns parts 1-4 (working right to left) of a string, with each part delimited by periods.
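A quick example:

```sql
select PARSENAME('server.db.dbo.mytable', 1)  -- 'mytable' (rightmost part)
select PARSENAME('server.db.dbo.mytable', 4)  -- 'server'  (leftmost part)
select PARSENAME('192.168.0.1', 2)            -- '0'       (handy for parsing IPs)
```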