Techniques
Hey, I can be a cranky guy when it comes to some of the ways I do things. Here are more thoughts on best practices and general ways to approach writing your code.
-- clean up any messes left over from before:
if OBJECT_ID('AllTeams') is not null
drop view AllTeams
go
if OBJECT_ID('Teams') is not null
drop table Teams
go
-- sample table:
create table Teams
(
id int primary key,
City varchar(20),
TeamName varchar(20)
)
go
-- sample data:
insert into Teams (id, City, TeamName )
select 1,'Boston','Red Sox' union all
select 2,'New York','Yankees'
go
create view AllTeams
as
select * from Teams
go
select * from AllTeams
--Results:
--
--id City TeamName
------------- -------------------- --------------------
--1 Boston Red Sox
--2 New York Yankees
-- Now, add a new column to the Teams table:
alter table Teams
add League varchar(10)
go
-- put some data in there:
update Teams
set League='AL'
-- run it again
select * from AllTeams
--Results:
--
--id City TeamName
------------- -------------------- --------------------
--1 Boston Red Sox
--2 New York ...
Nothing earth-shattering here, I was just helping out a colleague with this so I thought I'd post up the example I gave him.
-- sample table:
create table People
(
Person varchar(1) primary key,
City varchar(10),
Age int
)
go
-- with some sample data:
insert into People
select 'A','Boston',23 union all -- odd #
select 'B','Boston',43 union all
select 'C','Boston',29 union all
select 'D','Chicago',15 union all -- single #
select 'E','NY',12 union all -- even #
select 'F','NY',55 union all
select 'G','NY',57 union all
select 'H','NY',61
go
-- here's our query, showing median age per city:
select city,
AVG(age) as MedianAge
from
(
select City, Person, Age,
ROW_NUMBER() over (partition by City order by Age...
Always remember that the column names returned in a result set do not have to be the same as what you eventually output at your presentation layer.
For example, suppose you have stored procedure that accepts a @CurrentYear parameter and returns a sales variance between the current year and the previous year for each customer. I often see programmers struggling with writing dynamic SQL to produce output like this:
CustomerID 2008 Total 2007 Total Variance
---------- ---------- ---------- --------
ABC $100 $50 $50
DEF $200 $250 -$50
That is, the names of the columns vary based on the data; that is not a good way...
A common mistake beginners make when working with SQL is trying to format their output at the database layer, as opposed to simply doing this at the presentation layer (i.e., client application, reporting tool, web page, etc). I've covered that quite a bit in various blog posts, but I've only touched upon another similar issue which I feel is equally as important and also commonly mishandled.
In the SqlTeam forums, I often see code that accepts DateTime input in the form of a string value (say, from a TextBox on a web form) and uploads that value to the database written...
Everyone seems to want to "pivot" or "cross tab" data, but knowing how to do the opposite is equally important. In fact, I would argue that the skill of "unpivoting" data is more useful and more important and more relevant to a SQL programmer, since pivoting results in denormalized data, while unpivoting can transform non-normalized data into a normalized result set. We all know that there's lots of bad databases designs out there, so this can be a handy technique to know.
Of course, even a well designed, fully normalized database can still benefit from "unpivoting" from time to time,...
As with any programming language, it is important in SQL to keep your code short, clear and concise. Here are two quick tips that I find are very helpful in obtaining this goal.
I recently set up a web application using SQL Server Express for a client, and it was the first time I had done any "production" level work using the Express edition. As such, this was the first time I ever needed to configure automated backups for a SQL Express database. However, there is no built-in way to schedule backups without using manual scripts or external applications. I briefly toyed with writing something myself to get the job done, but of course we should not reinvent the wheel so I turned to Google to find out what's out there for options....
In this post, I showed a simple way to create simple but useful Date and Time user-defined data types in SQL Server 2000. Here's how to do it in SQL Server 2005, without the need for CLR types or anything fancy.
First, we must create two user defined data types:
create type Date from dateTime
create type Time from dateTime
So, internally (and externally to our clients), these types are really just DateTime. But, we will apply some rules to these types so that the Date data type will always be constrained to a time exactly at midnight, and the the Time data type...
Applying a Sub-Query, Joining a Derived Table ...
I think the easiest way to think of CROSS APPLY is that it is like doing a CROSS JOIN with a correlated sub-query instead of a derived table. Let's see if I can explain that .... A derived table is "self-contained", in that all tables and columns in the parent SELECT are not accessible (though variables and parameters can be referenced). For example, consider:
select A.*, b.X
from A
cross join (select B.X from B where B.Val=A.Val) b
That is not legal because A.Val is out of scope within the derived table; this is because the derived...
Be careful when mixing OUTER and INNER JOINS together in a single SELECT statement. The end result doesn't always seem to "work", and it can be tricky to understand exactly why and how to fix it without incurring additional unintended side effects. Here's an example that demonstrates some of the issues. read more...
Another common area of difficulty or confusion for SQL beginners is how to efficiently retrieve data just for a single month. What is the best way to declare parameters that will be used to indicate which month you are looking for, and how can we efficiently and easily make use of those parameters to get back the data we need? read more...
Ah … primary keys … such a topic! When discussing what columns to define as a primary key in your data models, two large points always tend to surface:
Surrogate Keys versus Natural Keys
Normalization
These can be very complicated and sometimes polarizing things to debate. As I often try to do, I will attempt to approach this topic from a slightly different perspective.
Let's start things off with what I feel is a good interview question:
How would you define what a primary key of a table is?
a. An auto-generated numeric or GUID column in the...
A quick follow up to the "Nested WHERE-IN" anti-pattern post from yesterday ... If you didn't get a chance, be sure to read the comments from that post as well, there are some great points in there so far. read more...
There's a fascinating technique that I see many beginners use when writing SQL statements, and I call it the "Nested WHERE-IN" anti-pattern. It is, unfortunately, a common SQL technique used to avoid JOINS at all costs. read more...
SQL is a set-based language, and often we wish to pass sets of data as a parameter to a stored procedure. For example, you might wish to pass a set of Customers to an invoice generating stored procedure, or a set of employees for which you’d like to calculate a particular bonus. SQL Server already has everything you need to do this, and you don't need CSV or XML strings. read more...
There’s a handy little rule of thumb I use when developing my database models. It comes up often in situations like this: If we are storing phone numbers in the database, should AreaCode be a “free-form” char(3) column? Or should it be a foreign key reference to an AreaCodes table? read more...
Microsoft Access / JET SQL doesn't support cursors; the only way to process rows one by one is using VBA and opening a Recordset. This means that you're stuck using set-based "queries". So ... is this a good thing or a bad thing? A very, very good thing -- in fact, it's what makes Access a great tool for learning SQL! read more...
Sometimes, it appears that a necessary solution to common SQL problems is to join a table to itself. While self-joins do indeed have their place, and can be very powerful and useful, often times there is a much easier and more efficient way to get the results you need when querying a single table.
read more...
SQL Server User-Defined Functions (UDFs) can return either a single value or virtual tables. However, sometimes we might like for a User-Defined Function to simply return more than 1 piece of information, but an entire table is more than what we need.
There's a very interesting way of handling complicated, multi-column dynamic sorts over at john-sheenan.com.
I am not sure about the efficiency of this approach, but in general dynamic sorting tends not to be very efficient anyway (unless you use dynamically-created SQL statements). Overall, it is a very clever use of RANK() and definitely worth a look.
Thanks for the great idea, John!
That's right boys and girls, it's what you've been waiting for all weekend: Another edition of the mailbag!
You are a very important, talented, enterprise-level programmer! You write and maintain millions of lines of code, compiling your applications takes several hours, and your databases contain hundreds of tables with millions of rows. You clearly do not have time to write silly test applications!
read more...
Here's my response to a SQL Challenge, regarding how to find consecutive free time slots in a schedule. This can sometimes be tricky to solve in SQL, but using either of the two techniques shown here, it is actually pretty easy. (Updated to show 2 possible solutions)
read more...
So, I hear you're a "set-based SQL master" ! As Yoda once said, you've "unlearned what you have learned". You've trained yourself to attack your database code not from a procedural, step-by-step angle, but rather from the set-based "do it all at once" approach. However, don't completely forget the most important skill that you learned in the procedural world!
read more...
As many of you know, I strongly recommend that you avoid using RIGHT OUTER JOINS, since they make your SQL code less readable and are easily rewritten as LEFT OUTER JOINs. In addition, I have yet to find a situation where a FULL OUTER JOIN makes sense or is necessary -- I have found that in just about every case other techniques work better.
read more...
Everything you ever wanted to know about how to use SQL Server's advanced features to format Dates into any format that you need! Here, for the first time, all in one place, is the secret that no one wants you know about how easy it is to format data in T-SQL!
read more...
A non-technical guide to writing reports, with a focus on managing requirements, results, and expectations.
read more...
Dear DBA --
Thanks so much for helping us developers out with the latest changes you've made to the stored procedures in our system. While it may have been nice if we got together first to discuss these changes, I do appreciate that you worked very hard to make things much easier for us.
read more...
I thought I'd take a few minutes to discuss something we see quite often in the programming world, using a T-SQL example of a stored procedure that accepts a list of optional parameters allowing you to determine some basic filters on the results.
read more...
Let's go to the mailbag!
In response to this post, Chris writes:
read more...
Let's assume that for some reason you are not using Stored Procedures. While I can respect your choice in that regard, that doesn't mean that you cannot still use parameters when constructing your SQL statements at the client.
read more...
A common difficulty beginning SQL programmers encounter is joining two or more transactional tables all in one SELECT statement. Missing data, duplicates, time-out errors, and other unexpected results often arise from trying to directly write JOINS between two transaction tables.
read more...
One of the things that troubles me most about SQL Injection is that is seems it is still very misunderstood.
For example, is the following psuedo-code vulnerable to SQL Injection?
read more...
There's been quite a few posts out there with SQL implementations of Suduko puzzle solvers.
read more...
In most industries, beginners work on simple, short projects, using only the most basic concepts that they have learned. The experts in those industries, meanwhile, work on things that are very large, complicated, difficult to manage, and require lots of time, energy and resources.
read more...
Most of these are really basic. Some are my take on established ideas and standards, and may be controversial. Agree or disagree? All feedback is welcome! (well ... mostly the "I agree, you're a genius" feedback is welcome ... but I'll accept all of it, I suppose)
read more...
Here’s my advice of the day: Try your best to store all of the data relating to your database in the tables.
Sounds pretty obvious, I know, but you might be surprised just how much data in your application is not in your tables, where it belongs, but rather nestled away in your code.
read more..
Here's kind of a funny/scary thread over at the DailyWTF:
read more...
Many SQL books recommend that you “avoid cross joins” or “beware of Cartesian products” when writing your SELECT statements, which occur when you don't express joins between your tables. It’s true that you need to ensure that your join conditions are adequately stated so that you don’t accidentally produce this effect, but it is not true that you should avoid these types of joins in every situation.
read more...
Commonly here at the SQLTeam forums, users seeking assistance explain in long detail their situation and mention how they have millions of rows of data to deal with or dozens of tables and how complicated it all is. Invariably, those of us offering help will be confused by the requirements (they often aren't stated very clearly or completely) and we'll ask for a small sample of data to work with.
Just a quick post of some benchmarking code for our good friend Guayo at TheDailyWTF, in regards to this discussion.
When you have two tables (or resultsets from SELECT statements) that you wish to compare, and you want to see any changes in ANY columns, as well as to see which rows exist in 1 table but not the other (in either direction) I have found that the UNION operator works quite well.
read more...
I just wanted to take a minute to highlight this thread in the sqlteam forums.
As some of you may know, I recommend to avoid using CASE expressions in the WHERE clause of a query to express conditional logic. I prefer to have everything translated to simple ANDs, ORs and NOTs to keep things a) portable, b) easier to read and c) efficient.
I recently helped someone with performing some simple cross-tabs in SQL ... read more...