Some Simple SQL Rules to Live By
Most of these are really basic. Some are my take on established ideas and standards, and may be controversial. Agree or disagree? All feedback is welcome! (well ... mostly the "I agree, you're a genius" feedback is welcome ... but I'll accept all of it, I suppose)
Fully qualify your columns
If your SQL statement involves more than 1 table, ALWAYS fully qualify your columns with the table they come from. If a column name exists only in 1 table, sometimes people tend to feel that a table reference is not necessary. After all, the statement compiles, right? But when reviewing the SQL statement later on, it is impossible to know which column a table is coming from unless you have the schema in front of you when you omit the table name. It's simple to do, and even easier if you follow my next piece of advice.
Use table aliases liberally and intelligently
Table aliases are great for making your SQL leaner and more readable. But try to alias "CustomerTransactions" as "CustTrans" or "CT", and not "X" or some other meaningless abbreviation. Aliasing derived tables in a meaningful way, even if it ends up being a little long, is a great idea. For example, joining to a derived table aliased as "TotalSalesPerCustomer" or "SalesTotals" really makes the SQL clear and readable.
Avoid RIGHT OUTER JOINS
I've talked about this many times in the SQLTeam forums. There is not one single reason to ever, ever use a RIGHT OUTER JOIN in a SQL statement. Never, ever, ever. Always select FROM your primary table (or derived table) and LEFT OUTER JOIN to auxillary tables. What's that? You say sometimes you need a RIGHT JOIN here and there to get the results you need? Then you have written your SQL wrong; start over. The most important part of a SELECT is also the simpliest -- the FROM clause. Your starting point needs to be correct before you can starting joining to other tables.
Derived tables are your friend
Use them as often as necessary to keep things organized and short. For example, if you have a long expression in your SELECT that needs to be referenced several times, do the calculation in a derived table and give it a meaningful alias. Then, you can simply refer to the alias in other parts of your SELECT.
Too often, people keep stuffing related tables into 1 big SELECT until they have every possible piece of data they might need, then they start GROUP'ing and DISTINCT'ing almost randomly until the results start "looking OK". Avoid this. Break your problem down into logical parts; usually that means each part ends up being it's own derived table.
Don't forget UNION ALL
UNION ALL is often forgotten about, when most times it is really what should have been used. Remember -- UNION removes all duplicates in the result set, while UNION ALL just returns everything. This means that UNION is less efficient and may even have unintended side effects if you forget about how its behavoir works. Unless you need to remove duplicates, always use UNION ALL. And if you *do* need to remove duplicates with a regular UNION, be sure to add a comment to indicate this.
DISTINCT is *usually* bad
A good rule of thumb -- if you need a distinct list of values in a single column or two, DISTINCT is the way to go. But if your results "don't look right" or you see some duplicate rows and can't figure out why, do NOT just add DISTINCT to your SELECT to "fix it" !! Step back, look at your joins, and re-write your query properly. Even worse, I've seen people simply add DISTINCT to all their SELECT's right from the start, to preemptively "avoid duplicates". Not good. (Side note: If you are a DISTINCT abuser, try adding meaningful primary keys to your tables).
GROUP BY only what you need
See here for more information on this one.
State your intentions clearly with Parenthesis
Never mix AND's and OR's without parenthesis in boolean expressions. I don't care if you know exactly how SQL will evaluate it -- use parenthesis. They key is to be sure that your intended expression is clearly stated.
Indent!
I don't care how you do it. Just be consistent about it.
Learn when to use * and when not to
Mostly old news here. When you're selecting from a table, don't use it -- list the columns out explicitly, for readablity, performance, and to avoid re-compile bugs that can be tough to track down.
When you are doing a COUNT() or an EXISTS(), no problem, go ahead and use *. Also (and this is my opinion here), when you are selecting from a derived table, go ahead and use * if you need all the columns. The columns are already explicitly defined in the derived table, and it keeps things shorter and cleaner.
Remember your Job
The SQL Server's job is to return data. Your client application or Reporting tool's job is to present that data. Always remember this. If you try to format a resultset using T-SQL, your nice DateTime and Money values come to the client as VARCHAR's. Which means that if the client wants to do anything meaningful with these values (sort, compare, do math, format, etc) it must immediately convert those strings back to the original datatype! So many people try to format their results in T-SQL thinking this makes things easier for their clients, when it makes the T-SQL longer, more complicated, and less efficient. Let SQL do what it does best, and let your presentation layer do what it does best. It really kind of almost makes sense when you think about it.
Always Use Parameters!
OK, not everyone likes stored procedures. I can accept that; after all, while I can be pretty strict on some things (i.e., RIGHT JOINS), I definitely recognize that there are multiple ways to do things. But if you are going to write SQL statements directly at the client, there is still no excuse for not using parameters. And likewise with dynamic SQL generated at the server. Always, always use parameters -- never parse and concatenate user input into SQL strings and execute them directly. Not only is it more work, less secure, harder to maintain, and less readable, but you now have to deal with the format of your data at the SQL Server layer. If a parameter is a DateTime parameter and you set it and reference it in a SQL statement, you are done. But if you are taking a string that *looks* like a DateTime, and you are concatenating it into a SQL command, suddenly the format of that string becomes very important. Just don't do it. Use parameters, whether you are using stored procedures or not. ADO, ADO.NET and even DAO all support parameterized commands for your clients, and the sp_ExecuteSQL stored procedure supports parameters for dynamic SQL on the server.
see also:
- Thinking Set-Based .... or not?
- What I learned about SQL from using Access
- How to be an Effective Report Writer
- Sometimes the problem isn't the code. It's the specs.
- Alias Naming Conventions
- Always Use Parameters. Even if you don't use Stored Procedures.
- Programming is a funny thing ....
- SQL Problem Solving: Identify ... and then Simplify!
Legacy Comments
Tara
2006-03-14 |
re: Some Simple SQL Rules to Live By I disagree about not using RIGHT OUTER JOIN. I use it when it makes the complex query more readable. For instance, if it is to a derived table and you've got other joins in the query already. Putting the derived table at the end makes it easier to read. And this preference has nothing to do with getting the needed results. It was to make a stored procedure used for reporting more readable as the code was pretty complex. |
Jeff
2006-03-14 |
re: Some Simple SQL Rules to Live By Thanks, Tara -- but the problem is, if your "complex" query becomes easier to edit by adding a RIGHT join, then that's a great indicator that you should start over and re-write it. I understand that it is easier to just tack on RIGHT joins to existing SELECT's, but the end result is much less readable and intuitive than if you had re-written the entire SELECT properly. If you can give an example of a situation where a RIGHT join is more readable and makes more logical sense than a LEFT join, definitely post it. |
Tara
2006-03-14 |
re: Some Simple SQL Rules to Live By I actually tried to find it while I was posting to show as an example. I am no longer on the project, but I reviewed the stored procedures that I wrote and was unable to locate it. I believe the stored procedure that I was referring to no longer has the RIGHT OUTER JOIN due to an application requirement change. I did a search on syscomments for RIGHT OUTER JOIN and couldn't find it anywhere. If I ever do come across it again, I'll certainly post it as it was a very good example of the one time to use it. I remember even mentioning it in the forums when RIGHT OUTER JOIN was mentioned in the past. I'm not sure that I posted it back then either due to proprietary information. |
SC
2006-03-15 |
re: Some Simple SQL Rules to Live By I agree generally about right joins, but I have used it before thus: After a bulkload of daily figures, say into tblUpload, we had to update, insert, or delete from tblCurrent based on rows in upload. No requirement for historic data ror keeping track of deletions. (simply) update with inner join where matches delete with left join where righttable (tblUpload) is null insert with right join where lefttable (tblCurrent) is null visualing the sequence makes sense using right here. and it was well commented... |
Jacin Steele
2006-03-15 |
re: Some Simple SQL Rules to Live By I agree, you're a genius! It's nice to know that all is right with the world; that knowledge exists and someone is willing to document and share it with the world... now if we could figure out a way to force everyone to agree with the *correct* way of doing things. :-) Thanks for the reminder about derived tables. When in the thick of the battle, I always forget and put in lengthy code to calculate the values I need. I end up with code that I don't like because it's unreadable. I just need to beat it in to my head to stick with derived tables. Thanks for the reminder! - jacin |
Arnold Fribble
2006-03-15 |
re: Some Simple SQL Rules to Live By "Never, ever, ever." Well, hardly ever. There is a situation where it makes an actual rather than aesthetic difference, and that's when you have a HASH join hint. Because join hints constrain the order of joins in the execution plan to the order in the SQL, you're specifying which table populates the hash table and which one probes. |
Jeff
2006-03-15 |
re: Some Simple SQL Rules to Live By Very interesting, as usual, Arnold! Can you think of a good example for this one? Wouldn't you always want the primary table to be hashed and outer tables to be probed? |
Mladen
2006-03-16 |
re: Some Simple SQL Rules to Live By well i can :) arnold helped me with exactly that thing a while ago. i had a small temp table that i had to update a big table with. i did a left join and it was doing a lot of nested loops. did it with right hash join and the performace went from 3 seconds to 30 ms. |
Jeff
2006-03-16 |
re: Some Simple SQL Rules to Live By >>i had a small temp table that i had to update a big table with. i did a left join and it was doing a lot of nested loops. did it with right hash join and the performace went from 3 seconds to 30 ms. Can you provide some DDL and DML to re-create this situation? I would really like to check this out, it sounds very interesting. |
Rob Farley
2006-03-17 |
re: Some Simple SQL Rules to Live By I like your rules Jeff. Even mentioned them on my blog now too. I'm very pleased to say that I do all these things. Here's a thing I quite like these days too: select 'colname1' = whatever1, 'colname2' = whatever2, etc... as opposed to: select whatever1 colname1, whatever2 colname2, etc... Giving my columns names this way (the first way) means that all my column names are lined up nicely, and even all the definitions are lined up nicely, assuming I hit tab before the equals sign. It means that when I come to glancing at the query, I can see exactly what the result columns are - and this really helps in my derived tables too! |
alberto tapia
2006-04-15 |
re: Some Simple SQL Rules to Live By somthing of union efect create table Algo (aa int) insert Algo values ( 1 ) insert Algo values ( 2 ) insert Algo values ( 3 ) insert Algo values ( 3 ) insert Algo values ( 4 ) create view vAlgo as select * from algo create view vAlgo2 as select * from algo union select * from algo then set this select's select count(*) from algo select count(*) from valgo select count(*) from valgo2 the results in MS SQL: SQL Server 6.50 - 6.50.201 results (each) 5, 5, 4 (*) SQL Server 6.50 - 6.50.258 results (each) 5, 5, 4 SQL Server 6.50 - 6.50.416 results (each) 5, 5, 1 SQL Server 7.00 - 7.00.623 results (each) 5, 5, 4 SQL Server 2000 - 8.00.194 results (each) 5, 5, 4 (*) the first time... when second time and next times issue message "DBPROCESS dead or not enabled" |
doofledorfer
2006-05-20 |
re: Some Simple SQL Rules to Live By In my experience the issue about using parameters comes up most frequently with developers who are trying to transfer procecural coding skills to declarative programming. You need to overcome the habit of decomposing the code into simpler units. Relatedly, how about another one I've seen frequently: Aggregate your SQL assertions whenever possible; never decompose it into a series of equivalent queries. |
Stin
2006-06-20 |
re: Some Simple SQL Rules to Live By Splash |
Bjorn
2006-07-06 |
Using EXISTS or IN One important thing to mention about SQL performance is that EXISTS is usually better than IN. Look at my blog for more info (just click my name). |
Chris
2006-07-28 |
Semantic Table Aliases Jeff, I just got done reading your VB posts. I love those sarcastic posts you do. Anyway, my coworker and I disagree on something, and I'd like to consult you to "settle an argument". At work, I'm told to not use semantic table aliases. Just like you said, I'd normally use ct.OrderID for CustomerTransactions.OrderID, but instead I'm told to use a.OrderID. I have to use a, b, c, etc at work because that's just what they do here and I'm not going to argue about it with anybody. But as a personal disagreement with my coworker, he thinks that naming the tables a, b, c in order that they were joined is more clear for debugging than actually naming them something based off of the table name, in other words, semantic. To clarify, he'd do: SELECT b.OrderID, a.CustomerID, c.OrderDetailID FROM Customer a JOIN Order b on a.CustomerID = b.CustomerID JOIN OrderDetail c on b.OrderID = c.OrderID WHERE { some conditionals here } That's a contrived example to demonstrate our use of table aliases, not a real production example. Will you provide some guidance on this? |
JohnS
2006-09-12 |
re: Some Simple SQL Rules to Live By Good rules, if your world and database are perfect, but I've seldom worked in such a world. Large production legacy databases sometimes cannot just be rewritten and fully normalized, they just have to be lived with, and sometimes "bad" SQL is the only way to deal with them. |
Jeff
2006-09-12 |
re: Some Simple SQL Rules to Live By JohnS -- true enough in general ... but I am not sure why a badly designed database would force you break any of the rules I just gave ... Nothing in my list requires a well designed database. And just because a database might be a mess is no reason to suddenly start coding RIGHT JOINS all over the place. QUite the contrary -- if your database design is really bad, all the more reason to at least try to make your SQL code readable and optimal. Never make excuses for writing poor code. |
blindman
2006-09-13 |
re: Some Simple SQL Rules to Live By In response to Chris (Semantic Table Aliases 7/28/2006 1:48 PM), I think I speak for Jeff and for all accomplished DBAs when I till you the your co-workers and boss are absolute friggin' morons. |
Toby N.
2006-09-27 |
re: Some Simple SQL Rules to Live By Your HTML is floating to the left as you page down. Want to fix it? |
Dave
2007-07-20 |
re: Some Simple SQL Rules to Live By Chris said: "I have to use a, b, c, etc at work because that's just what they do here and I'm not going to argue about it with anybody. But as a personal disagreement with my coworker, he thinks that naming the tables a, b, c in order that they were joined is more clear for debugging than actually naming them something based off of the table name, in other words, semantic." Maybe if your coworker put his effort into making the code more _clear_ and _readable_ he wouldn't have the need to make the code easier to debug. |
Beef
2007-07-20 |
re: Some Simple SQL Rules to Live By Good article! I do most of this already, but I'm guilty of not fully qualifying column names. I've never really had issues with not being able to determine/remember what alias references which table. Guess I'm just special. ;) (Special Ed) This world would be a much better place if people followed these rules. Beef |
tiendat
2007-07-21 |
re: Some Simple SQL Rules to Live By tin moi cap nhat |
bjupton
2007-07-22 |
re: Some Simple SQL Rules to Live By This is all good stuff. Even the stuff you know...nice to have it well said and in one spot. I see bad sql like this every day. I'm going to post this in my cube. Cheers |
Tim
2007-07-24 |
re: Some Simple SQL Rules to Live By Hi jeff, Thanks for the tips. It's really useful for newbies like me. I have one question regarding the RIGHT JOINS. Bear with me if this is too basic to be mentionned on this site, but I can't think of another solution to come up with what I have to do. The situation is: I have 2 tables. the first one is a list of sports teams. (team_id, team_name, team_franchise ...) - Primary Key is on team_id. The second table is a list of games (game_id, team_1 , team_2, score_team_1, score_team_2, date) The goal is to be able to display something like this: Team 1 (Franchise1) score1 - score2 Team 2 (Franchise2) i'd like to be able to display all the games for a specific date. How can i achieve this without using RIGHT JOINS. (if this required to change the database design, it's not an issue.) Am I missing something in my "SQL Logic" or is it a database design issue ? or both maybe.... ? Thanks for your help |
Mário Marinato
2007-07-24 |
re: Some Simple SQL Rules to Live By Thanks for the tips. They can act as a "north" for me, showing me what I need to study more. Brazilian cheers. |
rüya tabiri
2008-08-29 |
re: Some Simple SQL Rules to Live By thanks you |
Hey
2008-11-13 |
re: Some Simple SQL Rules to Live By Good Stuff |
Prasobh
2008-12-21 |
re: Some Simple SQL Rules to Live By HI, I am running a simple select statement but that is taking too much time. Some times i am getting timout also.Pls help on this. regards, Prasobh SELECT * from From DiamondDivision.dbo.DDPDROurSuppConDet WITH(NOLOCK) WHERE PDR_Cmp_Cd = 'DMSDX' And PDR_Div_Cd ='DDD' And PDR_Loc_Cd ='WDIAM' And PDR_Doc_A = 'XPDRV' And PDR_Doc_N = 10115 AND PDR_Rec_No =1 And PDR_Cons_Typ = 'S' Order By PDR_Rec_No, PDR_Sr_No |
Jeff
2008-12-23 |
re: Some Simple SQL Rules to Live By Prasobh -- maybe try indexes? And asking your questions at the SQLTeam forums instead of commenting on a random, unrelated blog post? |
Mirko
2009-01-26 |
re: Some Simple SQL Rules to Live By Well, I'd like to add to the "Use * ..." rule: Avoid anything in your code that implicitly makes assumptions on the column order of a table or view, or the parameter order of a proc or function, instead name the columns explicitly in any such cases. In my opinion, this holds true for using AND creating views/procs/functions. Applications will get much more maintainable if you can see what data is processed in you source code, without having to look at a table, view, or proc definition. It also makes it possible to do changes to the database schema more independently of your client application code. |