Chris Miller Blog

RocketScientist's Miscellaneous Ramblings

Thoughts on Query Complexity

I'm working with a vendor's system that has a very severe performance problem.  I'm looking at it because despite the fact that I told the vendor there was a problem with a particular query back in OCTOBER and that the problem was going to degrade linearly with the data size they didn't (1) fix the query or (2) provide us with a data archive system, and now it's an emergency.

Sigh. 

The reason the query is performing badly is that it's a query of a view.  Not a stored procedure containing a query of a view even, but their application directly does a select statement.

Problem 1: All data access is to be done through stored procedures.  All of it.  If you are writing an application and don't follow this rule, please just kill yourself now and save me the trouble of having to track you down.

It's a query of two views, using the old join syntax…

Problem 2: It's an OLD join syntax.  There is a NEW join syntax.  It doesn't suck like the OLD one did.  If you use a comman in your WHERE clause, please just go back to writing user interfaces.

One of the views is a 5 part UNION ALL, which contains in one part another view, which is itself a big pile of UNION ALL's.

Problem 3: Excessive use of UNION ALL is prominent in poorly constructed databases.  UNION ALL is generally called for when you're generalizing data, making one thing look like something it's related to.  If they're close enough to be related together in this way, USUALLY they're the same object and you're thinking about the problem wrong.  "We need to use UNION ALL so we get a transaction total across our 5 order tables".  Why do you have 5 order tables?  Poor design.  One order table.  5 order detail tables that differentiate the transaction types. 

With all these problems combined, what should have taken about 5 minutes to figure out took more like 4 hours to diagnose, fix, verify, and implement.  The problem turned out to be that one of the deepest views was not filtering properly, instead of filtering down at the view level, the SQL Server query optimizer decided to join the 250K rows to a few other tables and then filter it down to 0 rows.

When you write complicated queries, you give the optimizer a LOT of chances to screw up.  And it will take every chance it gets to screw up and pick the least efficient strategy possible far more often than you'd like.

"Oh, but if you had up to date indexes and statistics that wouldn't happen"

Uh huh.  Right.  Pull the other one it's got bells on.  Indexes rebuilt nightly, because I have the time to do it.

The optimizer is usually pretty good.  "Usually" usually doesn't cut it.

Write small queries, store temporary results, combine the temporary results.  "Oh but that's space inefficient".  Uh huh.  Look at your query plan. See that "Spooling" step?  Guess what that does!  It creates a temporary table for intermediate results.  It's the same work either way.  One way you control how the work gets done, and you understand your data and business rules.  The other way SQL Server controls how the work gets done, and it doesn't understand your data or business rules.  One way you can highlight part of your query, run it, and performance tune it one piece at a time.  The other way, you get an atomic operation, maybe not done in the order you'd like, and the order is somewhat arbitrary.  The optimizer does things in a specific order based on data size and index selectivity, but it's not always in the RIGHT order. 







Legacy Comments


Dewayne Christensen
2007-12-28
re: Thoughts on Query Complexity
You completely fail to explain how "problem" 1 is a problem. With LINQ in the wild now, you're going to see far fewer developers using stored procedures. Like anything, that's going to be both bad and good, depending upon the skill of the developer. If sprocs were far and away better than client-issued SQL, then why did Microsoft invest so much effort in LINQ?

If you're in the camp that automatically associates SP = faster, then you're wrong:

http://weblogs.asp.net/aaguiar/archive/2006/06/22/Stored-Procs-vs-Dynamic-SQL.aspx

It's the SQL itself that determines what kind of performance you get, not where the SQL originates from.

As for the people who complain about intermediate temporary tables, teach 'em to use table-valued variables and derived tables (although in this case, it doesn't sound like they were much interested in learning in the first place).


Rocketscientist
2007-12-29
re: Thoughts on Query Complexity
The reason that stored procedures are good is they provide an encapsulation boundary between an application and the data. So a developer can write a stored procedure, and when he has a problem with it send it over to a DBA to take a look at it. Not an ideal situation but a common one. The DBA fixes a stored procedure and it's done. If the query is embedded in the application, then the DBA fixes a query and it's up to the developer to re-parse it and integrate the changes into the code and go back through a build process. So it takes longer to implement. It also requires that the query be run from the application in order to debug performance issues instead of providing an encapsulation of the data objects so they can be tested independently of the application.

From another perspective, a stored procedure also provides an application with a small measure of database independence. It allows the underlying structures of the database to change somewhat without changing the application. This is something I've done quite frequently, needing to re-model, usually for performance reasons (oh did we say we only had 10 salespeople? We meant 100,000...), and being able to make the necessary changes without forcing a rebuild of the application.

I hadn't studied LINQ, but from what I've seen it basically introduces a query language structure into existing languages, and in order to use that against an outside database service the .NET runtime converts the code into SQL, which is then run against the engine. So now, instead of just having a query optimizer that can completely screw up the query you also have a runtime interpreter that can do it. Here's a conversation I know I'm going to have in the future with a developer using LINQ:

"Umm...OK, I got this query in profiler, it's taking too long to execute/doing too many reads/causing too many locks, can you modify it so it does the joins in a different order? And what's up with that really screwed up formatting, it took me an hour to decipher what you're trying to do there..."

"No, that's all handled in the interpreter, I have no control over that."


Stored procedures aren't necessarily faster (precompilation and cached plans neglected, of course, in most cases a stored procedure written in a way that uses precomp and cached plans will be faster than inline code). They are, however, never slower. And for any database with any complexity they're a necessary abstraction layer or the system as a whole is unmanageable.

Also, the query I was looking at ended up being about 10k. That's 10k of SQL that the parser had to parse. It was a commonly run query, so putting that query into a stored procedure would have improved performance because compilation is expensive.