Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server
posts - 157, comments - 2686, trackbacks - 64

My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Subscribe





Archives

Post Categories

Programming

Sports

SQL

Taking a look at CROSS APPLY

Applying a Sub-Query, Joining a Derived Table ...

I think the easiest way to think of CROSS APPLY is that it is like doing a CROSS JOIN with a correlated sub-query instead of a derived table.  Let's see if I can explain that .... A derived table is "self-contained", in that all tables and columns in the parent SELECT are not accessible (though variables and parameters can be referenced).  For example, consider:

select A.*, b.X
from A
cross join (select B.X from B where B.Val=A.Val) b

That is not legal because A.Val is out of scope within the derived table; this is because the derived table is evaluated independently of the other tables in the SELECT.  To limit the rows in table B so that B.Val = A.Val, we must do that outside of the derived table via a join or in the criteria:

select A.*, b.X
from A
cross join (select * from B) b
where A.Val = b.Val

(Of course, the above is equivalent to doing an INNER JOIN to the derived table, or just joining to the table B.)

Also, keep in mind that the scope-of-derived-tables rule isn't just for CROSS JOINS, it's for all JOINS -- CROSS, INNER, OUTER and even UNION; they all use "self-contained" derived tables.

This is in contrast to a correlated sub-query, where the parent SELECT is in scope for the sub-query; the sub-query is evaluated for each row in the query, so the other tables and columns in the SELECT are all available:

select A.*, (select B.X from B where B.Val=A.Val) as X
from A

(Note: I am ignoring for now the fact that returning multiple rows in a sub-query will return an error.)

This is an easy way to think of the difference between CROSS JOIN and CROSS APPLY.  CROSS JOIN, as we saw, joins to a derived table; however, CROSS APPLY, despite looking like a JOIN, actually is applying a correlated sub-query. This imposes both the advantages of a correlated sub-query but also the performance implications.

So, we can simply rewrite our first example using CROSS APPLY like this:

select A.*, b.X
from A
cross apply (select B.X from B where B.Val=A.Val) b

Since we are performing an APPLY and not a JOIN, A.Val is in scope and it works just fine.

Table Valued User Defined Functions

Note that the same rules apply when using Table-Valued User-Defined Functions:

select A.*, B.X
from A
cross join dbo.UDF(A.Val) B

is not legal; once again, A.Val is not in scope for the user-defined function.  The best we can do before SQL 2005 was to use a correlated sub-query:

select A.*, (select X from dbo.UDF(A.Val)) X
from A

However, that is not logically equivalent; the UDF cannot return more than 1 row or it will result in an error, and wouldn't make logical sense anyway if it did.

Starting with SQL 2005, we can now use CROSS APPLY and it will work fine:

select A.*, b.X
from A
cross apply dbo.UDF(A.Val) b

So, that is one way to think of the difference between a JOIN and an APPLY; a JOIN combines two separate result sets, but APPLY is more of a loop that evaluates one result set over and over for each row in another.  This means that, in general, APPLY will be less efficient than a JOIN, just as, in general, correlated sub-queries are less efficient than derived tables.  (The optimizer, however, is generally quite good at optimized correlated sub-queries when possible.)

So, why use CROSS APPLY instead of a correlated sub-query?  What's the advantage?  Actually, quite a lot -- it is much more powerful!

CROSS APPLY can return multiple rows

Unlike correlate sub-queries, CROSS APPLY works with multiple rows.  This allows us to do things like "joining" a table to a function that parses a CSV column in that table into multiple rows:

select A.ID, b.Val
from A
cross apply dbo.ParseCSV(A.CSV) b

When the ParseCSV() function returns multiple rows, it simply acts as if we have joined Table A to the function's return table, duplicating the rows in Table A for each row in the joined table.  This is not possible with a correlated sub-query, and will result in an error.  This is a quick and easy way to parse a table of data into multiple rows in an efficient "set-based" manner when the algorithm requires a complex User-Defined Function. (The CSV is not a great example, but other parsing routines are not as easily accomplished via a JOIN to a numbers table.)

CROSS APPLY can return multiple columns

Again, in a correlated sub-query, we can only return a single value.  If we write a SQL statement that returns a running sum, we can use a correlated sub-query like this:

select o.*,
  (select sum(Amount) from Order o
   where p.OrderDate <= o.OrderDate) as RunningSum

from Order o

However, what if we'd like to return an additional running sum of Orders based on some other criteria (e.g., for orders with the same "OrderCode")?  We'd need another correlated sub-query, greatly reducing the efficiency of our SELECT:

select o.*,
  (select sum(Amount) from Order o
   where p.OrderDate <= o.OrderDate) as RunningSum,

  (select sum(Amount) from Order o
   where p.OrderCode = o.OrderCode and p.OrderDate <= o.OrderDate) as SameCode

from Order o

However,  we can easily re-write that using a single CROSS APPLY:

select o.*, rs.RunningSum, rs.SameCode
from Order o
cross apply
 (
   select
     sum(Amount) as RunningSum,
     sum(case when p.OrderCode = o.OrderCode then Amount else 0 end) as SameCode

   from Order P
   where P.OrderDate <= O.OrderDate

 ) rs

So, we get the benefit of return multiple columns like a derived table, and we also get the ability to reference outer values in our SELECT to use in criteria and CASE statements.  Very, very powerful.
 
CROSS APPLY also allows us to quickly get columns from the "previous" row in a table quite easily:

select o.*, prev.*
from Order o
cross apply
 (
   select top 1 *
   from Order P where P.OrderDate < O.OrderDate
   order by OrderDate DESC
 ) prev
 
We can express the running total and counts as CROSS JOINS or INNER JOINS by reworking a correlated sub-query, but using TOP and referencing the outer SELECT is not easily done in that manner. 

Note that the above CROSS APPLY will not return any orders without a previous order; we would use OUTER APPLY to ensure that all orders are returned even if no previous orders exist:

select o.*, prev.*
from Order o
outer apply
 (
   select top 1 *
   from Order P where P.OrderDate < O.OrderDate
   order by OrderDate DESC
 ) prev

We can also CROSS APPLY to a Table-Valued User Defined Function that returns exactly one row, but with multiple columns, to return separate pieces of data from a single function call.  For example, we can parse an email address into separate username and domain columns.  A few months back I wrote a SQL Team article that discusses that concept.

Summary


CROSS and OUTER APPLY are very powerful and can be very useful, but we must be careful to use them only when necessary; I am still testing the possibilities, but in general an APPLY will tend to be not as efficient as a JOIN.  In addition, APPLY is mostly demonstrated by applying table-valued user-defined functions, but it can be used with in-line SELECT statements as well.

see also:

Print | posted on Thursday, October 18, 2007 5:01 PM | Filed Under [ T-SQL Techniques SQL Server 2005 Joins/Relations ]

Feedback

Gravatar

# re: Taking a look at CROSS APPLY

Great article. I've struggled to find many uses for CROSS APPLY as yet but this has certainly helped. Just a minor point though, is there a slight naming error on the line

sum(case when p.OrderCount = o.OrderCode then Amount else 0 end) as SameCode

shouldn't it be

sum(case when p.OrderCode = o.OrderCode then Amount else 0 end) as SameCode

Cheers

Dan
10/22/2007 3:34 AM | Dan Kennedy
Gravatar

# re: Taking a look at CROSS APPLY

Hi Dan -- Fixed. Thank you!!

- Jeff
10/22/2007 8:08 AM | Jeff
Gravatar

# re: Taking a look at CROSS APPLY

I happened to use a CROSS APPLY for the first time a couple of days ago. I wanted to produce a table of messages where the message may be sent to one or more recipients. Using the CROSS APPLY I was able to generate a delimited list of recipients in the subquery and join it to the main table.

Bye bye cursor.

If you are interested I adapted the final query in this example: http://zulfiqar.typepad.com/zulfiqars_web/2005/04/tsql_concatenat.html

Keith.
10/24/2007 9:11 AM | Keith Bloom
Gravatar

# re: Taking a look at CROSS APPLY

Some qaution is required for CROSS APPLY.
Analyzing the query plan reveals that it's behaviour is very much like a correlated subquery, but with the ability to return multiple columns/records.
5/22/2008 9:35 AM | Peso
Gravatar

# re: Taking a look at CROSS APPLY

Hi Peso --

Thanks! That's pretty much the entire point of my article! :)

From the article:

"This is an easy way to think of the difference between CROSS JOIN and CROSS APPLY. CROSS JOIN, as we saw, joins to a derived table; however, CROSS APPLY, despite looking like a JOIN, actually is applying a correlated sub-query. This imposes both the advantages of a correlated sub-query but also the performance implications."
5/22/2008 9:39 AM | Jeff
Gravatar

# re: Taking a look at CROSS APPLY

nice post, thanks
10/21/2008 2:04 PM | sachin
Gravatar

# re: Taking a look at CROSS APPLY

Thanks, really well presented.
11/19/2009 9:11 AM | Karim
Gravatar

# re: Taking a look at CROSS APPLY

The Best article in using SQL queries. Thanks for that.
11/22/2009 2:23 AM | Anil Rajendran
Gravatar

# re: Taking a look at CROSS APPLY

Good one mate!.. Thanks for the article. thoroughly understood
12/10/2009 5:16 AM | Prabhakar J
Gravatar

# re: Taking a look at CROSS APPLY

Hello people,

I tried to use the examples and got in trouble because “cross apply” only works on SQL 90 compatibility mode – not in SQL 2005 with mode 80 enabled (my case). Since I can’t change the database, I create the following work around:
-- Code
USE tempdb – tempdb is a native SQL 2005 DB
SELECT
tb.field1,
tb.field2,
fc.*
FROM #tempTB as tb
CROSS APPLY OficialDB_Compatibility80.dbo.myUDF (tb.field1, tb.field2, tb.field3, @dinamicVAR) AS fc
USE OficialDB_Compatibility80
-- Code

This worked for me. Good lock!

Bye
Josué Monteiro Viana
9/13/2010 3:58 PM | Josue Monteiro Viana
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET