Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

The power of the Cross Join

Many SQL books and tutorials recommend that you “avoid cross joins” or “beware of Cartesian products” when writing your SELECT statements, which occur when you don't express joins between your tables.  It’s true that you need to ensure that your join conditions are adequately stated so that you don’t accidentally produce this effect, but it is not true that you should avoid these types of joins in every situation.

Cross Joins produce results that consist of every combination of rows from two or more tables.  That means if table A has 3 rows and table B has 2 rows, a CROSS JOIN will result in 6 rows.  There is no relationship established between the two tables – you literally just produce every possible combination.

The danger here, of course, is that if you have table A with 10,000 rows and Table B with 30,000 rows, and you accidentally create the product of these two tables, you will end up with a 300,000,000 row result -- probably not a good idea.  (Though it is great for creating test data and the like.)

So, how can this ever be useful?  Actually, if you do lots of report writing in SQL, a CROSS JOIN can be your best friend.

Suppose you need to write a report that returns total sales for each Store and each Product.  You might come up with this:

SELECT Store, Product, SUM(Sales) as TotalSales
FROM Sales
GROUP BY Store, Product

Easy enough – except when the requirement states “show $0 if a store had no sales of a particular product”.  The above query won’t do that – it returns no rows at all if a store had no sales for a particular product. 

The solution?  Well, hopefully in your database you have a table of Stores and a table of Products.  A cross join of the two results will return 1 row per combination of Store and Product:

SELECT S.Store, P.Product
FROM Stores S
CROSS JOIN Products P

That result is the perfect starting point for the results we wish to return -- now we just need to return the sales for each combination.  We already have written that in our first attempt, so now we just need to combine the two:

SELECT S.Store, P.Product, ISNULL(C.TotalSales,0) as TotalSales
FROM Stores S
CROSS JOIN Products P
LEFT OUTER JOIN
    (SELECT Store, Product, SUM(Sales) as TotalSales 
     FROM Sales
     GROUP BY Store, Product) C
ON
  S.Store = C.Store AND
  P.Product = C.Product

The SELECT is derived logically from our requirements.  We start by considering all combinations of stores and products, and from there we show any matching sales data.  Our primary, driving rowset is actually not the transaction table, but rather the cross join of two entity tables!  It might seem very counter intuitive if you haven't approached the problem from this angle before, but it leads to very simple and elegant ways to solve rather complicated problems using SQL.

The solution uses what I call “the report writers magic formula”:

(A x B ) -> (C)

In my made up notation, the above reads “A cross joined with B, left outer joined to C ”.  A and B represent master tables of entities in your database, and C represents a summarized derived table of a transactional table in your database.

Some important things to note:

  • All criteria for the transactions, such as date ranges and/or transaction types, need to be done in the inner transaction summary query. 
  • The summarized transactional sub-query needs to be properly grouped so that it returns 1 row per combination of A and B.  Typically, this means that if the PK of table A is “A_ID” and the PK is table B is “B_ID”, then the derived table C should be grouped by A_ID, B_ID.
  • All criteria that determines which entities to show on your report – i.e., certain regions or only “active” products – should be done on the outer query.

Take the previous SELECT statement, for example: Note that the inner SELECT is grouped by Product and Store, which ensures that we return 1 row per combination of Product/Store -- which perfectly matches what the cross join creates.  If we wanted to show only data for 2005, we would put the filter on the TransactionDate column within the inner SELECT (since that is the part of the statement in which we collect and summarize our transactions), but if we want only ProductID #23, we do that in the outer SELECT (since that is where we determine the population of Stores and Products to return):

SELECT S.Store, P.Product, ISNULL(C.TotalSales,0) as TotalSales
FROM Stores S
CROSS JOIN Products P
LEFT OUTER JOIN
    (SELECT Store, Product, SUM(Sales) as TotalSales 
     FROM Sales
     WHERE TransactionDate between '1/1/2005' and '12/31/2005'
     GROUP BY Store, Product) C
ON
  S.Store = C.Store AND
  P.Product = C.Product
WHERE
  P.Product = 23

The CROSS JOIN technique can apply to many situations – to return total labor cost by office by month, even if month X has no labor cost, you can do a cross join of Offices with a table of all months.   Another classic example is showing all GL transactions for a specific set of companies and accounts, returning all accounts and companies even when they have no activity.

The important thing is to practice with very small sets of sample data until you get a feel for how it works.  Also, you should explicitly state CROSS JOIN in your SELECT so that it is very clear that you intend for this to happen and it is not the result of missing joins. 


see also:

Legacy Comments


Brett
2005-09-12
Very Nice
Knew the author before I looked.

Bookmarked this.

Thanks

Brett


Muhammad zahid
2005-09-26
re: The power of the Cross Join
goods

Richard Nelmes
2005-10-31
re: The power of the Cross Join
Jeff - I have a question about your Cartesian products article 'The power of the Cross Join'. I set up a similar example using data from the Oracle scheme 'OE'. Using this the 'isnull' parameter returned error 'ORA-00904: "ISNULL": invalid identifier'. I substituted the 'nullif' function and this seemed to work successfully. Have I made a correct substitution? I am a little uncertain as the manual seems to suggest that where no sales have been made 'TotalSales' would need to be equal to the value zero in order to return null. I suspect most of the returned values of TotalSales would be null rather than zero, so would not match.

Here is the code I wrote -

select c.customer_id, p.product_id, nullif(S.TotalSales,0) as TotalSales
from customers c
cross join products p
left outer join
(select c.customer_id, oi.product_id, sum(oi.unit_price) TotalSales
from order_items oi
join orders o on o.order_id = oi.order_id
join customers c on o.customer_id = c.customer_id
group by c.customer_id, oi.product_id) S
on c.customer_id = S.customer_id and
p.product_id = S.product_id
where
p.product_id between '2800' and '3999' and
c.customer_id between '167' and '169'
order by c.customer_id, p.product_id

This produces 354 rows. If the product and customer subset selections are removed it returns 91,872 rows.

Richard Nelmes
Cardiff, UK
richard.nelmes@bt.com
Work: +44 117 302 4329
Cullular: +44 77 3991 6701

Jeff
2005-11-07
re: The power of the Cross Join
Hi Rhichard --

Honestly, I have no clue regarding Oracle. Sorry. I am not even sure it supports outer joins.

However, when you say " when the subsection is removed it produces xxx rows" what is the SELECT that you are trying? Did you remove the WHERE clause?

MZ
2005-12-29
re: The power of the Cross Join
Hi Jeff,

I read your article. I am having a problem of Self Join (Cartesian Join). Joining the table itself. I have a table of Players. Who played different games with different palyers. I need to calculate their ratings. Calculation should be with 1 player to all other player in the same game. I did it with Self join. It works fine. But as the number of players grows - it is become slow. I am talking about more than 200 thousand records in a table. Do you have any solution to implement Cartesian Join - which will be much faster. Any solution will be much appreciated..

Thanks.
MZ

Jeff
2005-12-30
re: The power of the Cross Join
MZ -- post your question at the SQLTeam forums and we will be happy to help. Please provide your table structure, some sample data (just enough that covers all possibilities that need to be handled) and what your desired results are based on that sample data. The more specific info you can give us, the less guessing we have to do and the better our answer for you will be.

- Jeff

Phylyp
2006-02-01
re: The power of the Cross Join
Jeff,
I fully agree with your statement on the usefulness of cross joins.

I have several times encountered the mindset that cross (or cartesian) joins are "a bad thing". I guess these people are the same as those who vehemently object to any use of goto in C/C++.

I've been able to use it successfully in quite a few cases, and the solution I provided was more elegant (once people got over gulping at the x join) than other solutions considered.

I usually stuff a quick comment right next to the join stating that the cross join IS intended (since we were on Oracle 8 which didn't support the ANSI-92 SQL join syntax).

Regards,
Arun Philip (arunDOTphilipATaccentureDOTcom)

Phylyp
2006-02-01
re: The power of the Cross Join
Jeff,
Addendum to my previous: I believe a good knowledge of relational algebra and set theory can seriously enhance one's ability to use cross joins to their fullest.

Regards,
Arun Philip (arunDOTphilipATaccentureDOTcom)

Eric
2006-02-21
re: The power of the Cross Join
"ut if we want only ProductID #23, we do that in the outer SELECT"

Can you expand on this? Wouldn't it be more efficient to eliminate any products or stores you *don't* want from the inner query, rather than producing a ton of cross-joined results that the outer select will just end up discarding?

Or is there something specific about the way the filtering is performed that would kill performance in this case?

Eric
2006-02-21
re: The power of the Cross Join
Ah, never mind -- I see. You don't have the Products table available in the inner select to do the filtering. Never mind. :)

eeshusiri
2006-03-08
re: The power of the Cross Join
Hi Jeff,

I have read your article regarding the cross join . I find it
some what useful in knowing about the Cross JOin . BUt could not stil make it out.

Could You help with one more eg?

bye,
eeshu

doofledorfer
2006-05-20
re: The power of the Cross Join
An even more frequent reporting requirement is to generated e.g. months for several years. You can start out with a cross join on a table of 12 months with the years you want to report, with a quick clipping filter for the end points if you need to.

ebeck
2006-06-09
re: The power of the Cross Join
Thanks - I've struggled with this many times for reporting - some times to the point of maintaining zero value records to get them in the result set.

Laxman SIngh
2007-04-26
re: The power of the Cross Join
Thanks a lot jeff. It really opened my eyes as to how query is processed by SQL.

Jack V
2007-06-21
re: The power of the Cross Join
As with all things, must be used in the right circumstances









Jack
http://www.itjobfeed.com

Asd
2007-08-08
re: The power of the Cross Join
I would think the correct replacement for isnull is nvl in Oracle. And of course Oracle supports outer joins (with the proper syntax and everything since 9i). No need for FUD :)

Sunil Maheshwari
2008-12-09
re: The power of the Cross Join
Such a wonderful article.

Thanks a lot .

Can i have ur emailid?

raja
2009-03-10
re: The power of the Cross Join
Jeff

er.. I don't want to be one of those guys... but it would seriously scare me to do cross joins between two entity tables as you are doing in the example with the stores and products tables. This is because you are assuming that one instance of the first entity will always be unconditionally linked to every instance of the second entity. In your example, you are assuming that every store will sell every product. This may indeed be the case today but what if in the future things change a little bit. The design would suddenly fail in the future. (or would produce too many 0 rows in your case) I do think that the cross join technique can be a boon in some reporting situations where the number of rows in one of the tables is well known. For instance, if I alter your example and use a months_of_the_year table instead of the products table and you want to track the store sales for each month (even if the store does not have sales for a particular month) then we can use your technique without any big fear.

There is however a way to mitigate the risk even for the products and stores example. Make the cross join a view and use the view in the left outer join to obtain the sales. If tomorrow, you bring in a store that does not sell all products then you can populate a relation table that has the mappings between the store and the products. Your view will now query the new table instead of doing the cross join. Just my 5 cents..

sujit
2009-04-23
re: The power of the Cross Join
hi all

how to use cross join with outer join.

please help me.

regards,

sujit

Jeff S
2009-04-29
re: The power of the Cross Join
sujit -- this might be a crazy thought, but consider actually reading the article you are commenting on?

greenmodulus
2009-07-21
re: The power of the Cross Join
It would help if you would define the schema for your two tables.

NARAYAN
2009-09-23
re: The power of the Cross Join
grids are not used result not desplayed defficult to understand

RGB
2010-05-18
Re: The power of the Cross Join
I'm new to SQL programming and have read a few of your articles. They're absolutely great. I know your knowledge stems from many years of practical experience, but how is one able to understand the underlying mechanics of SQL? How are you able to determine whether a piece of code is efficient (apart from performance tools) and the correct construct to use? I'm only able to wish I may gain at least the same level of expertise as you guys and hopefully surpass it. My hat off to you guys. Thanks.

Spot
2010-06-09
re: The power of the Cross Join
Thanks for your example. I couldn't figure out why in the woruld anyone would ever use a cross join.

R.Subbarama
2010-08-19
re: The power of the Cross Join
This is a quiet and very useful information but can be used for sparingly in our queries