Joe Webb Blog

Musing and observations about SQL Server, other technogies, and sometimes just life in general

Using CROSS JOIN to generate test data

To prepare for an upcoming technical presentation, I need to create some test data to use in a demo. I'm planning to show how different T-SQL programming techniques directly affect SQL Server's efficiency in resolving the query. So, 10 - 20 million rows of data will likely suffice for this demo.

The question is: how to create all of that data? Obviously, a simple INSERT...VALUES statement is good for creating up to a few dozen rows of data, but 10 - 20 million rows of data is out of the question for such a simple technique.

Fortunately, SQL Server has a oft forgotten JOIN type that helps for occasions such as this. We're all pretty familiar with INNER JOIN, LEFT JOIN, and RIGHT JOIN. But are you aware of the CROSS JOIN? A CROSS JOIN takes each row from the first table and combines it with every row from the second table, resulting in a Cartesian product of the two tables. For example, if my first table has 3 rows of data - Mary, Mark, Luke - and my second table has 2 rows of data - Jones, Smith - the Cartesian product will result in 6 rows of data.

First_Name Last_Name
Mary Jones
Mary Smith
Mark Jones
Mark Smith
Luke Jones
Luke Smith

So I have the "how". Now I need the "what".

A quick search on Google led me to the United States Census site where the most popular male, female, and family names are readily available. I downloaded each of those files.There were a total of 5,494 given names and 88,799 family names. I pruned the list of family names down to 2,500 since the original list would have produced a whopping 487,861,706 customers (88,799 * 5,494); far more than I required.

After importing the two lists into two tables, appropriately called Last_Names and First_Names, I used the following script to create my test data.

 

--clean up the imported names 
UPDATE
First_Names
SET
Given_Name = RTRIM(Given_Name)

UPDATE
Last_Names
SET
Family_Name = RTRIM(Family_Name)

 

--create a very simple customers table 
CREATE TABLE Customers
(
Customer_ID INT NOT NULL IDENTITY(1,1)
,Last_Name VARCHAR(20) NOT NULL
,First_Name VARCHAR(20) NOT NULL
,Email_Address VARCHAR(50) NULL
)
 
--create the test data 
INSERT INTO
 Customers
(
  Last_Name
,First_Name
,Email_Address
 )
SELECT
 f.Given_Name
 ,l.Family_Name
 ,f.Given_Name + '.' + l.Family_Name + '@hotmail.com'
FROM
 First_Names AS f CROSS JOIN
 Last_Names AS l

 

My new Customers table now has 13,735,000 rows in it. (A real business should be so fortunate!) I think that'll do nicely for my presentation. If not, I start over and create a 50 millions rows.

Cheers!

Joe

kick it on DotNetKicks.com

Legacy Comments


Ferry
2008-02-07
re: Using CROSS JOIN to generate test data
Hi there.. I was wondering, if you could give me the link to the site where you download such data. It would be really really useful to me. Thanks =)

Joe Webb
2008-02-07
re: Using CROSS JOIN to generate test data
Sure. I found the list of names here.

http://www.census.gov/genealogy/names/

Joe

Ferry
2008-02-08
re: Using CROSS JOIN to generate test data
Thanks a lot =)

PANDARAM
2009-02-15
re: Using CROSS JOIN to generate test data
ungalukku vellaiyee illaya da sunny kala

T. Scott Clendaniel
2009-07-30
re: Using CROSS JOIN to generate test data
You have NO IDEA what this tip means to me! Almost 10 years ago a small group of us not-so-techies went on a mission to try to identify how to generate all possible combinations of records to test some segmentation logic that a vendor had written. None of us came up with this answer, or any very good answer for that matter. Lo and behold the issue just came up again, and your simple yet elegant solution fits the bill.

Thank you very much! :-)

Joe Webb
2009-07-30
re: Using CROSS JOIN to generate test data
I'm glad you found this post useful, Scott! And that's for letting me know.

By the way, I'm now blogging over on my own site - http://www.webbtechsolutions.com/blog

Joe

Ryan Maizel
2009-08-04
re: Using CROSS JOIN to generate test data
Just as an FYI, the mathematical name for what you've produced is called a "Cartesian Product". "Cross Join" is a simpler term that I only heard recently, but the truth is it's not really a join - more like the absence of one. Can't tell you how many times I've used this same trick - very useful when needed, and REALLY a pain when you create one accidentally!

Movers and Moving
2009-09-17
re: Using CROSS JOIN to generate test data
Now that you moved your blog, I will be sure to check it out. And congratulations on the move!

snow boots for women
2010-10-15
re: Using CROSS JOIN to generate test data
After importing the two lists into two tables, appropriately called Last_Names and First_Names, I used the following script to create my test data.

columbia jackets
2010-10-16
re: Using CROSS JOIN to generate test data
The question is: how to create all of that data? Obviously, a simple INSERT...VALUES statement is good for creating up to a few dozen rows of data, but 10 - 20 million rows of data is out of the question for such a simple technique.

fur boots
2010-10-19
re: Using CROSS JOIN to generate test data
I downloaded each of those files.There were a total of 5,494 given names and 88,799 family names. I pruned the list of family names down to 2,500 since the original list would have produced a whopping 487,861,706 customers (88,799 * 5,494); far more than I required.

furry boots | womens snow boots | columbia outlet | columbia sportswear jackets | the north face jackets | north face jacket | cheap mac makeup | discount makeup

coco chanel clothes
2010-10-21
love
knock off chanel jewelry bank to withdraw $15,000. Hawke-Petit chanel bracelets was seen on the bank's mont blanc watch

hanly
2010-10-26
re: Using CROSS JOIN to generate test data
This super video converter for mac is developed by Emicsoft Studio, it is currently the best video converter running under Mac os x, comparied by isqunite, Visualhub and other Video Converter for Mac Free video converter for Mac,

Piyush Bajaj
2011-08-11
re: Using CROSS JOIN to generate test data
This post is really nice.
Well i have also posted a similar article on CROSS-Join and SELF-Join.
Well you can see it from:
www.sqlservergeeks.com/...

wing Shoes
2011-08-18
wing Shoes
Adidas Wing Shoes Genuine, who done fifteenth inside La Liga previous period, appointed Frenchman Philippe Montanier because discipline earlier that summertime, yet Vela is a first brand-new participant to be able to become a member of in front of the brand-new advertising campaign.jeremy scott wings shoes outlet
.Jeremy Scott Wings

jiechuang03
2011-09-22
christian louboutin ankle boots


I found the perfect place for my needs. Contains wonderful and useful messages. I have read most of them and has a lot of them. To me, he's doing the great work.


cool post! very helpful... thanks for the great worrk.

Absolutely amazing.i agree - great article!

Thanks for all the information. I have posted a link to this page on my blog for all of my readers. I have also bookmarked this for my future reference.


Thank you very much. I am wonderring if i can share your article in the bookmarks of society,Then more friends can talk about this problem.

Thanks for an thought, you sparked at believed from a angle I hadn?t provided thoguht to but. Now lets see if I can do some thing with it.

Your post is reall great and I feel excited after reading it. That would be nice to hear from you soon again.