Joe Webb

Musing and observations about SQL Server, other technogies, and sometimes just life in general
posts - 85, comments - 291, trackbacks - 0

My Links

SQLTeam.com Links

News

This blog has moved!

Click here for the new location.

Follow me on Twitter
Add to Technorati Favorites

Search this Blog
 




Archives

Post Categories

About me

Using CROSS JOIN to generate test data

To prepare for an upcoming technical presentation, I need to create some test data to use in a demo. I'm planning to show how different T-SQL programming techniques directly affect SQL Server's efficiency in resolving the query. So, 10 - 20 million rows of data will likely suffice for this demo.

The question is: how to create all of that data? Obviously, a simple INSERT...VALUES statement is good for creating up to a few dozen rows of data, but 10 - 20 million rows of data is out of the question for such a simple technique.

Fortunately, SQL Server has a oft forgotten JOIN type that helps for occasions such as this. We're all pretty familiar with INNER JOIN, LEFT JOIN, and RIGHT JOIN. But are you aware of the CROSS JOIN? A CROSS JOIN takes each row from the first table and combines it with every row from the second table, resulting in a Cartesian product of the two tables. For example, if my first table has 3 rows of data - Mary, Mark, Luke - and my second table has 2 rows of data - Jones, Smith - the Cartesian product will result in 6 rows of data.

First_Name Last_Name
Mary Jones
Mary Smith
Mark Jones
Mark Smith
Luke Jones
Luke Smith

So I have the "how". Now I need the "what".

A quick search on Google led me to the United States Census site where the most popular male, female, and family names are readily available. I downloaded each of those files.There were a total of 5,494 given names and 88,799 family names. I pruned the list of family names down to 2,500 since the original list would have produced a whopping 487,861,706 customers (88,799 * 5,494); far more than I required.

After importing the two lists into two tables, appropriately called Last_Names and First_Names, I used the following script to create my test data.

 

--clean up the imported names 
UPDATE
First_Names
SET
Given_Name = RTRIM(Given_Name)

UPDATE
Last_Names
SET
Family_Name = RTRIM(Family_Name)

 

--create a very simple customers table 
CREATE TABLE Customers
(
Customer_ID INT NOT NULL IDENTITY(1,1)
,Last_Name VARCHAR(20) NOT NULL
,First_Name VARCHAR(20) NOT NULL
,Email_Address VARCHAR(50) NULL
)
 
--create the test data 
INSERT INTO
 Customers
(
  Last_Name
,First_Name
,Email_Address
 )
SELECT
 f.Given_Name
 ,l.Family_Name
 ,f.Given_Name + '.' + l.Family_Name + '@hotmail.com'
FROM
 First_Names AS f CROSS JOIN
 Last_Names AS l

 

My new Customers table now has 13,735,000 rows in it. (A real business should be so fortunate!) I think that'll do nicely for my presentation. If not, I start over and create a 50 millions rows.

Cheers!

Joe

kick it on DotNetKicks.com

Print | posted on Wednesday, February 06, 2008 10:14 AM

Feedback

# re: Using CROSS JOIN to generate test data

Hi there.. I was wondering, if you could give me the link to the site where you download such data. It would be really really useful to me. Thanks =)
2/7/2008 8:38 AM | Ferry

# re: Using CROSS JOIN to generate test data

Sure. I found the list of names here.

http://www.census.gov/genealogy/names/

Joe
2/7/2008 8:58 AM | Joe Webb

# re: Using CROSS JOIN to generate test data

Thanks a lot =)
2/8/2008 10:39 AM | Ferry

# re: Using CROSS JOIN to generate test data

ungalukku vellaiyee illaya da sunny kala
2/15/2009 8:37 PM | PANDARAM

Post Comment

Title  
Name  
Email
Url
Comment   
Please add 2 and 1 and type the answer here:

Powered by: