Broken Views

Mon Dec 26, 2011 by Ajarn Mark Caldwell in best-worst-practices

“SELECT *” isn’t just hazardous to performance, it can actually return blatantly wrong information.

There are a number of blog posts and articles out there that actively discourage the use of the SELECT * FROM …syntax. The two most common explanations that I have seen are:

Performance: The SELECT * syntax will return every column in the table, but frequently you really only need a few of the columns, and so by using SELECT * your are retrieving large volumes of data that you don’t need, but the system has to process, marshal across tiers, and so on. It would be much more efficient to only select the specific columns that you need.
Future-proof: If you are taking other shortcuts in your code, along with using SELECT *, you are setting yourself up for trouble down the road when enhancements are made to the system. For example, if you use SELECT * to return results from a table into a DataTable in .NET, and then reference columns positionally (e.g. myDataRow[5]) you could end up with bad data if someone happens to add a column into position 3 and skewing all the remaining columns’ ordinal position. Or if you use INSERT…SELECT * then you will likely run into errors when a new column is added to the source table in any position.

And if you use SELECT * in the definition of a view, you will run into a variation of the future-proof problem mentioned above. One of the guys on my team, Mike Byther, ran across this in a project we were doing, but fortunately he caught it while we were still in development. I asked him to put together a test to prove that this was related to the use of SELECT * and not some other anomaly. I’ll walk you through the test script so you can see for yourself what happens.

We are going to create a table and two views that are based on that table, one of them uses SELECT * and the other explicitly lists the column names. The script to create these objects is listed below.

IF OBJECT_ID('testtab') IS NOT NULL
   DROP TABLE testtab
go
IF OBJECT_ID('testtab_vw') IS NOT NULL
   DROP VIEW testtab_vw
go   
IF OBJECT_ID('testtab_vw_named') IS NOT NULL
   DROP VIEW testtab_vw_named
go   
CREATE TABLE testtab (col1 NVARCHAR(5) null, col2 NVARCHAR(5) null)
INSERT INTO testtab(col1, col2)
VALUES ('A','B'), ('A','B')
GO
CREATE VIEW testtab_vw AS SELECT * FROM testtab
GO
CREATE VIEW testtab_vw_named AS SELECT col1, col2 FROM testtab
go

Now, to prove that the two views currently return equivalent results, select from them.

SELECT 'star', col1, col2 FROM testtab_vw
SELECT 'named', col1, col2 FROM testtab_vw_named

OK, so far, so good. Now, what happens if someone makes a change to the definition of the underlying table, and that change results in a new column being inserted between the two existing columns? (Side note, I normally prefer to append new columns to the end of the table definition, but some people like to keep their columns alphabetized, and for clarity for later people reviewing the schema, it may make sense to group certain columns together. Whatever the reason, it sometimes happens, and you need to protect yourself and your code from the repercussions.)

DROP TABLE testtab
go
CREATE TABLE testtab (col1 NVARCHAR(5) null, col3 NVARCHAR(5) NULL, col2 NVARCHAR(5) null)
INSERT INTO testtab(col1, col3, col2)
VALUES ('A','C','B'), ('A','C','B')
go
SELECT 'star', col1, col2 FROM testtab_vw
SELECT 'named', col1, col2 FROM testtab_vw_named

I would have expected that the view using SELECT * in its definition would essentially pass-through the column name and still retrieve the correct data, but that is not what happens. When you run our two select statements again, you see that the View that is based on SELECT * actually retrieves the data based on the ordinal position of the columns at the time that the view was created. Sure, one work-around is to recreate the View, but you can’t really count on other developers to know the dependencies you have built-in, and they won’t necessarily recreate the view when they refactor the table.

I am sure that there are reasons and justifications for why Views behave this way, but I find it particularly disturbing that you can have code asking for col2, but actually be receiving data from col3. By the way, for the record, this entire scenario and accompanying test script apply to SQL Server 2008 R2 with Service Pack 1.

So, let the developer beware…know what assumptions are in effect around your code, and keep on discouraging people from using SELECT * syntax in anything but the simplest of ad-hoc queries.

And of course, let’s clean up after ourselves. To eliminate the database objects created during this test, run the following commands.

DROP TABLE testtab
DROP VIEW testtab_vw
DROP VIEW testtab_vw_named

Legacy Comments


Francesco Milano 2012-01-03	re: Broken Views Interesting post! Another good reason to avoid SELECT * :) I think the reason behind that is the creation of metadata into sys.all_columns for all the columns of the view when creating it. Only altering (or re-creating) it causes the refresh of that metadata records. Bye!

Mother's Day Flowers 2012-02-15	re: Broken Views This time can be utilized for other important academic or co-curricular activities. However many students unfortunately are not aware of any second option other than forcing themselves to do this daunting work.

fixed immediate annuity 2012-02-17	re: Broken Views You are retrieving large volumes of data that you don’t need, but the system has to process, marshal across tiers, and so on. It would be much more efficient to only select the specific columns that you need.

how to make a website 2012-03-13	re: Broken Views I find it particularly disturbing that you can have code asking for col2, but actually be receiving data