Secrets of Foreign Key Index Binding

You might be surprised to learn that foreign keys bind to physical indexes when they are created. Furthermore, a foreign key does not necessarily bind to the primary key index of the referenced table; SQL Server allows a foreign key to refer to any column(s) that are guaranteed to be unique as enforced by a primary key constraint, unique constraint or unique index.

In this post, I’ll discuss the undocumented rules SQL Server uses to bind foreign key constraints to referenced table indexes so that you can achieve performance goals and protect yourself against unexpected errors in DDL modification scripts.

Background

Typically, one references the primary key in foreign key relationships. I’ve seen a foreign key (deliberately) reference columns other than the primary key only a couple of times in my career. The foreign key referenced an alternate key with a unique constraint in those cases. Why one would create such a relationship is an exercise for the reader. I’ll focus on the primary key here, although the same considerations apply to foreign keys referencing alternate keys.
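To illustrate, here is a minimal sketch of such a relationship; the tables and constraint names below are hypothetical and aren’t used elsewhere in this post:

--foreign key referencing an alternate key (unique constraint) instead of the primary key
CREATE TABLE dbo.Product(
       ProductID int NOT NULL IDENTITY
              CONSTRAINT PK_Product PRIMARY KEY CLUSTERED
       ,ProductCode char(10) NOT NULL
              CONSTRAINT UQ_Product_ProductCode UNIQUE
       );

CREATE TABLE dbo.ProductSale(
       ProductSaleID int NOT NULL IDENTITY
              CONSTRAINT PK_ProductSale PRIMARY KEY CLUSTERED
       ,ProductCode char(10) NOT NULL
              CONSTRAINT FK_ProductSale_Product
                     FOREIGN KEY REFERENCES dbo.Product(ProductCode)
       );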

As I mentioned earlier, SQL Server binds a foreign key to a physical unique index. This binding has performance implications because it determines the index SQL Server uses to enforce referential integrity as child table rows are inserted or updated. Also, SQL Server will not allow an index bound to a foreign key to be dropped, since that could allow duplicate rows in the parent table and thus break the unique side of the relationship. This must be considered when developing scripts that drop unique indexes (including primary key and unique constraints) that may be bound to foreign keys.

A foreign key referencing the primary key will always be bound to the primary key index when that is the only unique index on the foreign key column(s). However, you might have additional unique indexes on the primary key column(s) for performance reasons. For example, consider the case of a clustered primary key. Performance of a frequently executed query may be improved with a covering non-clustered index:

--create parent table
CREATE TABLE dbo.ParentTable(
       ParentTableID int NOT NULL IDENTITY
              CONSTRAINT PK_ParentTable PRIMARY KEY CLUSTERED
       ,Column1 int NOT NULL
       ,Column2 varchar(100) NOT NULL
       );
GO

--create a non-clustered covering index
CREATE UNIQUE NONCLUSTERED INDEX idx_ParentTable_ParentTableID
       ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);
GO

INSERT INTO dbo.ParentTable(Column1, Column2) VALUES(1, 'some data');
INSERT INTO dbo.ParentTable(Column1, Column2) VALUES(2, 'some data');
INSERT INTO dbo.ParentTable(Column1, Column2) VALUES(3, 'some data');
GO

--create child table
CREATE TABLE dbo.ChildTable(
       ChildTableID int NOT NULL IDENTITY
              CONSTRAINT PK_ChildTable PRIMARY KEY CLUSTERED
       ,ParentTableID int NOT NULL
              CONSTRAINT FK_ChildTable_ParentTable
                     FOREIGN KEY REFERENCES dbo.ParentTable(ParentTableID)
       );
GO

INSERT INTO dbo.ChildTable(ParentTableID) VALUES(1);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(1);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(1);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(1);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(2);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(2);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(2);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(2);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(3);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(3);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(3);
INSERT INTO dbo.ChildTable(ParentTableID) VALUES(3);
GO

UPDATE STATISTICS dbo.ParentTable;
UPDATE STATISTICS dbo.ChildTable;
GO

--show the foreign key index binding
SELECT
    fki.name
FROM sys.foreign_keys AS f
JOIN sys.indexes AS fki ON
      fki.object_id = f.referenced_object_id
      AND fki.index_id = f.key_index_id
WHERE
      f.object_id = OBJECT_ID(N'dbo.FK_ChildTable_ParentTable');
GO

--this query uses the covering index instead of clustered PK index
SELECT p.ParentTableID, p.Column1
FROM dbo.ParentTable AS p
WHERE p.ParentTableID IN(1,2,3);
GO

The SELECT query in the above script uses the covering idx_ParentTable_ParentTableID index. While this is good for performance, it introduces ambiguity regarding index binding to the foreign key. Again, any primary key constraint, unique constraint or unique index on the referenced column(s) may be referenced by a foreign key. With two candidate unique indexes (PK_ParentTable and idx_ParentTable_ParentTableID), you have little control over which index is bound to the foreign key.

SQL Server chooses the index binding based on undocumented rules that vary by version, so you will get different binding depending on your version of SQL Server. SQL Server 2005 chooses the clustered index when possible and, if no suitable clustered index exists, uses the first (lowest index_id) unique non-clustered index on the referenced column(s). The sample script above binds the foreign key to the PK_ParentTable index under SQL Server 2005 because it is the clustered index, not because it is the primary key.

In later versions (SQL Server 2008, 2008 R2 and 2012), the foreign key is bound to the unique non-clustered index on the referenced column(s) with the lowest index_id when possible. Only when no suitable unique non-clustered index exists is the unique clustered index chosen. So the foreign key in the above script is bound to idx_ParentTable_ParentTableID in SQL Server 2008 and later versions instead of the primary key index, as one might otherwise expect.

Why Foreign Key Index Binding is Important

There are two reasons why it is important to control the index bound to a foreign key. One is performance. As I mentioned earlier, the index bound to the foreign key constraint is used at execution time to enforce the constraint as child table rows are inserted or the foreign key column(s) updated. If the parent table is large and not queried often but rows are inserted into the child table heavily, a unique non-clustered index that “covers” the referential integrity check may be more desirable than the clustered index. This can improve buffer efficiency and page life expectancy compared to using a clustered index (e.g. primary key). My assumption is that this is why SQL Server 2008 and later versions prefer the unique non-clustered index over the clustered index for constraint enforcement.

Another reason one should control the index bound to the foreign key is to facilitate index changes. If you try to drop an index bound to a foreign key, you’ll get an error like “An explicit DROP INDEX is not allowed on index 'dbo.ParentTable.idx_ParentTable_ParentTableID'. It is being used for FOREIGN KEY constraint enforcement.” You’ll need to drop the foreign key first and recreate it after dropping the index.
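Using the sample objects from the script above, the workaround looks something like this:

--drop the foreign key, drop the bound index, then recreate the foreign key
ALTER TABLE dbo.ChildTable
       DROP CONSTRAINT FK_ChildTable_ParentTable;

DROP INDEX idx_ParentTable_ParentTableID ON dbo.ParentTable;

ALTER TABLE dbo.ChildTable
       ADD CONSTRAINT FK_ChildTable_ParentTable
       FOREIGN KEY (ParentTableID) REFERENCES dbo.ParentTable(ParentTableID);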

Since one can’t specify the bound foreign key index declaratively, the only guaranteed way to control the binding is to create the foreign key when only the desired unique index exists and create additional indexes afterward. This isn’t to say you can’t rely on the rules described earlier but you need to be aware that such rules vary depending on the SQL Server version and could change in the future. 
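With the sample objects above, that ordering looks something like the sketch below; because the primary key is the only unique index at the time the foreign key is created, the constraint binds to PK_ParentTable regardless of version:

--create the foreign key while the primary key is the only unique index on the column
ALTER TABLE dbo.ChildTable
       ADD CONSTRAINT FK_ChildTable_ParentTable
       FOREIGN KEY (ParentTableID) REFERENCES dbo.ParentTable(ParentTableID);

--then create the additional unique covering index afterward
CREATE UNIQUE NONCLUSTERED INDEX idx_ParentTable_ParentTableID
       ON dbo.ParentTable(ParentTableID) INCLUDE(Column1);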

 

RIP OLE DB

I was very surprised when Microsoft announced the deprecation of OLE DB for SQL Server data access last week on the Data Access Blog and in an MSDN Forums announcement. The next release of SQL Server, code-named “Denali”, will be the last to ship a new SQL Server Native Client OLE DB provider. The SQL Server Native Client OLE DB provider will continue to be supported for 7 years after the Denali release, so we have plenty of time to plan accordingly.

The other Microsoft-supplied OLE DB driver for SQL Server, SQLOLEDB, has been deprecated for many years now. The deprecated SQLOLEDB driver (and deprecated SQLSRV32.DLL ODBC driver) is part of the older MDAC package and is currently included in Windows operating systems as part of Windows Data Access Components for backwards compatibility. Windows 7 is the last Windows version that will include a SQL Server OLE DB and ODBC driver out of the box. Microsoft recommends that we use the SQL Server Native Client ODBC driver as the SQL Server data access technology of choice from native code going forward.

What This Means to You

Avoid using OLE DB for new SQL Server application development. Update your technology roadmap to move towards migrating existing SQL Server applications that use the SQLNCLI, SQLNCLI10, SQLNCLI11 or SQLOLEDB OLE DB providers to the SQL Server Native Client ODBC driver.

Note that much is still unknown since current versions of SQL Server rely heavily on OLE DB. Although this is purely speculation on my part, it stands to reason that we will see improved ODBC support across all Microsoft products and SQL Server features that currently rely on OLE DB for relational data access.

New SQL Server Development

Use one of the following SQL Server relational database access technologies for new development:

·         Managed code (e.g. C#, VB.NET, managed C++): Use System.Data.SqlClient. SqlClient is part of the .NET Framework and is the preferred way to access SQL Server from managed code. The only reason I can think of not to use SqlClient from managed code is if an application needs to also support other DBMS products using the same interface without coding an additional abstraction layer. In that case, System.Data.Odbc is an alternative for accessing different database products.

·         Native code (e.g. unmanaged C++): Use ODBC with the SQL Server Native Client driver. The ODBC call-level interface can be used directly or via the higher-level ADO API. The SQL Server Native Client ODBC driver is included with SQL Server and also available as a separate download.

Migrating Existing Applications

I sometimes see existing managed applications use ADO (e.g. ADODB.Connection) instead of SqlClient. ADO is a COM-based API primarily intended to be used from native code rather than managed code. Typically, these applications were either converted from VB 6 or the developer used ADO instead of ADO.NET due to unfamiliarity with the ADO.NET object model.  This is a good opportunity to convert such code to use System.Data.SqlClient, which will perform better than OLE DB or ODBC from managed code. 

If you have an ADO application where performance is not a concern or the conversion is not worth the effort, an alternative is to simply change the provider to MSDASQL (OLE DB Provider for ODBC Drivers) and add the SQL Server Native Client ODBC driver specification. This can be done with a simple connection string change and the MSDASQL provider will translate the ADO OLE DB calls to ODBC. For example, to use the SQL Server 2008 SNAC ODBC driver:

Old OLE DB connection string: "Provider=SQLNCLI10.1;Data Source=MyServer;Integrated Security=SSPI"

New ODBC connection string: "Provider=MSDASQL;Driver={SQL Server Native Client 10.0};Server=MyServer;Trusted_Connection=Yes"

 

The same connection string change can be used for any ADO application, including ASP classic, legacy VB 6 or unmanaged C++.

Perhaps the biggest challenge will be native code that uses the OLE DB COM interfaces directly instead of going through higher level APIs like ADO. I’ve seen this most commonly done for performance sensitive applications in C++. The best approach here will be to convert the application to use the ODBC call-level interface directly. This will provide the highest SQL Server data access performance from native code. The difficulty of such a change will depend much on the application object model and design. Ideally, data access libraries are shared and abstracted so that low-level data access code changes only need to be made in one place.

Why SQLOLEDB and SQLNCLI Were Deprecated

If you’ve used SQL Server for a long time like me, you’ve seen a number of APIs come and go (http://blogs.msdn.com/b/data/archive/2006/12/05/data-access-api-of-the-day-part-i.aspx). APIs are largely driven by development and platform technologies that change over time. It is possible for Microsoft to support legacy APIs indefinitely, but doing so would waste precious development resources on maintenance instead of adding new features that are important to us. COM-based APIs like OLE DB are complex and it just doesn’t make sense to have many APIs that basically do the same thing.

So we now have the short list of SQL Server relational data access APIs going forward:

·         SqlClient (managed code)

·         JDBC (Java)

·         ODBC (for native code)

Summary

I’m a big fan of open, cross-platform standards so I’m glad that Microsoft chose ODBC over OLE DB for relational database access. ODBC is an implementation of the SQL call-level interface standard (ISO/IEC 9075-3). In contrast, the COM-based OLE DB SQL Server provider relies on proprietary Microsoft Windows COM technology. The SNAC ODBC driver is a truly native driver and provides the fastest SQL Server database access from native code.

 

Denali CTP3: THROW Statement

Not to mince words, T-SQL error handling has historically sucked. I’m excited that SQL Server “Denali” CTP3 (a.k.a. SQL11) includes a long-awaited THROW statement that I hope to see in the final release. In this post, I’ll dive into how this seemingly minor T-SQL enhancement will make it much easier for T-SQL developers to write robust and bug-free error handling code.
T-SQL Error Handling Ugliness

Unlike compiled application code, which halts execution upon an unhandled exception, a T-SQL script might continue code execution after an error. T-SQL developers must include error checking/handling to ensure code doesn’t continue down the “happy” path oblivious to an error, to report the error to the caller, to perform any necessary cleanup operations (typically a ROLLBACK), and to continue or halt execution as desired. The script below shows how one might accomplish this without structured error handling:

--Unstructured error handling example
BEGIN TRAN
SELECT 1/0 AS CauseAnError --error automatically reported to caller
IF @@ERROR <> 0 GOTO ErrorHandler --detect error
COMMIT
GOTO Done

ErrorHandler:
IF @@TRANCOUNT > 0 ROLLBACK --cleanup after error
RETURN --stop further code execution
Done:
PRINT 'Done' --not executed after error
GO


This script results in the error:

Msg 8134, Level 16, State 1, Line 3

Divide by zero error encountered.


Unstructured error handling like this is especially a pain for multi-statement scripts and stored procedures. One has to include a repetitive “IF @@ERROR” check after each statement to detect errors, along with error-prone unstructured GOTO code. It’s easy to miss error checking/handling bugs in unit testing.

On a positive note, no T-SQL code is necessary to report the error; SQL Server automatically returns errors to the calling application (unless TRY/CATCH is used). This guarantees the calling application is notified of any errors raised during execution.

Two Steps Forward, One Step Back

The introduction of structured error handling (TRY/CATCH) in SQL Server 2005 is both a blessing and a curse. The good is that TRY/CATCH avoids the repetitive, error-prone and ugly procedural code needed to check @@ERROR after each T-SQL statement and allows one to more easily centralize error handling. The structured error-handling paradigm in T-SQL is also more aligned with most application languages.

Consider the equivalent script with TRY/CATCH:

--Structured error handling example
DECLARE
       @ErrorNumber int
       ,@ErrorMessage nvarchar(2048)
       ,@ErrorSeverity int
       ,@ErrorState int
       ,@ErrorLine int;
BEGIN TRY --detect errors
       BEGIN TRAN;
       SELECT 1/0 AS CauseAnError;
       COMMIT;
END TRY
BEGIN CATCH
       SELECT
              @ErrorNumber = ERROR_NUMBER()
              ,@ErrorMessage = ERROR_MESSAGE()
              ,@ErrorSeverity = ERROR_SEVERITY()
              ,@ErrorState = ERROR_STATE()
              ,@ErrorLine = ERROR_LINE();
       IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
       RAISERROR('Error %d caught at line %d: %s' --report error to caller
              ,@ErrorSeverity
              ,@ErrorState
              ,@ErrorNumber
              ,@ErrorLine
              ,@ErrorMessage);
       RETURN; --stop further code execution
END CATCH
PRINT 'Done'; --not executed after error
GO

Msg 50000, Level 16, State 1, Line 21

Error 8134 caught at line 10: Divide by zero error encountered.


I really like the way structured error handling catches errors declaratively with centralized error handling. But TRY/CATCH introduces a couple of issues. Foremost is reporting of the error to the caller. A caught error prevents the error message from being returned to the client. When TRY/CATCH is employed, the developer assumes responsibility to notify the application that an error occurred. Failure to do so will result in a silent error undetectable by the calling application, which is seldom desirable. Using TRY/CATCH necessitates that you write a bit of code in the CATCH block to capture, report and/or log error details as well as control code flow after the error.

Another downside of TRY/CATCH before Denali is that you cannot raise the original error because RAISERROR does not allow a system error number to be specified (8134 in this example). Consequently, the divide by zero system error here cannot be raised in the CATCH block; a user-defined error in the 50000+ error number range must be raised instead, obfuscating the original error and line number. So instead of returning error information natively, you must write code to return original error details by some other means, such as in the error message text. This often leads to inconsistencies in the way errors are reported.

THROW to the Rescue

Denali introduces a simple THROW statement. THROW in a CATCH block with no parameters raises the caught error and stops further code execution unless an outer CATCH block exists. This greatly simplifies CATCH block error reporting and control flow code since this THROW behavior is exactly what one typically does after handling a T-SQL error. Furthermore, unlike RAISERROR, THROW retains the original error number, message text, state, severity and line number. This is the biggest T-SQL error handling enhancement since the introduction of TRY/CATCH in SQL Server 2005.
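THROW can also raise a user-defined error explicitly by specifying an error number (50000 or higher), a message and a state. Note that the severity is always 16 and the statement preceding THROW must be terminated with a semicolon. A minimal example:

--raise a user-defined error with THROW; severity is always 16
THROW 50001, N'Something went wrong in the nightly load.', 1;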

The THROW example below raises the original error and stops further code execution, and it is less verbose and less error-prone than the earlier methods:

--Structured error handling example in Denali CTP3
BEGIN TRY --detect errors
       BEGIN TRAN;
       SELECT 1/0 AS CauseAnError;
       COMMIT;
END TRY
BEGIN CATCH
       IF @@TRANCOUNT > 0 ROLLBACK; --cleanup after error
       THROW; --report error to caller and stop further code execution
END CATCH
PRINT 'Done'; --not executed after error
GO

Msg 8134, Level 16, State 1, Line 4

Divide by zero error encountered.


There are only a couple of scenarios I can think of not to use THROW in a CATCH block. One is when you need to continue code execution in the same scope after an error. Another is in an outermost catch block when you want to prevent the error from being returned to the client. However, these cases are the exception (no pun intended) rather than the rule.
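For the latter case, an outermost CATCH block might look something like the sketch below; the procedure and logging table names are hypothetical:

BEGIN TRY
       EXEC dbo.SomeNightlyJob; --hypothetical procedure
END TRY
BEGIN CATCH
       IF @@TRANCOUNT > 0 ROLLBACK;
       --log the error instead of re-throwing so no error is returned to the client
       INSERT INTO dbo.ErrorLog(ErrorNumber, ErrorMessage) --hypothetical logging table
              VALUES(ERROR_NUMBER(), ERROR_MESSAGE());
END CATCH
GO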

Summary

THROW is a simple, yet powerful extension to SQL Server error handling. I’ll discuss some other enhancements to the core database engine as outlined in the What’s New section of the SQL Server “Denali” Books Online in future posts as well.

Internal SQL Server Database Version Numbers

A database created by a more recent version of SQL Server cannot be attached or restored to an earlier version. This restriction is simply because an older version cannot know about file format changes that were introduced in the newer release. 
If you attempt to attach a database to an earlier version, you will get SQL Server error 948 with the internal version numbers listed in the error message text. For example, the following error occurs if you try to attach a SQL Server 2008 R2 database to a SQL Server 2008 server:

The database 'MyDatabase' cannot be opened because it is version 665. This server supports version 661 and earlier. A downgrade path is not supported.

Sample text from SQL Server error 948
The cryptic version numbers in the error message refer to the internal database version. These internal version numbers are undocumented but are (at least currently) the same value reported by the DATABASEPROPERTYEX function’s 'Version' property of the source database.
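For example, the following query run on the source instance returns the internal database version:

SELECT DATABASEPROPERTYEX(N'MyDatabase', 'Version') AS InternalDatabaseVersion;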

If you are unsure of the source database version, the table below maps the internal version numbers to SQL Server versions so you can determine the minimum version you need for the attach to succeed:

SQL Server Version                             Internal Database Version
SQL Server 2008 R2                             665
SQL Server 2008                                661
SQL Server 2005 SP2+ with vardecimal enabled   612
SQL Server 2005                                611
SQL Server 2000                                539
SQL Server 7                                   515

SQL Server versions and internal database versions
Below are the allowable SQL Server upgrade paths for a database attach or restore. The internal database version will be as above after a successful attach or restore.

Target SQL Server Version   Source SQL Server Version                   Internal Database Version
SQL Server 2008 R2          SQL Server 2008 R2                          665
                            SQL Server 2008                             661
                            SQL Server 2005 with vardecimal enabled     612
                            SQL Server 2005                             611
                            SQL Server 2000                             539
SQL Server 2008             SQL Server 2008                             661
                            SQL Server 2005 with vardecimal enabled     612
                            SQL Server 2005                             611
                            SQL Server 2000                             539
SQL Server 2005 SP2+        SQL Server 2005 with vardecimal enabled     612
                            SQL Server 2005                             611
                            SQL Server 2000                             539
                            SQL Server 7                                515
SQL Server 2005             SQL Server 2005                             611
                            SQL Server 2000                             539
                            SQL Server 7                                515
SQL Server 2000             SQL Server 2000                             539
                            SQL Server 7                                515
SQL Server 7                SQL Server 7                                515

Database File Versions and Upgrade Paths
As I mentioned earlier, downgrades are not supported. You’ll need to copy objects and data from the newer source database to the older target if you need to downgrade; attach or restore is not an option to copy a database to an earlier version.

SQL Server Connection Strings

This is the first of a series of posts on SQL Server connection strings. I don’t think connection strings are all that complicated but I often see developers have problems because they simply cloned an existing connection string (or found one on the internet) and tweaked it for the task at hand without really understanding what the keywords and values mean. This often results in run-time errors that can be tricky to diagnose. 
In this post, I’ll provide a connection string overview and discuss SqlClient connection strings and examples. I’ll discuss OLE DB and ODBC (used via ADO or ADO.NET) and JDBC in more detail in future articles.
Overview
SQL Server can be accessed using several technologies, each of which has different connection string particulars. Connection strings are provider/driver specific, so one first needs to decide on a client API before the proper connection string can be formulated.
All connection strings share the same basic format, name/value pairs separated by semicolons, but the actual connection string keywords may vary by provider. Which keywords are required or optional also varies by provider, and providers often share the same keywords (or provide synonyms) to minimize connection string changes when switching between providers. Most connection string keywords are optional and need to be specified only when the default is not appropriate. Connection string values should be enclosed in single or double quotes when the value may include a semicolon or equal sign (e.g. Password="a&==b=;1@23").
A connection string supplies a SQL Server provider/driver with the information needed to establish a connection to a SQL Server instance, and it may also be used to specify other configuration values, such as whether connection pooling is used. At the end of the day, the provider/driver needs to know at least:
·         SQL Server name (or address)
·         Authentication method (Windows or SQL Server)
·         Login credentials (login and password for SQL Server authentication)
SqlClient
One typically uses the .Net Framework Provider for SQL Server (abbreviated to SqlClient here) in managed code and a SQL Server OLE DB provider or ODBC driver from unmanaged code. It is possible to use OLE DB or ODBC for SQL Server data access in managed code but there is seldom a reason to do so since SqlClient offers high-performance access to SQL Server natively.
The authoritative reference for SqlClient connection strings is http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring.aspx. My goal is not to rehash all of the keywords or illustrate the many combinations here but rather show the ones most commonly used along with best practices. I use the primary keywords rather than synonyms or equivalent keywords in the examples.
The SqlConnectionStringBuilder class provides a programmatic way to build connection strings needed by the SqlConnection class. The nice thing about SqlConnectionStringBuilder is that it provides IntelliSense and avoids connection string typos. It should always be used when constructing connection strings based on user input (e.g. a user id and password prompt). But you still need to know which connection string properties (keywords) you need to set along with the default values. The examples here apply regardless of whether or not you use the SqlConnectionStringBuilder class.
SqlClient Connection String Keyword Examples
Unlike other providers, there is no “Provider” or “Driver” connection string keyword in a SqlClient connection string.  The .Net Framework Provider for SQL Server is implicit with a SqlConnection class so it is redundant to also specify the provider.
I’ll start with the minimal keyword(s) needed. The minimal SqlClient connection string need only specify the authentication method. The example below specifies Windows authentication using “Integrated Security=SSPI”. This connection string will connect to the default instance on the local machine using Windows authentication under the current process Windows security credentials.

Integrated Security=SSPI
Listing 1: Connect to local default instance using Windows authentication
To connect to the local default instance using SQL authentication, just specify the credentials using the “User ID” and “Password” keywords instead of the “Integrated Security=SSPI” keyword. SQL authentication is the default when neither the “Integrated Security” nor the “Trusted_Connection” keyword is specified. Although I commonly see "Persist Security Info=False" also specified (a best practice from a security perspective), that is the default setting and may be omitted. Be aware that you should encrypt connection strings (or passwords in general) stored in configuration files when using SQL authentication.

User ID=MyLogin;Password=MiP@ssw0rd
Listing 2: Connect to local default instance using SQL authentication
One often connects to a remote SQL Server. Along with the authentication method, add the Data Source keyword to specify the desired SQL Server name or network address.

Data Source=SQLSERVERNAME;Integrated Security=SSPI
Listing 3: Connect to default instance on host SQLSERVERNAME using Windows authentication

Data Source=SQLSERVERNAME;User ID=MyLogin;Password=MiP@ssw0rd
Listing 4: Connect to instance on host SQLSERVERNAME using SQL authentication
Note that these same connection strings may be used to connect locally or remotely. Personally, I recommend always specifying the Data Source even when connecting locally. This makes it easy to move the application to another machine with the same configuration and helps avoid oversights.
It is usually best to let SqlClient determine the appropriate network library rather than specifying one explicitly. SqlClient will figure out the appropriate network library based on the specified Data Source value. When you connect to a local instance using an unqualified name (or the value “(local)”), Shared Memory is used by default. SqlClient will use TCP/IP if an FQDN (e.g. SQLSERVERNAME.MYDOMAIN.COM) or IP address is specified, regardless of whether the instance is local or remote. Since TCP/IP is most commonly used nowadays, I’ll focus on TCP/IP in this article and use an FQDN in the subsequent examples to avoid ambiguity.
It is often desirable to specify the initial database context in the connection string. If omitted, the default database of the authenticated account is used. This is accomplished using either the “Initial Catalog” or “Database” keyword. I suggest always including the “Initial Catalog” keyword.

Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 5: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
Named Instances
The connection strings I’ve shown so far assume the target is a default SQL Server instance listening on port 1433. One can run multiple instances of SQL Server on the same host using the named instance feature. If your target database instance is a named instance, SqlClient will also need to know the instance name or instance port number. The instance name can be specified by appending a backslash and instance name to the Data Source value:

Data Source=SQLSERVERNAME.MYDOMAIN.COM\MYINSTANCE;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 6: Connect to named instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
As an aside, I often see connectivity problems with named instances due to oversights in the SQL Server configuration. When an instance name is specified, SqlClient interrogates the SQL Server Browser service on the SQL Server host to determine the instance port (or named pipe name). The SQL Server Browser service is disabled by default, so you need to enable and start it in order to connect by instance name. This can be done using the SQL Server Configuration Manager tool. Also, since the SQL Server Browser service communicates over UDP port 1434, that port must be allowed through firewalls.
You can specify a port number instead of an instance name to connect directly to a named instance (or to a default instance listening on a non-standard port). The port may be specified by appending a comma and the port number to the Data Source value. The needed port number can be ascertained from the SQL Server Configuration Manager tool.

Data Source=SQLSERVERNAME.MYDOMAIN.COM,60086;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 7: Connect to instance on host SQLSERVERNAME listening on port 60086 using Windows authentication with initial database context of MyDatabase
Additional Keywords

In addition to the “Data Source”, “Initial Catalog” and “Integrated Security” (or “User ID” and “Password”) keywords I’ve discussed so far, I recommend that “Application Name” also be specified. The specified string helps identify the application when monitoring activity on the database server. This is especially useful when an application server or client hosts multiple applications.

Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase;Application Name=Connection String Example
Listing 8: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase and an application name specification
In my opinion, the many other keywords are noise unless the default values are inappropriate for your environment. 
Summary
You can get by nicely in most cases with only the 4 or 5 SqlClient connection string keywords I’ve discussed here. I suggest you establish a connection string standard that includes the “Data Source”, “Initial Catalog” and “Application Name” keywords plus the authentication method: “Integrated Security=SSPI” or “User ID” and “Password”.
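For example, a connection string following that standard with Windows authentication might look like the one below (the server, database and application names are placeholders):

Data Source=SQLSERVERNAME.MYDOMAIN.COM;Initial Catalog=MyDatabase;Integrated Security=SSPI;Application Name=MyApplication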

Move a Partition to a Different File Group Efficiently

SQL Server table partitioning can reduce storage costs associated with large tables while maintaining performance SLAs.  Table partitioning, available in Enterprise and above SKUs, allows you to keep frequently used current data on fast storage while storing infrequently accessed older data on slower, less expensive storage.  But moving vast amounts of data efficiently as data ages can be a challenge.  This post will discuss alternate techniques to accomplish this task.

Consider the scenario of a table partitioned on a datetime column by month.  Your objective is to keep recent (current and prior month) data on a solid state disk and older data on traditional spinning media.  Two filegroups are used for this table, one with files on a solid state device and the other with files on spinning disks.  The table is partitioned with a RANGE RIGHT partition function (inclusive date boundary) and monthly sliding window maintenance is scheduled to create a partition for the new month and perhaps remove the oldest month.  Every month after the slide, you want to move an older partition (prior month minus 1) from fast to slow storage to make room for new data on the fast filegroup.

The Simple Method

The easiest way to move a partition from the NewerData filegroup to the OlderData filegroup is with MERGE and SPLIT.  The example below will move the February partition from the NewerData to the OlderData filegroup:

Simple maintenance script example:

-- Monthly Partition Move Script

-- merge month to be moved into prior month partition

ALTER PARTITION FUNCTION PF_Last12Months()

MERGE RANGE ('20110201');

 

-- set partition scheme next used to the OlderData filegroup

ALTER PARTITION SCHEME PS_Last12Months

NEXT USED OlderData;

 

-- move data from NewerData to OlderData filegroup

ALTER PARTITION FUNCTION PF_Last12Months()

SPLIT RANGE ('20110201');

 

The figures below show the partitions before and after this script was run against a 10M row test table (setup script with complete DDL and sample data at the end of this post).  Although this method is quite easy, it can take quite a bit of time with large partitions.  This MERGE command will merge February data into the January partition on the OlderData filegroup, requiring all of February’s data to be moved in the process, and then remove the February partition.  The SPLIT will then create a new February partition on the OlderData filegroup, move February data to the new partition and finally remove the February data from the source partition.  So February data is actually moved twice, once by the MERGE and again by the SPLIT. 

This MERGE/SPLIT process took 52 seconds on my test system with a cold buffer cache but I was only moving 738,780 rows.  Think about the performance impact of this method against a much larger production table partition.  The atomic MERGE and SPLIT are offline operations so the entire table is unavailable while those statements are running.  Also, these operations are resource intensive when a lot of data needs to be moved and/or you have many indexes.
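The before and after summaries below were captured with a partition metadata query; a sketch of one way to produce that kind of summary for the demo table (joining the partition, filegroup and boundary catalog views) is:

--partition summary: rows, filegroup and boundaries per partition (sketch)
SELECT
      p.rows AS Rows
      ,p.partition_number AS PartitionNumber
      ,fg.name AS Filegroup
      ,lb.value AS LowerBoundary
      ,ub.value AS UpperBoundary
FROM sys.partitions AS p
JOIN sys.indexes AS i ON
      i.object_id = p.object_id
      AND i.index_id = p.index_id
JOIN sys.partition_schemes AS ps ON ps.data_space_id = i.data_space_id
JOIN sys.destination_data_spaces AS dds ON
      dds.partition_scheme_id = ps.data_space_id
      AND dds.destination_id = p.partition_number
JOIN sys.filegroups AS fg ON fg.data_space_id = dds.data_space_id
LEFT JOIN sys.partition_range_values AS lb ON
      lb.function_id = ps.function_id
      AND lb.boundary_id = p.partition_number - 1
LEFT JOIN sys.partition_range_values AS ub ON
      ub.function_id = ps.function_id
      AND ub.boundary_id = p.partition_number
WHERE
      p.object_id = OBJECT_ID(N'dbo.PartitionMoveDemo')
      AND i.index_id IN (0, 1)
ORDER BY p.partition_number;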

Before maintenance:

Rows      Partition Number   Filegroup                    Lower Boundary          Upper Boundary
0         1                  PartitioningDemo_OlderData                           4/1/2010 12:00:00 AM
791,549   2                  PartitioningDemo_OlderData   4/1/2010 12:00:00 AM    5/1/2010 12:00:00 AM
817,935   3                  PartitioningDemo_OlderData   5/1/2010 12:00:00 AM    6/1/2010 12:00:00 AM
791,550   4                  PartitioningDemo_OlderData   6/1/2010 12:00:00 AM    7/1/2010 12:00:00 AM
817,935   5                  PartitioningDemo_OlderData   7/1/2010 12:00:00 AM    8/1/2010 12:00:00 AM
817,935   6                  PartitioningDemo_OlderData   8/1/2010 12:00:00 AM    9/1/2010 12:00:00 AM
791,550   7                  PartitioningDemo_OlderData   9/1/2010 12:00:00 AM    10/1/2010 12:00:00 AM
817,935   8                  PartitioningDemo_OlderData   10/1/2010 12:00:00 AM   11/1/2010 12:00:00 AM
791,550   9                  PartitioningDemo_OlderData   11/1/2010 12:00:00 AM   12/1/2010 12:00:00 AM
817,935   10                 PartitioningDemo_OlderData   12/1/2010 12:00:00 AM   1/1/2011 12:00:00 AM
817,935   11                 PartitioningDemo_OlderData   1/1/2011 12:00:00 AM    2/1/2011 12:00:00 AM
738,780   12                 PartitioningDemo_NewerData   2/1/2011 12:00:00 AM    3/1/2011 12:00:00 AM
817,935   13                 PartitioningDemo_NewerData   3/1/2011 12:00:00 AM    4/1/2011 12:00:00 AM
369,476   14                 PartitioningDemo_NewerData   4/1/2011 12:00:00 AM    5/1/2011 12:00:00 AM
0         15                 PartitioningDemo_NewerData   5/1/2011 12:00:00 AM

After maintenance:

Rows      Partition Number   Filegroup                    Lower Boundary          Upper Boundary
0         1                  PartitioningDemo_OlderData                           4/1/2010 12:00:00 AM
791,549   2                  PartitioningDemo_OlderData   4/1/2010 12:00:00 AM    5/1/2010 12:00:00 AM
817,935   3                  PartitioningDemo_OlderData   5/1/2010 12:00:00 AM    6/1/2010 12:00:00 AM
791,550   4                  PartitioningDemo_OlderData   6/1/2010 12:00:00 AM    7/1/2010 12:00:00 AM
817,935   5                  PartitioningDemo_OlderData   7/1/2010 12:00:00 AM    8/1/2010 12:00:00 AM
817,935   6                  PartitioningDemo_OlderData   8/1/2010 12:00:00 AM    9/1/2010 12:00:00 AM
791,550   7                  PartitioningDemo_OlderData   9/1/2010 12:00:00 AM    10/1/2010 12:00:00 AM
817,935   8                  PartitioningDemo_OlderData   10/1/2010 12:00:00 AM   11/1/2010 12:00:00 AM
791,550   9                  PartitioningDemo_OlderData   11/1/2010 12:00:00 AM   12/1/2010 12:00:00 AM
817,935   10                 PartitioningDemo_OlderData   12/1/2010 12:00:00 AM   1/1/2011 12:00:00 AM
817,935   11                 PartitioningDemo_OlderData   1/1/2011 12:00:00 AM    2/1/2011 12:00:00 AM
738,780   12                 PartitioningDemo_OlderData   2/1/2011 12:00:00 AM    3/1/2011 12:00:00 AM
817,935   13                 PartitioningDemo_NewerData   3/1/2011 12:00:00 AM    4/1/2011 12:00:00 AM
369,476   14                 PartitioningDemo_NewerData   4/1/2011 12:00:00 AM    5/1/2011 12:00:00 AM
0         15                 PartitioningDemo_NewerData   5/1/2011 12:00:00 AM

SWITCH and DROP_EXISTING Method

An alternative to the method above is to employ SWITCH along with the DROP EXISTING option of CREATE INDEX.  As you may know, SWITCH of an aligned partition is a metadata-only operation and is very fast because no physical data movement is required.  Furthermore, CREATE INDEX…WITH DROP_EXISTING = ON avoids sorting when the existing table index is already suitably sorted and is especially appropriate for improving performance of large index rebuilds.  Using these commands, instead of relying on SPLIT and MERGE to move data, will greatly reduce the time needed to move a partition from one filegroup to another.  The maintenance script below reduced the time of the partition move from 52 seconds down to 7 seconds, reducing maintenance time by over 85% compared to the MERGE/SPLIT script above.  

Demo Maintenance Script

-- Monthly Partition Move Script

DECLARE @MonthToMove datetime = '20110201';

 

-- create staging table on NewerData filegroup with aligned indexes

IF OBJECT_ID(N'dbo.PartitionMoveDemoStaging') IS NOT NULL

      DROP TABLE dbo.PartitionMoveDemoStaging;

CREATE TABLE dbo.PartitionMoveDemoStaging(

      PartitioningDateTimeColumn datetime NOT NULL

      ,Column1 bigint NOT NULL

) ON PartitioningDemo_NewerData;

 

CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn

      ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)

      ON PartitioningDemo_NewerData;     

 

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1

      ON dbo.PartitionMoveDemoStaging(Column1)

      ON PartitioningDemo_NewerData;     

 

-- switch partition into staging table

ALTER TABLE dbo.PartitionMoveDemo

      SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)

      TO dbo.PartitionMoveDemoStaging;

 

-- remove partition

ALTER PARTITION FUNCTION PF_Last12Months()

      MERGE RANGE (@MonthToMove);

     

-- set next used to OlderData filegroup

ALTER PARTITION SCHEME PS_Last12Months

      NEXT USED PartitioningDemo_OlderData;

 

-- recreate partition on OlderData filegroup

ALTER PARTITION FUNCTION PF_Last12Months()

      SPLIT RANGE (@MonthToMove);

     

-- recreate staging table indexes using the partition scheme

-- this will move the staging table to OlderData filegroup with aligned indexes

CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn

      ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)

      WITH (DROP_EXISTING = ON)

      ON PS_Last12Months(PartitioningDateTimeColumn);

     

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1

      ON dbo.PartitionMoveDemoStaging(Column1)

      WITH (DROP_EXISTING = ON)

      ON PS_Last12Months(PartitioningDateTimeColumn);

 

-- switch staging table back into primary table partition

ALTER TABLE dbo.PartitionMoveDemoStaging

      SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)

      TO dbo.PartitionMoveDemo PARTITION $PARTITION.PF_Last12Months(@MonthToMove);

 

The maintenance steps here are similar to the first method except that the partition is SWITCHed into a staging table before the MERGE and SPLIT.  This way, no data movement is needed during the MERGE or SPLIT.  After the MERGE and SPLIT, staging table indexes are recreated using the same partition scheme as the primary table.  This will move the staging table from the NewerData to the OlderData filegroup and ensure staging table indexes are aligned for the SWITCH.  The DROP_EXISTING = ON option allows the CREATE INDEX to leverage the existing staging table index sequence, thus eliminating the need to sort the index keys.  Finally, the staging table is SWITCHed back into the moved partition.

I hope you find this method useful.  Below is the script I used to create the demo database and objects. 

Demo Setup Script

--create database with monthly filegroups

CREATE DATABASE PartitioningDemo

ON(

      NAME='Primary',

      FILENAME='S:\SolidState\PartitioningDemo.mdf',

      SIZE=10MB),

FILEGROUP NewerData (

      NAME='PartitioningDemo_NewerData',

      FILENAME='S:\SolidState\PartitioningDemo_NewerData.ndf',

      SIZE=400MB,

      FILEGROWTH=10MB),

FILEGROUP OlderData (

      NAME='PartitioningDemo_OlderData',

      FILENAME='D:\SpinningDisks\PartitioningDemo_OlderData.ndf',

      SIZE=600MB,

      FILEGROWTH=10MB)

LOG ON(

      NAME='PartitioningDemo_Log',

      FILENAME='L:\LogFiles\PartitioningDemo_Log.ldf',

      SIZE=10MB,

      FILEGROWTH=10MB);

     

ALTER DATABASE PartitioningDemo

      SET RECOVERY SIMPLE;

GO

 

USE PartitioningDemo;

 

CREATE PARTITION FUNCTION PF_Last12Months( datetime )

AS RANGE RIGHT

FOR VALUES

(               -- older_than_current_minus_12

      '20100401'  -- current_minus_12

      ,'20100501' -- current_minus_11

      ,'20100601' -- current_minus_10

      ,'20100701' -- current_minus_9

      ,'20100801' -- current_minus_8

      ,'20100901' -- current_minus_7

      ,'20101001' -- current_minus_6

      ,'20101101' -- current_minus_5

      ,'20101201' -- current_minus_4

      ,'20110101' -- current_minus_3

      ,'20110201' -- current_minus_2

      ,'20110301' -- current_minus_1

      ,'20110401' -- current

      ,'20110501' -- future

);

 

CREATE PARTITION SCHEME PS_Last12Months

AS PARTITION PF_Last12Months

TO

      (

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      NewerData, -- minus 2 month (to be moved to OlderData)

      NewerData, -- minus 1 month

      NewerData, -- current month

      NewerData  -- future month+

      );

 

-- create table with 10,000,000 rows

ALTER DATABASE PartitioningDemo

      MODIFY FILEGROUP NewerData DEFAULT;

 

WITH

      t1 AS (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2

                            UNION ALL SELECT 3 UNION ALL SELECT 4

                            UNION ALL SELECT 5 UNION ALL SELECT 6

                              UNION ALL SELECT 7 UNION ALL SELECT 8

                              UNION ALL SELECT 9),

      t2 AS (SELECT a.n

                    FROM t1 a, t1 b, t1 c, t1 d, t1 e, t1 f, t1 g)

SELECT

      ISNULL(

            DATEADD(

                  day

                  , (ROW_NUMBER() OVER(ORDER BY t2.n))/26385, '20100401')

                  , '20100401') AS PartitioningDateTimeColumn

      ,ISNULL((ROW_NUMBER() OVER(ORDER BY t2.n)), 0) AS Column1

INTO dbo.PartitionMoveDemo

FROM t2;

 

-- create partitioned indexes on the table

CREATE CLUSTERED INDEX cdx_PartitionMoveDemo_PartitioningColumn

      ON dbo.PartitionMoveDemo(PartitioningDateTimeColumn)

      ON PS_Last12Months(PartitioningDateTimeColumn);

     

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemo_Column1

      ON dbo.PartitionMoveDemo(Column1)

      ON PS_Last12Months(PartitioningDateTimeColumn);

GO

Stairway Series on SQLServerCentral.com

SQLServerCentral.com launched a new Stairway content series today, targeting specific areas of SQL Server.  Each Stairway includes a series of up to 12 levels focused on a specific SQL Server topic.  The goal is to guide DBAs and developers with little or no understanding of a subject through a sequence of tutorials in order to quickly gain the knowledge one needs to use a SQL Server feature confidently in a production environment.  Kalen Delaney, editor of the Stairway series, is one of the most respected experts in the world-wide SQL Server community.

I was flattered when Kalen gave me the opportunity to contribute to the series with a Stairway on Server-side Tracing.  For years I’ve cautioned against using Profiler indiscriminately both here as well as in the MSDN forums and newsgroups.  But it seems many DBAs still don’t differentiate between Profiler and server-side tracing.  I’m hoping this Server-side Tracing Stairway will empower DBAs with the knowledge to choose the right tool for the job.

My apologies for having gone dark for the last several months.  The subject of this post is the primary reason; there are only so many hours in the day.

Calendar Table and Date/Time Functions

I frequently see questions in the forums and newsgroups about how to best query date/time data and perform date manipulation.  Let me first say that a permanent calendar table that materializes commonly used DATEPART values along with time periods you frequently use is invaluable.  I’ve used such a table for over a decade with great success and strongly recommend you implement one on all of your database servers.  I’ve included a sample calendar table (and numbers table) later in this post and you can find other variations of such a table via an internet search.

Removing the Time Portion

A common requirement I have is to remove the time portion from a date/time value.  This is easy in SQL 2008 since you can simply “CAST(SomeDateTimeValue AS date)”.  But the date data type is not available in older SQL Server versions, so you need an alternate method.  In SQL 2005 and earlier versions, I recommend the DATEADD…DATEDIFF method below with an arbitrary base date value specified in a format that is independent of the session DATEFORMAT setting:

SELECT CAST(GETDATE() AS date); --SQL 2008 and later

SELECT DATEADD(day, DATEDIFF(day, '19000101', GETDATE()), '19000101'); --SQL 2005 and earlier

 

I often see a variation of the DATEADD…DATEDIFF technique with the integer zero (no quotes) specified as the base date.  Although this may provide the expected results (I’ve done it myself), I caution against it because it relies on implicit conversion from an integer based on the internal SQL Server date/time storage format.  If you want to be concise, a better approach is to specify an empty string for the base date value since the default value is ‘1900-01-01 00:00:00’.  In my opinion, an explicit date value is more intuitive, though.

SELECT DATEADD(day, DATEDIFF(day, '', GETDATE()), '');

 

I also sometimes see code that extracts the year, month and day date parts and concatenates with separators.  However, that method is dependent on session DATEFORMAT settings and slower than other methods.  See Tibor Karaszi’s The ultimate guide to the datetime datatypes article for details.

First and Last Day of Period

Another common task is to determine the first or last day of a given period.  The script below shows how to accomplish this if you don’t have a calendar table with the calculated values available.

DECLARE @Date date = GETDATE();

SELECT 'First day of year' [DateDescription], DATEADD(year, DATEDIFF(year,'19000101',@Date), '19000101') AS [CalendarDate]

UNION ALL

SELECT 'Last day of year', DATEADD(day,-1,DATEADD(year,0,DATEADD(year,DATEDIFF(year,'19000101',@Date)+1,'19000101')))

UNION ALL

SELECT 'First day of month', DATEADD(month, DATEDIFF(month,'19000101',@Date), '19000101')

UNION ALL

SELECT 'Last day of month', DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'19000101',@Date)+1,'19000101')))

UNION ALL

SELECT 'First day week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday ,@Date)-1),DATEDIFF(day,'19000101', @Date))

UNION ALL

SELECT 'Last day of week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday ,@Date)-1)+6,DATEDIFF(day,'19000101', @Date));

 

With a calendar table like the one later in this post:

DECLARE @Date date = GETDATE();

SELECT 'First day of year' [DateDescription], (SELECT FirstDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of year', (SELECT LastDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'First day of month', (SELECT FirstDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of month', (SELECT LastDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'First day week (based on DATEFIRST setting)', (SELECT FirstDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of week (based on DATEFIRST setting)', (SELECT LastDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date);

 

Calendar and Numbers Table

I think auxiliary calendar and numbers tables are a must-have on every database server.  These objects allow you to easily perform set-based processing in a number of scenarios.  In fact, the calendar table population script below uses a numbers table to populate the calendar table with several thousand rows in under a second.  This is much more efficient than a WHILE loop.

This calendar table population script also updates the table with most US holidays and adjusts business/non-business days accordingly.  In addition to customizing the script for holidays as observed by your organization, you might add fiscal period start/end dates to facilitate querying based on those cycles.  Also consider creating user-defined functions or stored procedures to encapsulate frequently used code that uses the calendar table.  For example, here is a function that returns the date that is a specified number of business days from the date provided:

CREATE FUNCTION dbo.udf_AddBusinessDays

(@Date date, @BusinessDays int)

RETURNS date

AS

BEGIN

      RETURN (

            SELECT TOP (1) CalendarDate AS BusinessDate

            FROM (SELECT TOP (@BusinessDays) CalendarDate

                  FROM dbo.Calendar

                  WHERE

                        CalendarDate > @Date

                        AND BusinessDay = 1

                  ORDER BY CalendarDate) AS BusinessDays

            ORDER BY CalendarDate DESC

      )

END

GO

Script 1: Example calendar table utility function
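For example, once the calendar table below is populated, the call below should return 2011-07-08, which is five business days after June 30, 2011 (skipping the weekend and the July 4 holiday):

SELECT dbo.udf_AddBusinessDays('20110630', 5) AS BusinessDate;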

--auxiliary number table

CREATE TABLE dbo.Numbers(

      Number int NOT NULL

            CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED

      );

     

--load Numbers table with 1,000,000 numbers

WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)

      ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)

      ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b, t2 t2c)

      ,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)

      INSERT INTO dbo.Numbers WITH (TABLOCKX) (

            Number

            )

            SELECT number

            FROM numbers

            WHERE number < 1000000;

Script 2: Create and populate numbers table

CREATE TABLE dbo.Calendar(
 CalendarDate date NOT NULL
  CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED
 ,CalendarYear int NOT NULL
 ,CalendarMonth int NOT NULL
 ,CalendarDay int NOT NULL
 ,DayOfWeekName varchar(10) NOT NULL
 ,FirstDateOfWeek date NOT NULL
 ,LastDateOfWeek date NOT NULL 
 ,FirstDateOfMonth date NOT NULL
 ,LastDateOfMonth date NOT NULL
 ,FirstDateOfQuarter date NOT NULL
 ,LastDateOfQuarter date NOT NULL
 ,FirstDateOfYear date NOT NULL
 ,LastDateOfYear date NOT NULL
 ,BusinessDay bit NOT NULL
 ,NonBusinessDay bit NOT NULL
 ,Weekend bit NOT NULL
 ,Holiday bit NOT NULL
 ,Weekday bit NOT NULL
 ,CalendarDateDescription varchar(50) NULL
);
GO

--load dates from 2000-01-01 through 2099-12-31
WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
 ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)
 ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b)
 ,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)
INSERT INTO dbo.Calendar WITH (TABLOCKX) (
 CalendarDate
 ,CalendarYear
 ,CalendarMonth
 ,CalendarDay
 ,DayOfWeekName
 ,FirstDateOfWeek
 ,LastDateOfWeek
 ,FirstDateOfMonth
 ,LastDateOfMonth
 ,FirstDateOfQuarter
 ,LastDateOfQuarter
 ,FirstDateOfYear
 ,LastDateOfYear
 ,BusinessDay
 ,NonBusinessDay
 ,Weekend
 ,Holiday
 ,Weekday
 ,CalendarDateDescription
 )
SELECT
 CalendarDate = DATEADD(day, number, '20000101')
 ,CalendarYear = DATEPART(year, DATEADD(day, number, '20000101'))
 ,CalendarMonth = DATEPART(month, DATEADD(day, number, '20000101'))
 ,CalendarDay = DATEPART(day, DATEADD(day, number, '20000101'))
 ,DayOfWeekName = DATENAME(weekday, DATEADD(day, number, '20000101'))
 ,FirstDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1),DATEADD(day, number, '20000101'))
 ,LastDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1)+6,DATEADD(day, number, '20000101'))
 ,FirstDateOfMonth = DATEADD(month, DATEDIFF(month,'20000101',DATEADD(day, number, '20000101')), '20000101')
 ,LastDateOfMonth = DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'20000101',DATEADD(day, number, '20000101'))+1,'20000101')))
 ,FirstDateOfQuarter = DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101')), '20000101')
 ,LastDateOfQuarter = DATEADD(day, -1, DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))
 ,FirstDateOfYear = DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101')), '20000101')
 ,LastDateOfYear = DATEADD(day,-1,DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))
 --initially set all weekdays as business days
 ,BusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END
 --all weekends are non-business days
 ,NonBusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END
 ,Weekend = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END
 ,Holiday = 0 --initially no holidays
 ,Weekday = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END
 ,CalendarDateDescription = NULL
FROM numbers
WHERE number < DATEDIFF(day, '20000101', '20991231') + 1;

--New Year's Day
UPDATE dbo.calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'New Year''s Day'
WHERE
    CalendarMonth = 1
    AND CalendarDay = 1;

--New Year's Day celebrated on Friday, December 31 when January 1 falls on Saturday
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
    ,CalendarDateDescription = 'New Year''s Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 31
    AND DayOfWeekName = 'Friday';
   
--New Year's Day celebrated on Monday, January 2 when January 1 falls on Sunday
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
    ,CalendarDateDescription = 'New Year''s Day Celebrated'
WHERE
    CalendarMonth = 1
    AND CalendarDay = 2
    AND DayOfWeekName = 'Monday';   

--Martin Luther King Day - 3rd Monday in January
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'Martin Luther King Day'
WHERE
    CalendarMonth = 1
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*)
  FROM dbo.Calendar c2
        WHERE
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 3;
       

--President's Day - 3rd Monday in February
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'President''s Day'
WHERE
    CalendarMonth = 2
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*)
  FROM dbo.Calendar c2
        WHERE
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 3;
       
--Easter - first Sunday after the full moon following the vernal (March 21) equinox
WITH
 t4 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
    ,t256 AS (SELECT 0 AS n FROM t4 t4a, t4 t4b, t4 t4c, t4 t4d) --256 rows so Easter is computed for the full calendar range
    ,years AS (SELECT ROW_NUMBER() OVER(ORDER BY n) + 1999 AS year FROM t256)
 ,n AS (SELECT years.year, years.year - (19 * (years.year / 19)) AS n FROM years)
 ,century AS (SELECT years.year, years.year / 100 AS century FROM years)
 ,i AS (SELECT century.year, century.century - (century.century / 4) - ((century.century - ((century.century - 17) / 25)) / 3) + (19 * n.n) + 15  AS i
  FROM century
  JOIN n ON n.year = century.year)
 ,i2 AS (SELECT i.year, i.i - (30 * (i.i / 30 ) ) AS i2
  FROM i)
 ,i3 AS (SELECT i2.year, i2.i2 - ((i2.i2 / 28) * (1 - (i2.i2 / 28) * (29 / (i2.i2 + 1)) * ((21 - n.n) / 11)) ) AS i3
  FROM i2
  JOIN n ON n.year = i2.year)
 ,j AS (SELECT i3.year, i3.year + (i3.year / 4) + i3.i3 + 2 - century.century + (century.century / 4 ) AS j
  FROM i3
  JOIN century ON century.year = i3.year)
 ,j2 AS (SELECT j.year, j.j - (7 * (j.j / 7) ) AS j2
  FROM j)
 ,month AS (SELECT j2.year, 3 + (((i3.i3 - j2.j2) + 40) / 44 ) AS month
  FROM j2
  JOIN i3 ON i3.year = j2.year)
 ,day AS (SELECT month.year, month.month, i3.i3 - j2.j2 + 28 - (31 * ( month.month / 4 ) ) AS day
  FROM i3
  JOIN j2 ON j2.year = i3.year
  JOIN month ON month.year = j2.year)
 ,easter AS (SELECT CAST(DATEADD(year, month.year-1900, DATEADD(day, day.day-1, DATEADD(month, month.month-1, ''))) AS date) AS easter
  FROM month
  JOIN day ON day.month = month.month AND day.year = month.year)
UPDATE dbo.Calendar
SET
 Holiday = 1
    ,CalendarDateDescription = 'Easter'
WHERE
    CalendarDate IN(
  SELECT easter
  FROM easter
  );

--Good Friday - 2 days before Easter Sunday
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'Good Friday'
WHERE
    CalendarDate IN(
        SELECT DATEADD(day, -2, c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE c2.CalendarDateDescription = 'Easter'
        );

--Memorial Day - last Monday in May
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'Memorial Day'
WHERE
    CalendarMonth = 5
    AND DayOfWeekName = 'Monday'
    AND CalendarDate IN(
        SELECT MAX(c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE
            c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = 5
            AND c2.DayOfWeekName = 'Monday'
        );

--Independence Day - July 4th
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'Independence Day'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 4;

--Independence Day celebrated on Friday, July 3 when July 4 falls on a Saturday
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
    ,CalendarDateDescription = 'Independence Day Celebrated'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 3
    AND DayOfWeekName = 'Friday';

--Independence Day celebrated on Monday, July 5 when July 4 falls on a Sunday
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
    ,CalendarDateDescription = 'Independence Day Celebrated'
WHERE
    CalendarMonth = 7
    AND CalendarDay = 5
    AND DayOfWeekName = 'Monday';
       
--Labor Day - first Monday in September
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'Labor Day'
WHERE
    CalendarMonth = 9
    AND DayOfWeekName = 'Monday'
    AND CalendarDate IN(
        SELECT MIN(c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE
            c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = 9
            AND c2.DayOfWeekName = 'Monday'
        );

--Columbus Day - second Monday in October
UPDATE dbo.Calendar
SET
 Holiday = 1
    ,CalendarDateDescription = 'Columbus Day'
WHERE
    CalendarMonth = 10
    AND DayOfWeekName = 'Monday'
    AND (SELECT COUNT(*)
  FROM dbo.Calendar c2
        WHERE
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Monday'
        ) = 2;

--Veteran's Day - November 11
UPDATE dbo.Calendar
SET
 Holiday = 1
    ,CalendarDateDescription = 'Veteran''s Day'
WHERE
    CalendarMonth = 11
    AND CalendarDay = 11;

--Thanksgiving - fourth Thursday in November
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'Thanksgiving'
WHERE
    CalendarMonth = 11
    AND DayOfWeekName = 'Thursday'
    AND (SELECT COUNT(*)
  FROM dbo.Calendar c2
        WHERE
            c2.CalendarDate <= Calendar.CalendarDate
            AND c2.CalendarYear = Calendar.CalendarYear
            AND c2.CalendarMonth = Calendar.CalendarMonth
            AND c2.DayOfWeekName = 'Thursday'
        ) = 4;

--Day after Thanksgiving
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'Day after Thanksgiving'
WHERE
    CalendarDate IN(
        SELECT DATEADD(day, 1, c2.CalendarDate)
        FROM dbo.Calendar c2
        WHERE c2.CalendarDateDescription = 'Thanksgiving'
        );
      
--Christmas Day - December 25th
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
 ,Holiday = 1
    ,CalendarDateDescription = 'Christmas Day'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 25;

--Christmas day celebrated on Friday, December 24 when December 25 falls on a Saturday
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
    ,CalendarDateDescription = 'Christmas Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 24
    AND DayOfWeekName = 'Friday';

--Christmas day celebrated on Monday, December 26 when December 25 falls on a Sunday
UPDATE dbo.Calendar
SET
 BusinessDay = 0
 ,NonBusinessDay = 1
    ,CalendarDateDescription = 'Christmas Day Celebrated'
WHERE
    CalendarMonth = 12
    AND CalendarDay = 26
    AND DayOfWeekName = 'Monday';
                       

Script 3: Create and populate calendar table and update with holidays

 

Secret of SQL Trace Duration Column

Why would a trace of long-running queries not show all queries that exceeded the specified duration filter?  We have a server-side SQL Trace that includes RPC:Completed and SQL:BatchCompleted events with a filter on Duration >= 100000 (trace duration is reported in microseconds, so this is a 100 millisecond threshold).  Nearly all of the queries on this busy OLTP server run in under 100 milliseconds, so any that appear in the trace are candidates for root cause analysis and/or performance tuning.
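
For reference, here is roughly how such a duration filter is attached to a server-side trace; this is a minimal sketch rather than our full trace definition, the trace id is hypothetical, and the filter must be added while the trace is stopped.

--minimal sketch: add a Duration >= 100000 (microsecond) filter to an existing, stopped server-side trace
DECLARE @TraceID int = 2;           --hypothetical trace id returned earlier by sp_trace_create
DECLARE @Duration bigint = 100000;  --Duration filter values must be bigint

--column 13 = Duration, logical operator 0 = AND, comparison operator 4 = greater than or equal
EXEC sp_trace_setfilter @TraceID, 13, 0, 4, @Duration;

--restart the trace once the filter is in place (status 1 = start)
EXEC sp_trace_setstatus @TraceID, 1;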

After an application experienced query timeouts, the DBA looked at the trace data to corroborate the problem.  Surprisingly, he found no long-running queries in the trace from the application that experienced the timeouts even though the application’s error log clearly showed detail of the problem (query text, duration, start time, etc.).  The trace did show, however, that there were hundreds of other long-running queries from different applications during the problem timeframe.  We later determined those queries were blocked by a large UPDATE query against a critical table that was inadvertently run during this busy period.

So why didn’t the trace include all of the long-running queries?  The reason is because the SQL Trace event duration doesn’t include the time a request was queued while awaiting a worker thread.  Remember that the server was under considerable stress at the time due to the severe blocking episode.  Most of the worker threads were in use by blocked queries and new requests were queued awaiting a worker to free up (a DMV query on the DAC connection will show this queuing: “SELECT scheduler_id, work_queue_count FROM sys.dm_os_schedulers;”).  Technically, those queued requests had not started.  As worker threads became available, queries were dequeued and completed quickly.  These weren’t included in the trace because the duration was under the 100ms duration filter.  The duration reflected the time it took to actually run the query but didn’t include the time queued waiting for a worker thread.

The important point here is that duration is not end-to-end response time.  Duration of RPC:Completed and SQL:BatchCompleted events doesn't include time before a worker thread is assigned, nor does it include the time required to return the last result buffer to the client.  In other words, duration only includes time from when the worker thread is assigned until the last buffer is filled.  But be aware that duration does include the time needed to return intermediate result set buffers back to the client, which is a factor when large query results are returned.  Clients that are slow in consuming result sets can increase the duration value reported by the trace "completed" events.

Ad-Hoc Rollup by date/time Interval

I often use aggregate queries to roll up data by an arbitrary date/time interval.  I'll share some techniques I use to accomplish the task in case you find them useful, all based on the sample table below:

CREATE TABLE dbo.WebStats

(

      RequestTimestamp datetime NOT NULL,

      Page varchar(255) NOT NULL

);

CREATE CLUSTERED INDEX WebStats_cdx ON dbo.WebStats(RequestTimestamp, Page);

 

INSERT INTO dbo.WebStats (RequestTimestamp, Page)

VALUES

      ('2010-01-01T00:00:00', 'Default.aspx')

      ,('2010-01-01T00:00:15', 'Default.aspx')

      ,('2010-01-01T00:01:05', 'Order.aspx')

      ,('2010-01-01T00:01:30', 'Default.aspx')

      ,('2010-01-01T00:01:40', 'OrderStatus.aspx')

      ,('2010-01-01T00:02:05', 'Default.aspx')

      ,('2010-01-01T00:03:05', 'ProductInfo.aspx')

      ,('2010-01-01T00:03:30', 'Default.aspx');

GO

 

Simple Rollup

Without an auxiliary table, a little DATEADD magic can do the trick.  Here's an example that summarizes web page requests by minute for the specified date/time range:

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-02T00:00:00';

 

SELECT

      DATEADD(minute, DATEDIFF(minute, @StartTimestamp, RequestTimestamp), @StartTimestamp) AS Interval,

      COUNT(*) AS PageRequests

FROM dbo.WebStats

WHERE

      RequestTimestamp >= @StartTimestamp

      AND RequestTimestamp < @EndTimestamp

GROUP BY

      DATEADD(minute, DATEDIFF(minute, @StartTimestamp, RequestTimestamp), @StartTimestamp)

ORDER BY

      Interval; 

 

Results:

Interval                    PageRequests
2010-01-01 00:00:00.000     2
2010-01-01 00:01:00.000     3
2010-01-01 00:02:00.000     1
2010-01-01 00:03:00.000     2
2010-01-01 00:29:00.000     1
2010-01-01 00:31:00.000     1
2010-01-01 00:42:00.000     1
2010-01-01 02:01:00.000     2
2010-01-01 02:03:00.000     2
2010-01-01 02:31:00.000     1
2010-01-01 02:44:00.000     1
2010-01-01 02:49:00.000     1

 

Arbitrary Intervals

The simple rollup method works well for any of the pre-defined units provided by the DATEADD function (year, quarter, month, day, hour, minute, second or week).  However, it lacks the flexibility to roll up to an arbitrary interval like 15 minutes or 30 seconds.  A little DATEADD/DATEDIFF math addresses this gap.  Below is an example of a 30-minute interval rollup using this technique:

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-01T04:00:00'

      ,@IntervalSeconds int = 1800; --30 minutes

SELECT

      DATEADD(second

            ,DATEDIFF(second, @StartTimestamp

            ,RequestTimestamp)

            / @IntervalSeconds * @IntervalSeconds, @StartTimestamp) AS Interval

      ,COUNT(*) AS PageRequests

FROM dbo.WebStats

WHERE

      RequestTimestamp >= @StartTimestamp

      AND RequestTimestamp < @EndTimestamp

GROUP BY

      DATEADD(second

            ,DATEDIFF(second, @StartTimestamp

            ,RequestTimestamp) / @IntervalSeconds * @IntervalSeconds, @StartTimestamp)

ORDER BY

      Interval;

 

Interval                    PageRequests
2010-01-01 00:00:00.000     9
2010-01-01 00:30:00.000     2
2010-01-01 02:00:00.000     4
2010-01-01 02:30:00.000     3

 

Missing Intervals

You probably noticed that periods with no activity at all are omitted rather than reported with a zero value.  One method to include the missing intervals is an outer join to a table containing all the desired intervals.  Ideally, that would be a permanent table, but I've found it impractical to maintain one for ad-hoc needs.  Fortunately, a utility numbers CTE is a handy way to generate the needed intervals dynamically.  The example below provides up to 65,536 interval values and can easily be extended as needed.

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-01T04:00:00'

      ,@IntervalSeconds int = 1800; --30 minutes

 

WITH

      T2 AS (SELECT 0 AS Num UNION ALL SELECT 0),

      T4 AS (SELECT 0 AS Num FROM T2 AS A CROSS JOIN T2 AS B),

      T256 AS (SELECT 0 AS Num FROM T4 AS A CROSS JOIN T4 AS B CROSS JOIN T4 AS C CROSS JOIN T4 AS D),

      T65536 AS (SELECT ROW_NUMBER() OVER(ORDER BY A.Num) AS Num FROM T256 AS A CROSS JOIN T256 AS B)

SELECT

      DATEADD(second

            ,(Num-1) * @IntervalSeconds, @StartTimestamp) AS Interval

      ,COUNT(WebStats.RequestTimestamp) AS PageRequests

FROM T65536

LEFT JOIN dbo.WebStats ON

      WebStats.RequestTimestamp >= DATEADD(second, (Num-1) * @IntervalSeconds, @StartTimestamp)

      AND WebStats.RequestTimestamp < DATEADD(second, Num * @IntervalSeconds, @StartTimestamp)

WHERE

      Num <= DATEDIFF(second, @StartTimestamp, @EndTimestamp) / @IntervalSeconds

GROUP BY

      DATEADD(second

            ,(Num-1) * @IntervalSeconds, @StartTimestamp)

ORDER BY

      Interval;  

 

Interval                    PageRequests
2010-01-01 00:00:00.000     9
2010-01-01 00:30:00.000     2
2010-01-01 01:00:00.000     0
2010-01-01 01:30:00.000     0
2010-01-01 02:00:00.000     4
2010-01-01 02:30:00.000     3
2010-01-01 03:00:00.000     0
2010-01-01 03:30:00.000     0

 

Collation Hell (Part 3)

In this final post of my Collation Hell series, I'll discuss techniques to change a SQL Server instance collation along with the collation of all databases and columns.  The objective is to ensure the standard collation is used throughout the entire SQL Server instance.  See part 1 and part 2 of this series for more information on selecting a standard collation and planning such a collation change.

Be aware that a complete collation change is not unlike a major version upgrade, except that tools to facilitate the change are limited.  You'll need to build new system databases, change user databases and change every character column to conform to the new collation.  These collation changes can be done either side-by-side or in-place.

Changing the Instance Collation

The SQL Server setup REBUILDDATABASE option (see Books Online) is used to create new system databases for an existing instance with the desired collation.  One advantage of using REBUILDDATABASE over a complete reinstall is that post-RTM service packs and patches don't need to be reapplied afterward.  However, all server level objects like logins, linked servers, jobs, etc. need to be recreated after the rebuild so you'll need to script those out beforehand.  User databases and columns will need to be changed separately, which I'll discuss in more detail later.

You can also perform a fresh SQL Server install on another instance for a side-by-side migration.  One of the advantages of this technique is that fallback is fast and relatively easy.  The side-by-side method is also attractive if you plan a server hardware and/or SQL version upgrade anyway.  However, as with REBUILDDATABASE, you will need to recreate server-level objects after the install.

Changing User Database Collation

Before I get into the details of a database collation change, please vote on the Connect feedback item Make it easy to change collation on a database.  Until such a feature is available, we will endure the pain of performing this task manually.

Assuming you have performed due diligence and remediation beforehand (see my collation change planning article), changing the database collation in-place is relatively easy.  A simple ALTER DATABASE will change the collation of all user database system objects as well as the database default collation:

ALTER DATABASE Foo

COLLATE Latin1_General_CI_AS;

But note that this database collation change does not actually change the collation of existing user table columns.  Columns that do not match the database collation must be changed individually to conform, which is why a mass collation change is such a PITA.  You might choose to rebuild the database using a side-by-side method so that both the database and column collations can be changed during the rebuild process.  I generally recommend the side-by-side method unless you are constrained by storage space.

Changing Column Collation Using ALTER TABLE...ALTER COLUMN

The syntax for changing a column collation is simple; just execute ALTER TABLE...ALTER COLUMN using the same column definition except for the new column collation:

ALTER TABLE dbo.Foo ALTER COLUMN

      Bar varchar(50) COLLATE Latin1_General_CI_AS NOT NULL;

The above DDL method appears simple at first glance but there are many caveats that make it problematic, especially when it must be repeated across many tables or large databases, or when a code page change is involved.  ALTER TABLE...ALTER COLUMN may be acceptable for an isolated change but not necessarily for a mass one.  The major issues are:

·         Each column must be changed individually

You'll need a separate ALTER COLUMN statement for each character column in the database.  A T-SQL script that generates the needed DDL using the catalog views is a must (a generator sketch follows this list).  See Louis Davidson's Change table collations en masse article for an example and be aware that text columns are problematic.

·         Column references must be dropped

The altered column cannot be referenced by a constraint, index, statistic, computed column or schemabound object.  This means that all of these references must be dropped before the column is altered and recreated afterward.

·         Data are updated with a code page change

ALTER TABLE...ALTER COLUMN is always a fast metadata-only change with a Unicode column.  The operation is also a metadata-only change for a non-Unicode column, but only if the old and new collations have the same code page/character set. 

When the old and new collations have a different code page/character set, then every row must be updated when a non-Unicode column is changed.  The performance ramifications of such an update are huge, especially with large tables.  A full table scan is required for each ALTER statement and every row in the table will be updated.  Also, since SQL Server internally drops the old column and adds a new one, the internal row size increases considerably.  Be aware that space requirements for modified non-Unicode columns will more than double until the clustered index is (re)built.  To reclaim the space of a heap, you'll need to create and drop a clustered index.  Keep in mind that the ALTER operation is fully logged regardless of the database recovery model so you need to plan log space requirements accordingly.
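
As promised under the first bullet, below is a minimal sketch of such a DDL generator, assuming a hypothetical target collation of Latin1_General_CI_AS.  It covers only char/varchar/nchar/nvarchar columns (text columns, as noted above, are problematic) and it does not handle the constraint, index and schemabound references that must be dropped and recreated around each ALTER.

--generate ALTER COLUMN statements for columns that don't already use the target collation
DECLARE @TargetCollation sysname = N'Latin1_General_CI_AS'; --hypothetical target

SELECT
      N'ALTER TABLE ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) +
      N' ALTER COLUMN ' + QUOTENAME(c.name) + N' ' + UPPER(ty.name) +
      N'(' + CASE
                  WHEN c.max_length = -1 THEN N'MAX'
                  WHEN ty.name IN(N'nchar', N'nvarchar') THEN CAST(c.max_length / 2 AS nvarchar(10))
                  ELSE CAST(c.max_length AS nvarchar(10))
             END + N')' +
      N' COLLATE ' + @TargetCollation +
      CASE WHEN c.is_nullable = 1 THEN N' NULL;' ELSE N' NOT NULL;' END AS AlterColumnDDL
FROM sys.columns AS c
JOIN sys.tables AS t ON t.object_id = c.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.types AS ty ON ty.user_type_id = c.user_type_id
WHERE ty.name IN(N'char', N'varchar', N'nchar', N'nvarchar')
      AND c.collation_name <> @TargetCollation;

Review the generated statements before running them; computed columns and columns based on user-defined types need additional handling.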

Because of these considerations, I do not recommend using ALTER TABLE...ALTER COLUMN for a mass collation change, especially when non-Unicode columns are involved and the code page/character set of the collations are different.  Instead, migrate data to a new table with columns of the desired collation.
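
A quick way to tell whether a code page change is actually in play is COLLATIONPROPERTY; if both collations report the same code page, the non-Unicode column changes remain metadata-only.  The collation names below are just examples.

--compare code pages of the current and target collations
SELECT
      COLLATIONPROPERTY(N'SQL_Latin1_General_CP1_CI_AS', 'CodePage') AS CurrentCodePage,
      COLLATIONPROPERTY(N'Latin1_General_CI_AS', 'CodePage') AS TargetCodePage;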

Changing Column Collation Using a New Table

If you cannot perform a side-by-side migration of the entire database due to storage constraints, an alternative to ALTER TABLE...ALTER COLUMN is to create a new table with the desired collation and then copy data from the original table.  I also recommend this method over ALTER TABLE...ALTER COLUMN when migrating to a different code page/character set for the reasons I previously mentioned.

1.       Change the database recovery model to SIMPLE to minimize log space requirements

2.       Drop all constraints, except clustered primary key and clustered unique constraints

3.       Drop all non-clustered indexes to free up disk space for the migration

4.       For each table:

o   Create a new table exactly like the original, except with a different name and new collation for all character columns

o   Create the clustered index and check constraints

o   Load data

·         Use INSERT...SELECT to load the new table.  Be sure to specify a TABLOCKX hint on the INSERT so that the operation is minimally logged.  If the table has an identity column, be sure to SET IDENTITY_INSERT...ON to retain the existing identity values (see the sketch after this list).

o   Drop the old table after successful copy and rename new table to old name

5.       Create non-clustered indexes, constraints, triggers, object permissions, etc.
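
To make the load step concrete, here is a minimal sketch for a single hypothetical table dbo.Foo with an identity column, copied into dbo.Foo_New that was created with the new collation.

--copy data into the new-collation table with minimal logging
SET IDENTITY_INSERT dbo.Foo_New ON;

INSERT INTO dbo.Foo_New WITH (TABLOCKX) (FooID, Bar)
SELECT FooID, Bar
FROM dbo.Foo;

SET IDENTITY_INSERT dbo.Foo_New OFF;

--after verifying the copy, swap the tables
DROP TABLE dbo.Foo;
EXEC sp_rename 'dbo.Foo_New', 'Foo';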

Summary

I cannot overstate the importance of choosing the right collation during the initial install since it is difficult to change after the fact.  Unfortunately, we often inherit instances and databases of varying collations and must evaluate the effort of the collation change against the benefits of a consistent collation.  If you are considering a collation change, be sure to test beforehand to avoid surprises during and after the migration and have a solid fallback plan.

 

Collation Hell (Part 2)

In my last post, I discussed why one should avoid a mixed collation environment and how to choose the right collation for your environment.  This post focuses on planning a collation change.

Should You Change Existing Collations?

Once you choose a standard collation (or at least a preferred one) for your organization, you'll need to decide if the change to existing instances, databases and columns is worth the effort and risk.  Keep in mind that the effort involves not only the actual collation change but also testing along with possible changes to code and data to maintain the desired behavior.  Such a remediation project can be quite significant depending on the old/new collation and scope of the change so you need to weigh the pros and cons to determine if the effort is justified.

Note that changing collations need not be an all-or-none decision; you might choose to convert only some (or none) of your existing instances/databases while enforcing the collation standard for new installations.  You can identify the instances that are causing the most grief and weigh those accordingly.

A number of factors influence the effort and risk of a collation change.  A change to language, sensitivity and/or code page is often more complex than a conversion from a SQL collation to a Windows collation (or Windows to SQL) of the same language and sensitivity.  Let me discuss these scenarios in more detail so that you can better ascertain the effort and risk involved in your environment for planning purposes.

Windows vs. SQL Collation Change

A conversion between a SQL and Windows collation of the same language, sensitivity and code page ought to be fairly straightforward due to the same character set and similar comparison rules.  As with any collation change, there are differences in behavior, though.  The main difference here is that Windows collations use word sort behavior, so slightly different sorting/comparison behavior will result.  The script below shows such a difference with identical data:

--SQL collation: compares greater than

IF 'coop' COLLATE SQL_Latin1_General_CP1_CI_AS < 'co-op' COLLATE SQL_Latin1_General_CP1_CI_AS

      PRINT 'less than'

ELSE IF 'coop' COLLATE SQL_Latin1_General_CP1_CI_AS = 'co-op' COLLATE SQL_Latin1_General_CP1_CI_AS

      PRINT 'equal'

ELSE IF 'coop' COLLATE SQL_Latin1_General_CP1_CI_AS > 'co-op' COLLATE SQL_Latin1_General_CP1_CI_AS

      PRINT 'greater than'

ELSE PRINT 'UNKNOWN'

 

--Windows collation: compares less than

IF 'coop' COLLATE Latin1_General_CI_AS < 'co-op' COLLATE Latin1_General_CI_AS

      PRINT 'less than'

ELSE IF 'coop' COLLATE Latin1_General_CI_AS = 'co-op' COLLATE Latin1_General_CI_AS

      PRINT 'equal'

ELSE IF 'coop' COLLATE Latin1_General_CI_AS > 'co-op' COLLATE Latin1_General_CI_AS

      PRINT 'greater than'

ELSE PRINT 'UNKNOWN'

 

All things being equal, a conversion from/to a Windows collation will likely require few changes, if any, to code and schema (besides the collation change itself).  On the other hand, converting to a collation of a different sensitivity and/or character set is often more challenging.

Sensitivity Change

You might recall that the instance collation determines the sensitivity of variable names and labels while the database collation determines the sensitivity of identifiers and literals.  I always match case exactly in variable names, labels and identifiers (including table aliases) regardless of whether I'm using a sensitive or insensitive collation, and I never use names that differ only by case.  Not only does naming consistency make code cleaner, this practice facilitates moving between collations.  However, it is unlikely that all database developers were so anal in their naming, so be aware that you'll probably need to make code or schema changes in order to convert between collations of different sensitivity.

A change from a case-sensitive collation to a case-insensitive one is usually minor, at least from a code perspective.   The same schema/code that runs in a case-sensitive environment will run in a case-insensitive collation as long as you don't encounter names and identifiers in the same scope that differ only by case (e.g. @customerID and @CustomerID).  Such a deliberate practice is uncommon in my experience but these conflicts must be addressed before changing to a case-insensitive collation.

One usually strives to store and query data using a consistent case (especially all upper/lower case) under a case-sensitive collation.  If this practice was not followed, data that was unique under a case-sensitive collation will not be regarded as such under case-insensitive rules, which will prevent unique indexes (including primary key and unique constraints) from being created.  This might actually be a good thing when the real issue is bad data (i.e. duplicates inadvertently allowed due to inconsistent case).  However, you may need to deviate from the case-insensitive standard at the column level in some situations due to business requirements, such as to enforce uniqueness of case-sensitive part numbers.
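
Before flipping the collation, it's easy to find the values that would collide under case-insensitive rules; the table and column names below are hypothetical.

--find values that would no longer be unique under the target case-insensitive collation
SELECT
      PartNumber COLLATE Latin1_General_CI_AS AS CollidingValue
      ,COUNT(*) AS DuplicateCount
FROM dbo.Part
GROUP BY PartNumber COLLATE Latin1_General_CI_AS
HAVING COUNT(*) > 1;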

Going from a case-insensitive to a case-sensitive or binary collation (which I don't personally recommend) will typically require more changes.  Developers tend to be a bit sloppy with matching case under a case-insensitive collation because there is no requirement to do so.  Don't be surprised if a lot of code and queries must be changed once variables and identifiers become case sensitive.  Furthermore, you may need to update data to a consistent case and also make application changes to ensure data are stored in a consistent case.

The considerations that apply to case sensitivity also apply to other collation sensitivity options (accent, Kana and width).  I wouldn't expect as many issues compared to a change in case sensitivity in most cases, though.

Character Set Change

A change in code page is a non-issue when char/varchar/text data contains only ASCII characters.  If you have a character outside the ASCII range (0-127, 0x00-0x7F), a code page change will present a problem when the character doesn't also exist in the target collation's code page.  Such a character will instead be mapped to an alternate character (e.g. 'À' to 'A' in example below) or the catch-all '?' (e.g. '€' to '?' in example below).  If this mapping is unacceptable, you'll need to change the data type to Unicode (nchar/nvarchar/ntext) or update data to conform to the target code page.

CREATE TABLE dbo.Foo(

      Bar char(1) COLLATE Latin1_General_CI_AS

      );

INSERT INTO dbo.Foo (Bar) VALUES('A');

INSERT INTO dbo.Foo (Bar) VALUES('À');

INSERT INTO dbo.Foo (Bar) VALUES('€');

 

--list values not mapped identically

SELECT Bar AS OriginalValue, Bar COLLATE Japanese_90_BIN AS MappedValue

FROM dbo.Foo

WHERE

      CAST(CAST(Bar AS nvarchar(MAX)) AS varbinary(MAX)) <>

      CAST(CAST(Bar COLLATE Japanese_90_BIN AS nvarchar(MAX)) AS varbinary(MAX));

OriginalValue    MappedValue
À                A
€                ?

 

If you are unsure whether you have problem characters, the above script shows one method to identify them.  It converts the original characters to Unicode and then to varbinary, and repeats the conversion using the target collation.  An inequality of the two values indicates an inexact mapping that may require remediation.

Language Change

I'm sure some of you have inherited different language collations due to mergers and acquisitions or inattention to detail during installation.  Be mindful that the topic of supporting multiple languages/locales is much larger than just collation.  I'm only discussing a collation language change here but if you need to fully support multiple languages in a single database, you must also consider other factors such as a schema that supports multiple translations, currency and UOM conversion and applications that are sensitive to client locale.

You may experience different behavior after a collation language change due to the different sorting and comparison semantics.  The script below illustrates such a difference.  Even if you chose a collation that supports the majority of your users' languages, that collation might be less than ideal for the user minority.  Consider performing some operations in application code instead of SQL Server when the standard collation behavior is unacceptable for the task at hand.

--returns both 'Schröder' and 'Schroeder'

DECLARE @Foo TABLE(

      LastName nvarchar(10) COLLATE German_PhoneBook_CI_AS);

INSERT INTO @Foo VALUES(N'Schröder');

INSERT INTO @Foo VALUES(N'Schroeder');

SELECT LastName FROM @Foo

WHERE LastName LIKE N'%oe%';

GO

--returns only 'Schroeder'

DECLARE @Foo TABLE(

      LastName nvarchar(10) COLLATE Latin1_General_CI_AS);

INSERT INTO @Foo VALUES(N'Schröder');

INSERT INTO @Foo VALUES(N'Schroeder');

SELECT LastName FROM @Foo

WHERE LastName LIKE N'%oe%';

GO

 

Summary

A collation change effort varies considerably depending on the size and complexity of the environment.  Perform due diligence before embarking on a collation change.  I don't want to discourage anyone from changing collations but as much as a mixed collation environment is a pain, a botched remediation project is even worse.  Be sure to plan accordingly.

I'll share different methods to change collations in my last post of this series.

Collation Hell (Part 1)

I inherited a mixed collation environment with more collations than I can count on one hand.  The different collations require workarounds to avoid "cannot resolve collation conflict" errors and those workarounds kill performance due to non-sargable expressions.  Dealing with mixed collations is a real pain so I strongly recommend you standardize on a single collation and deviate only after careful forethought.  Here's a brief overview of collations and some guidance to help you choose the right collation for your organization and new SQL installations.

Collation Overview

A collation determines the rules SQL Server uses to compare and sort character data.  These rules are language/locale aware and may also be sensitive to case, accent, Kana and width.  Collation suffixes identify dictionary rule (in)sensitivity:  _CS (case sensitive), _CI (case insensitive), _AS (accent sensitive), _AI (accent insensitive) and _KS (Kana sensitive).   Binary collations, identified by suffixes _BIN (binary) and _BIN2 (binary-code point), are sensitive in all regards.

A collation determines which characters can be stored in non-Unicode character data types and the bit patterns used for storage.  Char, varchar and text data types can store only 256 different characters due to the single byte limitation.  The first 128 characters (0-127, 0x00-0x7F) are the same for all collations as defined by the ASCII character set and the remaining 128 characters (128-255, 0x80-0xFF) vary according to the code page associated with the collation.  Characters without an associated code point are mapped to either an alternate character or the catch-all '?' character.

Collations are grouped into Windows and SQL collations.  Windows collations provide sorting and comparison behavior consistent with applications running on a computer with the corresponding Windows operating system locale.  Windows collations also provide consistent behavior for both Unicode and non-Unicode data types.

SQL collations use different rules for non-Unicode and Unicode types.  SQL Server collations, identified with the SQL_ collation name prefix, use the character set and sort order settings from older SQL Server versions for non-Unicode types and are provided specifically to maintain compatibility with existing SQL Server installations.  Both SQL and Windows collations use the same rules for Unicode types.

Specifying a Collation

Collation can be specified at the instance, database, column and expression level.  The SQL Server instance collation is determined during SQL Server installation and cannot be changed without a reinstall/rebuild.  It's a good idea to get the collation right the first time unless you need practice re-installing SQL Server.  Keep in mind that the instance collation determines the collation (including case-sensitivity) of Instance-level objects like logins and database names as well as identifiers for variables, GOTO labels and temporary tables.  Passwords are always case-sensitive in SQL 2005 and above, although collation determined password sensitivity in earlier versions.

The database collation is determined when the database is created.  If not specified otherwise, the instance default collation is used as the database collation.  Database-level identifiers like table and column names use the database collation as do literal expressions.  The database collation can be changed at any time but this does not change the collation of existing table columns.

Column collation for character data is specified when the table is created or when the column is added to the table.  If not specified otherwise, the database collation is used.  A column's collation can be changed only by altering the column with the new collation or recreating the table with the new collation specified on the column definition.  If you want a column's collation to remain different than the database default collation, you must be careful to explicitly specify the collation whenever the column is altered so that it is not inadvertently changed to the database default collation.
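
Three of these levels can be seen in plain T-SQL (the instance collation is fixed at setup); the database, table and collation names below are only examples.

--database-level default collation
CREATE DATABASE CollationDemo COLLATE Latin1_General_CI_AS;
GO

--column-level collation overrides the database default
CREATE TABLE CollationDemo.dbo.Customer(
      CustomerID int NOT NULL PRIMARY KEY
      ,LastName varchar(50) COLLATE Latin1_General_CS_AS NOT NULL
      );
GO

--expression-level collation overrides the column collation for this comparison only
SELECT CustomerID, LastName
FROM CollationDemo.dbo.Customer
WHERE LastName = 'smith' COLLATE Latin1_General_CI_AS;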

Choosing the Right Collation

The default collation that the SQL Server installer chooses is not necessarily the Microsoft recommended one or the one that is best for your environment.  SQL Server setup examines the operating system locale and chooses the default as the oldest available version associated with the locale.  For example, a SQL Server installation in the US will default to SQL_Latin1_General_CP1_CI_AS and the installation default in the UK will be Latin1_General_CI_AS.  In both cases, Microsoft recommends a Windows collation (e.g. Windows Latin1_General_CI_AS) unless one needs to maintain compatibility with existing installations.  More on that shortly.

Language is the most important consideration in choosing a collation for a new installation.  This is one reason why the SQL Server installer chooses the default collation based on the operating system locale.  If all users speak the same language, choose a collation that supports the language/locale.  This will help ensure expected sorting and comparison behavior along with alphabet support for non-Unicode types.  In a multi-language environment, choose a collation with the best overall support for the languages used.

Another major consideration is collation compatibility.  If you have existing SQL installations, consider using the same collation for a new instance if you envision sharing data via replication, SSIS or future server consolidation.  I previously mentioned that Microsoft recommends a Windows collation but it may be better to revert to a SQL collation for compatibility with older instances in your environment that already use the SQL collation.  Compatibility is another reason why the installation default is SQL_Latin1_General_CP1_CI_AS collation in the US.  Unfortunately, this default has the side effect of DBAs unwittingly installing new instances with a SQL collation instead of a Windows collation like Latin1_General_CI_AS even when compatibility isn't needed.

Whether or not to choose a case-sensitive collation is a bit subjective.  A case-insensitive collation is appropriate when you need to query data regardless of the case of the actual data.  For example, this allows one to easily find customers with a last name of 'Smith' even when data is not stored in proper case.  With a case-sensitive collation, it is important that one stores data in a consistent case (not to say that one shouldn't anyway) and this places more burden on application and database developers. 

Collation Performance

Collation performance was a bigger deal back in the days of 486 processors (instead of collation, it was actually character set and sort order back then).  The comparative performance on modern processors is usually insignificant.  SQL collations should provide better performance than Windows collations for non-Unicode types due to simpler comparison rules but the difference is significant only in the most severe circumstances, such as a table scan with LIKE '%Some String%' in the WHERE clause.  See Comparing SQL collations to Windows collations.  Binary collations are said to provide the best performance but the cost of unnatural (non-dictionary) comparisons and sort order is high; most users would expect 'a' to sort before 'B' but that is not the case with binary collations.

I personally don't think performance should even be considered in choosing the proper collation.  One of the reasons I'm living in collation hell is that my predecessors chose binary collations to eke out every bit of performance for our highly transactional OLTP systems.  With the sole exception of a leading wildcard table scan search, I've found no measurable performance difference among our different collations.  The real key to performance is query and index tuning rather than collation.  If performance is important to you, I recommend you perform a performance test with your actual application queries before you choose a collation based on performance expectations.

Summary

My general recommendation is that you should use a case insensitive Windows collation appropriate for your locale unless you need to maintain compatibility with existing SQL instances or have special considerations.  In my next post, I'll discuss changing collations so that you can avoid a mixed collation environment and show different methods to accomplish the task.

Forced Parameterization: A Turbo Button?

I never had the need to turn on the PARAMETERIZATION FORCED database option until this week.  We pretty much use only stored procedures for our internal applications so the execution plans are almost always in cache and reused.  This practice of using parameterized stored procedure calls, together with attention to detail in query and index tuning, allows us to comfortably handle several thousand requests per second on commodity hardware without taking special measures.

The Perfect Storm

We acquired a third-party application which had to sustain thousands of batch requests per second in order to keep up with our peak demand.  Our first attempt to use the application out of the box failed miserably when the 16-core database server quickly hit 100% CPU and stayed there.  An examination of the most frequently run query soon revealed why CPU was so high.  Not only was the moderately complex query not parameterized, each invocation required a full table scan.  The schema (EAV model, missing primary keys and indexes), application code (ad-hoc, non-parameterized queries) and inattention to indexing seemed the perfect storm to guarantee failure. 

Our hands were tied in what the vendor could/would do to address our performance concerns.  We worked with the vendor to optimize indexes and this brought the CPU down to about 65%, but throughput and response time were still unacceptable.  We needed to increase performance by at least an order of magnitude to meet SLAs.

The Perfect Fix

I then recalled an experience that SQL Server MVP Adam Machanic shared not long ago:

CPU was 95%+ at peak time (several thousand batch requests/second, via an ASP (classic) front end), and the peak time lasted 8+ hours every day.  The server was one of the big HP boxes -- not sure if it was a Superdome or some other model -- with something like 56 cores and 384 GB of RAM.  The database itself was only 40 or 50 GB, as I recall, so the entire thing was cached.  Long story short, I logged in during peak load, did a quick trace and noticed right away that none of the queries were parameterized.  I decided to throw caution to the wind and just go for it.  Flipped the thing into Forced Parameterization mode and held my breath as I watched the CPU counters *instantly* drop to 7% and stay there. I thought I'd broken the thing, but after checking my trace queries were running through the system same as before, and with the same number of errors (another story entirely <g>). Luckily the head IT guy happened to be watching his dashboard right as I made the change, and after seeing such an extreme result thought I was a god...

 

I knew of PARAMETERIZATION FORCED but never realized how big a difference the option could make until I learned of Adam's experience.  I'm not quite as adventuresome as he is so I restored the production database to a separate environment for some cursory testing.  To my amazement, I watched the rate of my single-threaded test jump from a few dozen batch requests/sec to several hundred immediately after I executed "ALTER DATABASE...SET PARAMETERIZATION FORCED".  CPU dropped by half even with the tenfold increase in throughput. 
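
For reference, the statement is simply a database-level option (the database name below is hypothetical) and it can be reverted just as easily.

--turn on forced parameterization for a single database
ALTER DATABASE VendorAppDB SET PARAMETERIZATION FORCED;

--revert to the default behavior
ALTER DATABASE VendorAppDB SET PARAMETERIZATION SIMPLE;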

The production improvement was even more impressive - the 16 core Dell R900 hasn't exceeded 8% CPU since the change.  Response time is excellent, we have happy users and plenty of CPU headroom to spare.

A Turbo Button?

Despite anecdotal success with PARAMETERIZATION FORCED, I wouldn't turn it on indiscriminately.  When the PARAMETERIZATION FORCED database option is on, all queries are parameterized, including complex ones.  This is good in that compilation costs are avoided due to cache hits.  The bad news is that a single plan might not be appropriate for all possible values of a given query.  Worse overall performance will result when higher execution costs (due to sub-optimal plans) exceed compilation savings so you should understand the query mix before considering the option.

In contrast, SQL Server parameterizes only relatively simple "no brainer" queries in the default PARAMETERIZATION SIMPLE mode.  This behavior promotes reuse of plans for queries that will yield the same plan anyway regardless of the literal values in the query.  Complex queries are not parameterized automatically so that the optimizer can generate the optimal plan for the values of the current query in the event of a cache miss.  The downside with simple parameterization, as Adam and I observed, is that complex queries not already in cache will incur costly compilation costs that are a CPU hog in a high-volume OLTP workload.

There is also middle ground between PARAMETERIZATION SIMPLE and PARAMETERIZATION FORCED.  One can use plan guides with PARAMETERIZATION SIMPLE to avoid compilation for selected queries while other complex queries are compiled as normal.  In my case, a plan guide may have been a better option because the culprit was a single query rather than many different unpredictable ones.
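
Here is a minimal sketch of that middle-ground approach using a hypothetical ad-hoc query against dbo.Orders; sp_get_query_template produces the parameterized form of the statement and the TEMPLATE plan guide applies forced parameterization to just that query shape.

--parameterize one query shape while leaving the database in PARAMETERIZATION SIMPLE
DECLARE @stmt nvarchar(max), @params nvarchar(max);

EXEC sp_get_query_template
      N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = 42',
      @stmt OUTPUT,
      @params OUTPUT;

EXEC sp_create_plan_guide
      @name = N'ForceParam_Orders_ByCustomer',
      @stmt = @stmt,
      @type = N'TEMPLATE',
      @module_or_batch = NULL,
      @params = @params,
      @hints = N'OPTION(PARAMETERIZATION FORCED)';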

In my opinion, the best solution is to use stored procedures and/or parameterized queries in the first place.  These methods provide the performance benefits of PARAMETERIZATION FORCED and add other security and application development benefits.  Unfortunately, third-party vendors are notorious for not following parameterization Best Practices so DBAs need to keep PARAMETERIZATION FORCED and plan guides in their tool belt.

 

Restore Database Stored Procedure

A user in the SQL Server public newsgroups asked about how to restore a database with many files and rename during the process:

I am restoring a database onto another server with different drive
sizes and mappings.
The thing is, I have over 100 catalogs to restore. I don't want to
have to define each catalog name and its new location Like below:

RESTORE DATABASE Northwinds
FROM DISK = 'C:\db.bak'
WITH MOVE 'Catalog1' TO 'D:\Catalog1'
WITH MOVE 'Catalog2' TO 'D:\Catalog2
WITH MOVE 'Catalog3' TO 'D:\Catalog3'
WITH MOVE 'Catalog4' TO 'D:\Catalog4
WITH MOVE 'Catalog5' TO 'D:\Catalog5'
WITH MOVE 'Catalog6' TO 'D:\Catalog6'
...WITH MOVE 'Catalog100' TO 'D:\Catalog100'

Is it possible to restore the catalgos using a wilcard as such?

RESTORE DATABASE Northwinds
FROM DISK = 'C:\db.bak'
WITH MOVE 'Catalog%' TO 'D:\Catalog%'

 

This reminded me of a stored procedure I wrote several years ago for SQL Server 2000 that would be perfect for such a task.  The proc generates and optionally executes the necessary RESTORE and ALTER commands to make quick work of what is otherwise a long and tedious process if you have many files and databases.  I updated my old proc for SQL Server 2008 and thought I'd share it here.    Below is the proc with documentation and samples in the comments.  I hope you find this useful.

IF OBJECT_ID(N'tempdb..#RestoreDatabase_SQL2008') IS NOT NULL

      DROP PROCEDURE #RestoreDatabase_SQL2008

GO

 

CREATE PROCEDURE #RestoreDatabase_SQL2008

      @BackupFile nvarchar(260),

      @NewDatabaseName sysname = NULL,

      @FileNumber int = 1,

      @DataFolder nvarchar(260) = NULL,

      @LogFolder nvarchar(260) = NULL,

      @ExecuteRestoreImmediately char(1) = 'N',

      @ChangePhysicalFileNames char(1) = 'Y',

      @ChangeLogicalNames char(1) = 'Y',

      @DatabaseOwner sysname = NULL,

      @AdditionalOptions nvarchar(500) = NULL

AS

 

/*

 

This procedure will generate and optionally execute a RESTORE DATABASE

script from the specified disk database backup file.

 

Parameters:

 

      @BackupFile: Required. Specifies fully-qualified path to the disk

            backup file. For remote (network) files, UNC path should

            be specified.  The SQL Server service account will need

            permissions to the file.

 

      @NewDatabaseName: Optional. Specifies the target database name

            for the restore.  If not specified, the database is

            restored using the original database name.

 

      @FileNumber: Optional. Specifies the file number of the desired

            backup set. This is needed only when the backup file

            contains multiple backup sets. If not specified, a

            default of 1 is used.

 

      @DataFolder: Optional. Specifies the folder for all database data

            files. If not specified, data files are restored using the

            original file names and locations.

 

      @LogFolder: Optional. Specifies the folder for all database log

            files. If not specified, log files are restored to the

            original log file locations.

 

      @ExecuteRestoreImmediately: Optional. Specifies whether or not to

            execute the restore. When 'Y' is specified, the restore is

            executed immediately.  When 'N' is specified, the restore script

            is printed but not executed. If not specified, a default of 'N'

            is used.

           

      @ChangePhysicalFileNames: Optional. Indicates that physical file

            names are to be renamed during the restore to match the

            new database name. When 'Y' is specified, the leftmost

            part of the original file name matching the original

            database name is replaced with the new database name. The

            file name is not changed when 'N' is specified or if the

            leftmost part of the file name doesn't match the original

            database name. If not specified, a default of 'Y' is used.

 

      @ChangeLogicalNames: Optional. Indicates that logical file names

            are to be renamed following the restore to match the new

            database name. When 'Y' is specified, the leftmost part

            of the original file name matching the original database

            name is replaced with the new database name. The file name

            is not changed when 'N' is specified or if the leftmost

            part of the file name doesn't match the original database

            name. If not specified, a default of 'Y' is used.

           

      @DatabaseOwner: Optional. Specifies the new database owner

            (authorization) of the restored database.  If not specified, the

            database will be owned by the account used to restore the database.

           

      @AdditionalOptions:  Optional.  Specifies options to be added to the

            RESTORE statement WITH clause (e.g. STATS=5, REPLACE).  If not

            specified, only the FILE and MOVE options are included.

 

Sample usages:

 

      --restore database with same name and file locations

      EXEC #RestoreDatabase_SQL2008

            @BackupFile = N'C:\Backups\Foo.bak',

            @AdditionalOptions=N'STATS=5, REPLACE';

           

      Results:

      --Backup source: ServerName=MYSERVER, DatabaseName=Foo, BackupFinishDate=2009-06-13 11:20:52.000

      RESTORE DATABASE [Foo]

            FROM DISK=N'C:\Backups\Foo.bak'

            WITH

                  FILE=1, STATS=5, REPLACE

 

      --restore database with new name and change logical and physical names

      EXEC #RestoreDatabase_SQL2008

            @BackupFile = N'C:\Backups\Foo.bak',

            @NewDatabaseName = 'Foo2';

           

      Results:

      --Backup source: ServerName=MYSERVER, DatabaseName=Foo, BackupFinishDate=2009-06-13 11:20:52.000

      RESTORE DATABASE [Foo2]

            FROM DISK=N'C:\Backups\Foo.bak'

            WITH

                  FILE=1,

                        MOVE 'Foo' TO 'C:\DataFolder\Foo2.mdf',

                        MOVE 'Foo_log' TO 'D:\LogFolder\Foo2_log.LDF'

      ALTER DATABASE [Foo2]

                        MODIFY FILE (NAME='Foo', NEWNAME='Foo2');

      ALTER DATABASE [Foo2]

                        MODIFY FILE (NAME='Foo_log', NEWNAME='Foo2_log');

                       

      --restore database to different file folders and change owner after restore:

      EXEC #RestoreDatabase_SQL2008

            @BackupFile = N'C:\Backups\Foo.bak',

            @DataFolder = N'E:\DataFiles',

            @LogFolder = N'F:\LogFiles',

            @DatabaseOwner = 'sa',

            @AdditionalOptions=N'STATS=5';

           

      Results:

      --Backup source: ServerName=MYSERVER, DatabaseName=Foo, BackupFinishDate=2009-06-13 11:20:52.000

      RESTORE DATABASE [Foo]

            FROM DISK=N'C:\Backups\Foo.bak'

            WITH

                  FILE=1,

                        MOVE 'Foo' TO 'E:\DataFiles\Foo.mdf',

                        MOVE 'Foo_log' TO 'F:\LogFiles\Foo_log.LDF'

      ALTER AUTHORIZATION ON DATABASE::[Foo] TO [sa]

*/

 

SET NOCOUNT ON;

 

DECLARE @LogicalName nvarchar(128),

      @PhysicalName nvarchar(260),

      @PhysicalFolderName nvarchar(260),

      @PhysicalFileName nvarchar(260),

      @NewPhysicalName nvarchar(260),

      @NewLogicalName nvarchar(128),

      @OldDatabaseName nvarchar(128),

      @RestoreStatement nvarchar(MAX),

      @Command nvarchar(MAX),

      @ReturnCode int,

      @FileType char(1),

      @ServerName nvarchar(128),

      @BackupFinishDate datetime,

      @Message nvarchar(4000),

      @ChangeLogicalNamesSql nvarchar(MAX),

      @AlterAuthorizationSql nvarchar(MAX),

      @Error int;

 

DECLARE @BackupHeader TABLE

      (

      BackupName nvarchar(128) NULL,

      BackupDescription  nvarchar(255) NULL,

      BackupType smallint NULL,

      ExpirationDate datetime NULL,

      Compressed tinyint NULL,

      Position smallint NULL,

      DeviceType tinyint NULL,

      UserName nvarchar(128) NULL,

      ServerName nvarchar(128) NULL,

      DatabaseName nvarchar(128) NULL,

      DatabaseVersion int NULL,

      DatabaseCreationDate  datetime NULL,

      BackupSize numeric(20,0) NULL,

      FirstLSN numeric(25,0) NULL,

      LastLSN numeric(25,0) NULL,

      CheckpointLSN  numeric(25,0) NULL,

      DatabaseBackupLSN  numeric(25,0) NULL,

      BackupStartDate  datetime NULL,

      BackupFinishDate  datetime NULL,

      SortOrder smallint NULL,

      CodePage smallint NULL,

      UnicodeLocaleId int NULL,

      UnicodeComparisonStyle int NULL,

      CompatibilityLevel  tinyint NULL,

      SoftwareVendorId int NULL,

      SoftwareVersionMajor int NULL,

      SoftwareVersionMinor int NULL,

      SoftwareVersionBuild int NULL,

      MachineName nvarchar(128) NULL,

      Flags int NULL,

      BindingID uniqueidentifier NULL,

      RecoveryForkID uniqueidentifier NULL,

      Collation nvarchar(128) NULL,

      FamilyGUID uniqueidentifier NULL,

      HasBulkLoggedData bit NULL,

      IsSnapshot bit NULL,

      IsReadOnly bit NULL,

      IsSingleUser bit NULL,

      HasBackupChecksums bit NULL,

      IsDamaged bit NULL,

      BeginsLogChain bit NULL,

      HasIncompleteMetaData bit NULL,

      IsForceOffline bit NULL,

      IsCopyOnly bit NULL,

      FirstRecoveryForkID uniqueidentifier NULL,

      ForkPointLSN decimal(25, 0) NULL,

      RecoveryModel nvarchar(60) NULL,

      DifferentialBaseLSN decimal(25, 0) NULL,

      DifferentialBaseGUID uniqueidentifier NULL,

      BackupTypeDescription  nvarchar(60) NULL,

      BackupSetGUID uniqueidentifier NULL,

      CompressedBackupSize binary(8) NULL

);

 

DECLARE @FileList TABLE

      (

      LogicalName nvarchar(128) NOT NULL,

      PhysicalName nvarchar(260) NOT NULL,

      Type char(1) NOT NULL,

      FileGroupName nvarchar(120) NULL,

      Size numeric(20, 0) NOT NULL,

      MaxSize numeric(20, 0) NOT NULL,

      FileID bigint NULL,

      CreateLSN numeric(25,0) NULL,

      DropLSN numeric(25,0) NULL,

      UniqueID uniqueidentifier NULL,

      ReadOnlyLSN numeric(25,0) NULL ,

      ReadWriteLSN numeric(25,0) NULL,

      BackupSizeInBytes bigint NULL,

      SourceBlockSize int NULL,

      FileGroupID int NULL,

      LogGroupGUID uniqueidentifier NULL,

      DifferentialBaseLSN numeric(25,0)NULL,

      DifferentialBaseGUID uniqueidentifier NULL,

      IsReadOnly bit NULL,

      IsPresent bit NULL,

      TDEThumbprint varbinary(32) NULL

 );

 

SET @Error = 0;

 

--add trailing backslash to folder names if not already specified

IF LEFT(REVERSE(@DataFolder), 1) <> '\' SET @DataFolder = @DataFolder + '\';

IF LEFT(REVERSE(@LogFolder), 1) <> '\' SET @LogFolder = @LogFolder + '\';

 

-- get backup header info and display

SET @RestoreStatement = N'RESTORE HEADERONLY

      FROM DISK=N''' + @BackupFile + ''' WITH FILE=' + CAST(@FileNumber as nvarchar(10));

INSERT INTO @BackupHeader

      EXEC(@RestoreStatement);

SET @Error = @@ERROR;

IF @Error <> 0 GOTO Done;

IF NOT EXISTS(SELECT * FROM @BackupHeader) GOTO Done;

SELECT

      @OldDatabaseName = DatabaseName,

      @ServerName = ServerName,

      @BackupFinishDate = BackupFinishDate

FROM @BackupHeader;

IF @NewDatabaseName IS NULL SET @NewDatabaseName = @OldDatabaseName;

SET @Message = N'--Backup source: ServerName=%s, DatabaseName=%s, BackupFinishDate=' +

      CONVERT(nvarchar(23), @BackupFinishDate, 121);

RAISERROR(@Message, 0, 1, @ServerName, @OldDatabaseName) WITH NOWAIT;

 

-- get filelist info

SET @RestoreStatement = N'RESTORE FILELISTONLY

      FROM DISK=N''' + @BackupFile + ''' WITH FILE=' + CAST(@FileNumber as nvarchar(10));

INSERT INTO @FileList

      EXEC(@RestoreStatement);

SET @Error = @@ERROR;

IF @Error <> 0 GOTO Done;

IF NOT EXISTS(SELECT * FROM @FileList) GOTO Done;

 

-- generate RESTORE DATABASE statement and ALTER DATABASE statements

SET @ChangeLogicalNamesSql = '';

SET @RestoreStatement =

      N'RESTORE DATABASE ' +

      QUOTENAME(@NewDatabaseName) +

      N'

      FROM DISK=N''' +

      @BackupFile + '''' +

      N'

      WITH

            FILE=' +

      CAST(@FileNumber as nvarchar(10))

DECLARE FileList CURSOR LOCAL STATIC READ_ONLY FOR

      SELECT

            Type AS FileType,

            LogicalName,

            --extract folder name from full path

            LEFT(PhysicalName,

                  LEN(LTRIM(RTRIM(PhysicalName))) -

                  CHARINDEX('\',

                  REVERSE(LTRIM(RTRIM(PhysicalName)))) + 1)

                  AS PhysicalFolderName,

            --extract file name from full path

            LTRIM(RTRIM(RIGHT(PhysicalName,

                  CHARINDEX('\',

                  REVERSE(PhysicalName)) - 1))) AS PhysicalFileName

FROM @FileList;

 

OPEN FileList;

 

WHILE 1 = 1

BEGIN

      FETCH NEXT FROM FileList INTO

            @FileType, @LogicalName, @PhysicalFolderName, @PhysicalFileName;

      IF @@FETCH_STATUS = -1 BREAK;

 

      -- build new physical name

      SET @NewPhysicalName =

            CASE @FileType

                  WHEN 'D' THEN

                        COALESCE(@DataFolder, @PhysicalFolderName) +

                        CASE

                              WHEN UPPER(@ChangePhysicalFileNames) IN ('Y', '1') AND

                                    LEFT(@PhysicalFileName, LEN(@OldDatabaseName)) = @OldDatabaseName

                              THEN

                                    @NewDatabaseName + RIGHT(@PhysicalFileName, LEN(@PhysicalFileName) - LEN(@OldDatabaseName))

                              ELSE

                                    @PhysicalFileName

                        END

                  WHEN 'L' THEN

                        COALESCE(@LogFolder, @PhysicalFolderName) +

                        CASE

                              WHEN UPPER(@ChangePhysicalFileNames) IN ('Y', '1') AND

                                    LEFT(@PhysicalFileName, LEN(@OldDatabaseName)) = @OldDatabaseName

                              THEN

                                    @NewDatabaseName + RIGHT(@PhysicalFileName, LEN(@PhysicalFileName) - LEN(@OldDatabaseName))

                              ELSE

                                    @PhysicalFileName

                        END

            END;

 

      -- build new logical name

      SET @NewLogicalName =

            CASE

                  WHEN UPPER(@ChangeLogicalNames) IN ('Y', '1') AND

                        LEFT(@LogicalName, LEN(@OldDatabaseName)) = @OldDatabaseName

                        THEN

                              @NewDatabaseName + RIGHT(@LogicalName, LEN(@LogicalName) - LEN(@OldDatabaseName))

                        ELSE

                              @LogicalName

            END;

           

      -- generate ALTER DATABASE...MODIFY FILE statement if logical file name is different

      IF @NewLogicalName <> @LogicalName

            SET @ChangeLogicalNamesSql = @ChangeLogicalNamesSql + N'ALTER DATABASE ' + QUOTENAME(@NewDatabaseName) + N'

                  MODIFY FILE (NAME=''' + @LogicalName + N''', NEWNAME=''' + @NewLogicalName + N''');

'

 

      -- add MOVE option as needed if folder and/or file names are changed

      IF @PhysicalFolderName + @PhysicalFileName <> @NewPhysicalName

      BEGIN

            SET @RestoreStatement = @RestoreStatement +

                  N',

                  MOVE ''' +

                  @LogicalName +

                  N''' TO ''' +

                  @NewPhysicalName +

                  N'''';

      END;

 

END;

CLOSE FileList;

DEALLOCATE FileList;

 

IF @AdditionalOptions IS NOT NULL

      SET @RestoreStatement =

            @RestoreStatement + N', ' + @AdditionalOptions

           

IF @DatabaseOwner IS NOT NULL

      SET @AlterAuthorizationSql = N'ALTER AUTHORIZATION ON DATABASE::' +

            QUOTENAME(@NewDatabaseName) + N' TO ' + QUOTENAME(@DatabaseOwner)

ELSE

      SET @AlterAuthorizationSql = N''

--execute RESTORE statement

IF UPPER(@ExecuteRestoreImmediately) IN ('Y', '1')

BEGIN

 

      RAISERROR(N'Executing:

%s', 0, 1, @RestoreStatement) WITH NOWAIT

      EXEC (@RestoreStatement);

      SET @Error = @@ERROR;

      IF @Error <> 0 GOTO Done;

 

      --execute ALTER DATABASE statement(s)

      IF @ChangeLogicalNamesSql <> ''

      BEGIN

            RAISERROR(N'Executing:

%s', 0, 1, @ChangeLogicalNamesSql) WITH NOWAIT

            EXEC (@ChangeLogicalNamesSql);

            SET @Error = @@ERROR;

            IF @Error <> 0 GOTO Done;

      END

     

      IF @AlterAuthorizationSql <> ''

      BEGIN

            RAISERROR(N'Executing:

%s', 0, 1, @AlterAuthorizationSql) WITH NOWAIT

            EXEC (@AlterAuthorizationSql);

            SET @Error = @@ERROR;

            IF @Error <> 0 GOTO Done;

      END

     

END

ELSE

BEGIN

      RAISERROR(N'%s', 0, 1, @RestoreStatement) WITH NOWAIT

      IF @ChangeLogicalNamesSql <> ''

      BEGIN

            RAISERROR(N'%s', 0, 1, @ChangeLogicalNamesSql) WITH NOWAIT;

      END

      IF @AlterAuthorizationSql <> ''

      BEGIN

            RAISERROR(N'%s', 0, 1, @AlterAuthorizationSql) WITH NOWAIT;

      END

END;

 

Done:

 

RETURN @Error;

GO

 

 

Database Mail Configuration

I recently had to set up Database Mail on dozens of SQL Server instances. Rather than perform this tedious task using the SSMS GUI, I developed a script that saved me a lot of time, which I'm sharing here.

My needs were simple so I only needed a single SMTP account and profile. I decided to make the profile the default public one so that all msdb users would use it unless a different sp_send_dbmail @profile_name value was explicitly specified. You might want to extend this script if you need other accounts/profiles, such as separate ones for administrative alerts or user reports.

Setup Script

Below is the template script I used for my task. The sysmail_add_account_sp @username and @password parameters might be required depending on your SMTP server authentication, and you will of course need to customize the mail server name and addresses for your environment.

-- Enable Database Mail for this instance

EXECUTE sp_configure 'show advanced', 1;

RECONFIGURE;

EXECUTE sp_configure 'Database Mail XPs',1;

RECONFIGURE;

GO

 

-- Create a Database Mail account

EXECUTE msdb.dbo.sysmail_add_account_sp

    @account_name = 'Primary Account',

    @description = 'Account used by all mail profiles.',

    @email_address = 'myaddress@mydomain.com',

    @replyto_address = 'myaddress@mydomain.com',

    @display_name = 'Database Mail',

    @mailserver_name = 'mail.mydomain.com';

 

-- Create a Database Mail profile

EXECUTE msdb.dbo.sysmail_add_profile_sp

    @profile_name = 'Default Public Profile',

    @description = 'Default public profile for all users';

 

-- Add the account to the profile

EXECUTE msdb.dbo.sysmail_add_profileaccount_sp

    @profile_name = 'Default Public Profile',

    @account_name = 'Primary Account',

    @sequence_number = 1;

 

-- Grant access to the profile to all msdb database users

EXECUTE msdb.dbo.sysmail_add_principalprofile_sp

    @profile_name = 'Default Public Profile',

    @principal_name = 'public',

    @is_default = 1;

GO

 

--send a test email

EXECUTE msdb.dbo.sp_send_dbmail

    @subject = 'Test Database Mail Message',

    @recipients = 'testaddress@mydomain.com',

    @query = 'SELECT @@SERVERNAME';

GO
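To verify delivery, you can check the Database Mail views in msdb after sending the test message. This is a minimal check using the standard sysmail views; the ORDER BY clauses simply surface the most recent activity first.

SELECT mailitem_id, recipients, subject, sent_status, sent_date
FROM msdb.dbo.sysmail_allitems
ORDER BY mailitem_id DESC;

SELECT log_id, event_type, log_date, description
FROM msdb.dbo.sysmail_event_log
ORDER BY log_date DESC;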

Not Before Service Pack 1

In case you haven't yet heard, Microsoft SQL Server 2008 service pack 1 was released on April 7.  This milestone is especially significant for those of you who could not previously deploy the latest SQL Server release because your organization has a "not before the first service pack" policy.  I want to go on record as one who believes that such a policy is flawed and has needlessly delayed many organizations from using the new SQL Server 2008 features.

There is nothing magical about the first service pack compared to the initial RTM release with regard to production readiness.  SQL Server releases nowadays are scheduled based on quality rather than just hitting a date.  Buggy features will be dropped from a release rather than included and in need of a service pack.  I'm not saying that every SQL Server release is flawless, but the number of serious bugs (e.g. corruption or wrong results) is small, thanks to internal testing by Microsoft as well as those in the community who kick the tires with the pre-release CTP bits.

It's understandable that those who are risk-averse might wait until after the first service pack in the belief that other adopters will have smoothed out the bumps in the road a bit.  I can see how postponing installation in this way might mitigate some of the risk, but SP1 is a completely arbitrary milestone that is a hold-over from before SQL 7 was released over a decade ago.  I think a better approach is to adopt new releases based on quality as determined in one's own environment.  Whether the target is a new SQL Server installation or an upgrade of an existing instance, one still needs to perform testing before installing any new version, service pack or patch in production.  It is those test results that should determine production readiness, not the results of SELECT SERVERPROPERTY('ProductLevel').
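For reference, the version check mentioned above is just a SERVERPROPERTY query; it reports the build, service pack level and edition of the instance.

SELECT
    SERVERPROPERTY('ProductVersion') AS ProductVersion,
    SERVERPROPERTY('ProductLevel') AS ProductLevel,
    SERVERPROPERTY('Edition') AS Edition;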

QUOTED_IDENTIFIER and ANSI_NULLS ON

I suggest that one always turn on both the QUOTED_IDENTIFIER and ANSI_NULLS session settings.  Not only do these settings provide ANSI-standard behavior, they must also be turned on in order to use features like indexed views, indexes on computed columns and query notifications.  It is tricky to ensure the settings are as desired, though, because the default session settings differ depending on the tools you use.
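A quick way to see what your current session is actually running with is SESSIONPROPERTY (a value of 1 means the setting is on):

SELECT
    SESSIONPROPERTY('QUOTED_IDENTIFIER') AS QuotedIdentifierSetting,
    SESSIONPROPERTY('ANSI_NULLS') AS AnsiNullsSetting;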

DDL Script Considerations

It is especially important to ensure the QUOTED_IDENTIFIER and ANSI_NULLS session settings are correct when running DDL scripts because both settings are "sticky".  The settings in effect when a stored procedure, view, function or trigger is created are also used at execution time; the create-time settings override the run-time session settings.
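Because the settings are saved with the module, you can audit a database for objects that were created with either setting off. A minimal check against the sys.sql_modules catalog view:

SELECT
    OBJECT_SCHEMA_NAME(object_id) AS SchemaName,
    OBJECT_NAME(object_id) AS ObjectName,
    uses_quoted_identifier,
    uses_ansi_nulls
FROM sys.sql_modules
WHERE uses_quoted_identifier = 0 OR uses_ansi_nulls = 0;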

SQLCMD and OSQL Turn Settings Off

QUOTED_IDENTIFIER and ANSI_NULLS are on by default when you connect using modern client APIs like ODBC, SQLOLEDB, SQL Native Client and SqlClient.  The SQL Server Management Studio and Query Analyzer tools keep those settings on unless you override the connection behavior under the tool connection options or run SET QUOTED_IDENTIFIER OFF or SET ANSI_NULLS OFF commands in the query window.

The SQLCMD and OSQL command prompt utilities are different, though.  These tools explicitly turn off QUOTED_IDENTIFIER after connecting, presumably to provide backwards compatibility.  One must either specify the “-I” (upper-case “eye”) command-line argument to turn on QUOTED_IDENTIFIER or include a SET QUOTED_IDENTIFIER ON command in all the SQL scripts run from those utilities.  I personally like to avoid SET commands in my DDL scripts, so I make it a habit to specify the -I command-line option.
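For example, a DDL script run from SQLCMD with the setting turned on might look like this (the server, database and script names are placeholders):

sqlcmd -S MyServer -d MyDatabase -E -I -i MyDdlScript.sql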

“UPSERT” Race Condition With MERGE

I mentioned in Conditional INSERT/UPDATE Race Condition that most “UPSERT” code is defective and can lead to constraint violations and data integrity issues in a multi-user environment.  In this post, I’ll show how to prevent duplicate key errors and data problems with the MERGE statement too.  You might want to peruse Conditional INSERT/UPDATE Race Condition before reading this for background on these concurrency concerns.

Background on MERGE

Microsoft introduced the ANSI-standard MERGE statement in SQL Server 2008.  MERGE is very powerful in that it can perform multiple actions in a single statement that previously required separate INSERT/UPDATE/DELETE statements.  MERGE is also a good alternative to the proprietary UPDATE…FROM syntax allowed in the T-SQL dialect.

MERGE can (and in my opinion should) be used to address the requirement to either INSERT or UPDATE depending on whether the source data already exists.  One need only include the MERGE statement clauses WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT in order to take the proper action, all within a single statement.

“UPSERT” MERGE Concurrency Test

Even though MERGE provides the means to perform multiple actions within a single statement, developers still need to consider concurrency with MERGE to prevent errors and data issues.  Let me illustrate using the table and stored procedure that I originally posted in Conditional INSERT/UPDATE Race Condition:

CREATE TABLE dbo.Foo

(

      ID int NOT NULL

            CONSTRAINT PK_Foo PRIMARY KEY,

      Bar int NOT NULL

);

GO

 

CREATE PROCEDURE dbo.Merge_Foo

      @ID int,

      @Bar int

AS

 

SET NOCOUNT, XACT_ABORT ON;

 

MERGE dbo.Foo AS f

USING (SELECT @ID AS ID, @Bar AS Bar) AS new_foo

ON f.ID = new_foo.ID

WHEN MATCHED THEN

    UPDATE SET f.Bar = new_foo.Bar

WHEN NOT MATCHED THEN

    INSERT (ID, Bar)

        VALUES (new_foo.ID, new_foo.Bar);

       

RETURN @@ERROR;

GO

I ran the script below from 2 different SSMS windows after changing the time to the near future so that both executed at the same time.  My test box had a single quad-core processor with SQL Server 2008 Developer Edition installed, which I expected to have enough multi-processing power to create the error.

WAITFOR TIME '08:00:00'

 

EXEC dbo.Merge_Foo

      @ID = 1,

      @Bar = 1

I got a primary key violation error, showing that MERGE is vulnerable to concurrency problems just like the multi-statement conditional INSERT/UPDATE technique. However, I couldn’t reproduce the error with MERGE nearly as consistently as I could with the conditional INSERT/UPDATE in Conditional INSERT/UPDATE Race Condition.  This could be due to a number of reasons (e.g. faster processor, different SQL Server version, MERGE locking behavior) but I wanted to make sure I could reproduce the error reliably, so I created a more robust test that exercises MERGE in a loop:

CREATE TABLE dbo.Foo2

(

      ID int NOT NULL

            CONSTRAINT PK_Foo2 PRIMARY KEY,

      InsertSpid int NOT NULL,

      InsertTime datetime2 NOT NULL,

      UpdateSpid int NULL,

      UpdateTime datetime2 NULL

);

 

CREATE PROCEDURE dbo.Merge_Foo2

      @ID int

AS

 

SET NOCOUNT, XACT_ABORT ON;

 

MERGE dbo.Foo2 AS f

USING (SELECT @ID AS ID) AS new_foo

      ON f.ID = new_foo.ID

WHEN MATCHED THEN

    UPDATE

            SET f.UpdateSpid = @@SPID,

            UpdateTime = SYSDATETIME()

WHEN NOT MATCHED THEN

    INSERT

      (

            ID,

            InsertSpid,

            InsertTime

      )

    VALUES

      (

            new_foo.ID,

            @@SPID,

            SYSDATETIME()

      );

       

RETURN @@ERROR;

I ran the script below from 4 different SSMS windows after changing the time to the near future so that all executed at the same time.   

DECLARE

    @NextTime datetime,

    @ID int,

    @MillisecondDelay int;

SELECT

    @NextTime = '08:10:00',

    @ID = 1,

    @MillisecondDelay = 100;

--execute 10 times per second for 1 minute

WHILE @ID <= 600

BEGIN

    --pause and sync with other sessions

    WAITFOR TIME @NextTime;

    EXEC dbo.Merge_Foo2

        @ID = @ID;

    SELECT

        @ID = @ID + 1,

        --assume no more than 100ms per execution

        @NextTime = DATEADD(MILLISECOND, @MillisecondDelay, @NextTime);

END;

I was able to reproduce the primary key violation every time with this test script.

Addressing the MERGE Race Condition

The underlying issue with any conditional insert technique is that data must be read before the determination can be made whether to INSERT or UPDATE.  To prevent concurrent sessions from inserting data with the same key, an incompatible lock must be acquired to ensure only one session can read the key and that lock must be held until the transaction completes.

I showed how one might address the problem using both UPDLOCK and HOLDLOCK locking hints in Conditional INSERT/UPDATE Race Condition.  MERGE is slightly different, though.  I repeated the test with only the HOLDLOCK hint added:

ALTER PROCEDURE dbo.Merge_Foo2

      @ID int

AS

 

SET NOCOUNT, XACT_ABORT ON;

 

MERGE dbo.Foo2 WITH (HOLDLOCK) AS f

USING (SELECT @ID AS ID) AS new_foo

      ON f.ID = new_foo.ID

WHEN MATCHED THEN

    UPDATE

            SET f.UpdateSpid = @@SPID,

            UpdateTime = SYSDATETIME()

WHEN NOT MATCHED THEN

    INSERT

      (

            ID,

            InsertSpid,

            InsertTime

      )

    VALUES

      (

            new_foo.ID,

            @@SPID,

            SYSDATETIME()

      );

       

RETURN @@ERROR;

This test showed that simply adding the HOLDLOCK hint prevented the primary key violation error.  Unlike the conditional INSERT/UPDATE in Conditional INSERT/UPDATE Race Condition, MERGE acquired a key update lock by default so UPDLOCK was not needed.  Also, in contrast to the multi-statement conditional INSERT/UPDATE technique, no explicit transaction is required because MERGE is an atomic DML statement.  The HOLDLOCK hint was still needed, though, because MERGE otherwise releases the update key lock before the insert.  I gleaned this by examining the locks from a Profiler trace of the MERGE without the HOLDLOCK:

EventClass    | TextData                    | Mode          | ObjectID   | Type
SP:Starting   | EXEC dbo.Merge_Foo2 @ID = 1 |               | 1314103722 |
Lock:Acquired |                             | 8 - IX        | 1330103779 | 5 - OBJECT
Lock:Acquired | 1:173                       | 7 - IU        | 0          | 6 - PAGE
Lock:Acquired | (10086470766)               | 4 - U         | 0          | 7 - KEY
Lock:Released | (10086470766)               | 4 - U         | 0          | 7 - KEY
Lock:Released | 1:173                       | 7 - IU        | 0          | 6 - PAGE
Lock:Acquired | 1:173                       | 8 - IX        | 0          | 6 - PAGE
Lock:Acquired | (10086470766)               | 15 - RangeI-N | 0          | 7 - KEY
Lock:Acquired | (10086470766)               | 5 - X         | 0          | 7 - KEY
Lock:Released | (10086470766)               | 5 - X         | 0          | 7 - KEY
Lock:Released | 1:173                       | 8 - IX        | 0          | 6 - PAGE
Lock:Released |                             | 8 - IX        | 1330103779 | 5 - OBJECT
SP:Completed  | EXEC dbo.Merge_Foo2 @ID = 1 |               | 1314103722 |

If another concurrent MERGE of the same key occurs after the update lock is released and before the exclusive key lock is acquired, a duplicate key error will result.

The trace below of the MERGE with the HOLDLOCK hint shows that the key locks aren’t released until the insert (and statement) completes, thus avoiding the concurrency problem with MERGE.

EventClass    | TextData                    | Mode          | ObjectID   | Type
SP:Starting   | EXEC dbo.Merge_Foo2 @ID = 1 |               | 1314103722 |
Lock:Acquired |                             | 8 - IX        | 1330103779 | 5 - OBJECT
Lock:Acquired | 1:173                       | 7 - IU        | 0          | 6 - PAGE
Lock:Acquired | (10086470766)               | 4 - U         | 0          | 7 - KEY
Lock:Acquired | 1:173                       | 8 - IX        | 0          | 6 - PAGE
Lock:Acquired | (10086470766)               | 15 - RangeI-N | 0          | 7 - KEY
Lock:Acquired | (10086470766)               | 5 - X         | 0          | 7 - KEY
Lock:Released | (10086470766)               | 5 - X         | 0          | 7 - KEY
Lock:Released | 1:173                       | 8 - IX        | 0          | 6 - PAGE
Lock:Released |                             | 8 - IX        | 1330103779 | 5 - OBJECT
SP:Completed  | EXEC dbo.Merge_Foo2 @ID = 1 |               | 1314103722 |

 

Don’t Use sp_attach_db

I’ve used sp_detach_db and sp_attach_db to relocate database files for many years.  I know that sp_attach_db was deprecated in SQL 2005 but, like most DBAs, I’ve continued to use sp_attach_db mostly out of habit.  I want to share with you why I’ve decided to change my ways.

Planned File Relocation

Let’s say you want to move the log file to a separate drive.  The following script shows how to accomplish this in SQL Server 2000 using sp_attach_db.  The only sp_attach_db parameters required are the database name, the primary data file path and the new path of the log file that was moved from the original location.

EXEC sp_detach_db

      @dbname = N'MyDatabase';

--move log file to E drive manually and attach from new location

EXEC sp_attach_db

      @dbname = N'MyDatabase',

      @filename1 = N'D:\DataFiles\MyDatabase_Data.mdf',

      @filename2 = N'E:\LogFiles\MyDatabase_Log.ldf';

 

The deprecated sp_attach_db procedure still works in SQL Server 2005 and SQL Server 2008 but is not recommended.  Instead, the proper method to relocate files in these later versions is with ALTER DATABASE…MODIFY FILE.  Simply execute an ALTER DATABASE…MODIFY FILE for each moved file and toggle the ONLINE/OFFLINE database state.  The script example below shows how the log file would be moved to a different drive with this method.  This method is described in detail in the Books Online.

 

ALTER DATABASE MyDatabase SET OFFLINE;

--move log file to E drive manually and attach from new location

ALTER DATABASE MyDatabase

      MODIFY FILE (

            NAME='MyDatabase_Log',

            FILENAME='E:\LogFiles\MyDatabase_Log.ldf');

ALTER DATABASE MyDatabase SET ONLINE;

 

Unfortunately, the Books Online doesn’t provide much info as to why ALTER DATABASE…MODIFY FILE and OFFLINE/ONLINE is preferred over detach/attach for planned file relocations.  One explanation is illustrated by an issue I ran into recently that motivated this post.  After using the detach/attach method, we ended up with Service Broker disabled.  This is documented behavior that we simply overlooked and didn’t catch until subsequent application problems were reported.  Since exclusive database access was needed to re-enable Service Broker, we had to close all user database connections before altering the database ENABLE_BROKER setting, and this was a real pain.
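For the record, one way to force the change through when connections can't easily be closed is the ROLLBACK IMMEDIATE termination option, which rolls back other sessions' open transactions to obtain the needed exclusive access (use with care):

ALTER DATABASE MyDatabase SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;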

 

This problem wouldn’t have happened had we used the recommended method and toggled the OFFLINE/ONLINE database state because the database settings would have remained unchanged.  I wouldn’t be surprised if there were other gotchas with the detach/attach method.  The bottom line is that there is no reason not to use the ALTER DATABASE…MODIFY FILE and OFFLINE/ONLINE method to move files.

Attaching a Database to Another Server or Instance

Note that the deprecated sp_attach_db stored procedure is basically just a wrapper for CREATE DATABASE…FOR ATTACH.  You can use CREATE DATABASE…FOR ATTACH much like you would sp_attach_db: specify the database name, primary file path (mdf file path) along with file paths that differ from the original locations.  For example:

EXEC sp_detach_db

      @dbname = N'MyDatabase';

--move database files manually to new server

CREATE DATABASE MyDatabase

      ON(NAME='MyDatabase_Data',

            FILENAME='C:\DataFiles\MyDatabase_Data.mdf')

      LOG ON(NAME='MyDatabase_Log',

            FILENAME='C:\LogFiles\MyDatabase_Log.ldf')

      FOR ATTACH

      WITH ENABLE_BROKER;

 

The ENABLE_BROKER option is appropriate if the purpose of the attach is to completely move a Service Broker-enabled database to another instance or in a DR scenario.  When attaching to create a database replica (e.g. a copy for testing), the NEW_BROKER option is appropriate instead.
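Either way, you can confirm the broker state after the attach by checking sys.databases:

SELECT name, is_broker_enabled, service_broker_guid
FROM sys.databases
WHERE name = N'MyDatabase';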

Summary

I suggest ALTER DATABASE…MODIFY FILE and OFFLINE/ONLINE for planned file relocation and sp_detach_db/CREATE DATABASE…FOR ATTACH for other scenarios.  In any case, sp_attach_db should be avoided going forward.

SQL Server Partition Details Custom Report

I developed a custom report for SQL Server Management Studio that wraps the partition details and row counts query I previously posted.  Visit the Codeplex project site for details.  You can download just the released RDL file or, if you want to customize the report to your liking, download the full project source code.  Alternatively you can create a new report project of your own and add the RDL file as an existing project item.

I welcome any feedback you might have, either here or via the Codeplex project discussion page.  It’s been a while since I’ve done any Reporting Services development so there is certainly room for improvement.  Additional project team members are also welcome.

Partition Details and Row Counts

You will likely find the following query useful if you work with partitioned objects.  I developed this when I first started using table partitioning in order to verify proper partition boundaries, filegroups and row counts.  Not only does this provide much more information than can be obtained by querying the underlying table with the partition function to get partition numbers, it runs much faster because only catalog views are used.

As-is, the query includes both partitioned and non-partitioned user objects in the context database, but you can customize the WHERE clauses as desired.  I think this query would make a perfect source for an SSMS custom report so that it can be easily invoked from the SSMS Object Explorer.  That’s on my to-do list.

Query for Partition Details Using Catalog Views

--partitioned table and index details

SELECT

      OBJECT_NAME(p.object_id) AS ObjectName,

      i.name                   AS IndexName,

      p.index_id               AS IndexID,

      ds.name                  AS PartitionScheme,   

      p.partition_number       AS PartitionNumber,

      fg.name                  AS FileGroupName,

      prv_left.value           AS LowerBoundaryValue,

      prv_right.value          AS UpperBoundaryValue,

      CASE pf.boundary_value_on_right

            WHEN 1 THEN 'RIGHT'

            ELSE 'LEFT' END    AS Boundary,

      p.rows AS Rows

FROM sys.partitions                  AS p

JOIN sys.indexes                     AS i

      ON i.object_id = p.object_id

      AND i.index_id = p.index_id

JOIN sys.data_spaces                 AS ds

      ON ds.data_space_id = i.data_space_id

JOIN sys.partition_schemes           AS ps

      ON ps.data_space_id = ds.data_space_id

JOIN sys.partition_functions         AS pf

      ON pf.function_id = ps.function_id

JOIN sys.destination_data_spaces     AS dds2

      ON dds2.partition_scheme_id = ps.data_space_id 

      AND dds2.destination_id = p.partition_number

JOIN sys.filegroups                  AS fg

      ON fg.data_space_id = dds2.data_space_id

LEFT JOIN sys.partition_range_values AS prv_left

      ON ps.function_id = prv_left.function_id

      AND prv_left.boundary_id = p.partition_number - 1

LEFT JOIN sys.partition_range_values AS prv_right

      ON ps.function_id = prv_right.function_id

      AND prv_right.boundary_id = p.partition_number 

WHERE

      OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0

UNION ALL

--non-partitioned table/indexes

SELECT

      OBJECT_NAME(p.object_id)    AS ObjectName,

      i.name                      AS IndexName,

      p.index_id                  AS IndexID,

      NULL                        AS PartitionScheme,

      p.partition_number          AS PartitionNumber,

      fg.name                     AS FileGroupName,  

      NULL                        AS LowerBoundaryValue,

      NULL                        AS UpperBoundaryValue,

      NULL                        AS Boundary, 

      p.rows                      AS Rows

FROM sys.partitions     AS p

JOIN sys.indexes        AS i

      ON i.object_id = p.object_id

      AND i.index_id = p.index_id

JOIN sys.data_spaces    AS ds

      ON ds.data_space_id = i.data_space_id

JOIN sys.filegroups           AS fg

      ON fg.data_space_id = i.data_space_id

WHERE

      OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0

ORDER BY

      ObjectName,

      IndexID,

      PartitionNumber;

 

Here is some sample output.

ObjectName                       | IndexName                                  | IndexID | PartitionScheme           | PartitionNumber | FileGroupName          | LowerBoundaryValue | UpperBoundaryValue | Boundary | Rows
SalesTransactions_NonPartitioned | cdx_SalesTransactions_SalesTransactionTime | 1       | NULL                      | 1               | PartitioningDemo_Data  | NULL               | NULL               | NULL     | 6576000
SalesTransactions_NonPartitioned | idx_SalesTransactions_StoreID_ProductID    | 2       | NULL                      | 1               | PartitioningDemo_Index | NULL               | NULL               | NULL     | 6576000
SalesTransactions_NonPartitioned | idx_SalesTransactions_CustomerID           | 3       | NULL                      | 1               | PartitioningDemo_Index | NULL               | NULL               | NULL     | 6576000
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 1               | PartitioningDemo_Data  | NULL               | 2006-01-01 0:00:00 | RIGHT    | 0
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 2               | PartitioningDemo_Data  | 2006-01-01 0:00:00 | 2006-02-01 0:00:00 | RIGHT    | 0
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 3               | PartitioningDemo_Data  | 2006-02-01 0:00:00 | 2006-03-01 0:00:00 | RIGHT    | 0
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 4               | PartitioningDemo_Data  | 2006-03-01 0:00:00 | 2006-04-01 0:00:00 | RIGHT    | 0
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 5               | PartitioningDemo_Data  | 2006-04-01 0:00:00 | 2006-05-01 0:00:00 | RIGHT    | 180000
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 6               | PartitioningDemo_Data  | 2006-05-01 0:00:00 | 2006-06-01 0:00:00 | RIGHT    | 186000
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 7               | PartitioningDemo_Data  | 2006-06-01 0:00:00 | 2006-07-01 0:00:00 | RIGHT    | 180000
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 8               | PartitioningDemo_Data  | 2006-07-01 0:00:00 | 2006-08-01 0:00:00 | RIGHT    | 186000
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 9               | PartitioningDemo_Data  | 2006-08-01 0:00:00 | 2006-09-01 0:00:00 | RIGHT    | 186000
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 10              | PartitioningDemo_Data  | 2006-09-01 0:00:00 | 2006-10-01 0:00:00 | RIGHT    | 180000
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 11              | PartitioningDemo_Data  | 2006-10-01 0:00:00 | 2006-11-01 0:00:00 | RIGHT    | 186000
SalesTransactions_Partitioned    | cdx_SalesTransactions_SalesTransactionTime | 1       | PS_SalesTransactions_Data | 12              | PartitioningDemo_Data  | 2006-11-01 0:00:00 | 2006-12-01 0:00:00 | RIGHT    | 180000

 

Automating RANGE RIGHT Sliding Window Maintenance

I posted example scripts to automate RANGE LEFT sliding window maintenance in my last post.  As promised, I am sharing a RANGE RIGHT version in this post.

I personally prefer a RANGE RIGHT partition function when partitioning on a data type that includes time.  RANGE RIGHT allows specification of an exact date boundary instead of the maximum date/time value needed with RANGE LEFT to keep all data for a given date in the same partition.  Another nicety with RANGE RIGHT is that the same boundary values can be used in a partition function of any date/time data type.  In contrast, the time component of RANGE LEFT boundary values must be customized for the specific data type as I described in Sliding Window Table Partitioning.
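To illustrate the difference, here is a hypothetical pair of monthly partition functions over the same data; the function names are made up for this example.  The RANGE RIGHT version uses clean date boundaries while the RANGE LEFT version must specify the maximum datetime value of the last day of each month:

--RANGE RIGHT: boundary is the first instant of the new period
CREATE PARTITION FUNCTION PF_MonthlyRight(datetime)
AS RANGE RIGHT FOR VALUES('2006-01-01', '2006-02-01', '2006-03-01');

--RANGE LEFT: boundary must be the last possible datetime value of the old period
CREATE PARTITION FUNCTION PF_MonthlyLeft(datetime)
AS RANGE LEFT FOR VALUES('2006-01-31T23:59:59.997', '2006-02-28T23:59:59.997', '2006-03-31T23:59:59.997');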

The downside with RANGE RIGHT is that maintaining the sliding window isn’t quite as intuitive as with RANGE LEFT.  Instead of switching out and merging the first partition during purge/archive, one needs to switch out the second partition and then merge its lower boundary.  Switching the data out first avoids costly data movement during the merge because both the first and second partitions are empty at that point, so no rows need to be moved.  The first partition is normally empty at all times.
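In terms of the demo objects created below, the purge boils down to something like this sketch (the boundary value and partition number are illustrative; the stored procedure determines them at run time):

--switch the expired data (second partition) into the staging table, then discard it
ALTER TABLE dbo.MyPartitionedTable
      SWITCH PARTITION 2 TO dbo.MyPartitionedTable_Staging PARTITION 2;
TRUNCATE TABLE dbo.MyPartitionedTable_Staging;

--partitions 1 and 2 are now both empty, so removing the expired boundary moves no rows
ALTER PARTITION FUNCTION PF_MyPartitionFunction()
      MERGE RANGE('2006-01-01');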

The stored procedure below shows how you can automate a RANGE RIGHT daily sliding window.  The main differences between this version and the RANGE LEFT version I posted in Automating Sliding Window Maintenance are:

1)      Removed the boundary calculation of the maximum time value (DATEADD(millisecond, -3, @RunDate)).

2)      Added a conditional create of the upper boundary of the second partition (the oldest retained period in the third partition).

3)      Added $PARTITION for the SWITCH instead of hard-coding the partition number.

Example of RANGE RIGHT Sliding Window Automation

Below are the demo objects used by the RANGE RIGHT sliding window procedure.

--no boundaries initially - proc will create as needed

CREATE PARTITION FUNCTION PF_MyPartitionFunction(datetime)

AS RANGE RIGHT FOR VALUES();

GO

 

CREATE PARTITION SCHEME PS_MyPartitionScheme

AS PARTITION PF_MyPartitionFunction ALL TO ([PRIMARY]);

GO

 

CREATE TABLE dbo.MyPartitionedTable

(

      PartitionColumnDateTime datetime

) ON PS_MyPartitionScheme(PartitionColumnDateTime);

GO

 

--note staging table uses same partition scheme as primary table

CREATE TABLE dbo.MyPartitionedTable_Staging

(

      PartitionColumnDateTime datetime

) ON PS_MyPartitionScheme(PartitionColumnDateTime);

GO

 

The sliding window proc:

CREATE PROC dbo.SlideRangeRightWindow_datetime

        @RetentionDays int,

        @RunDate datetime = NULL

/*

      This proc maintains a RANGE RIGHT daily sliding window

      based on the specified @RetentionDays.  It is intended to

      be scheduled daily shortly after midnight. In addition to

      purging old data, the partition function is adjusted to

      account for scheduling issues or changes in @RetentionDays.

 

      Partitions are split and merged so that the first partition

      boundary is the oldest retained data date and the last

      boundary is the next day.  Other partitions contain current

      and historical data for the specified number of @RetentionDays. 

 

      After successful execution, (at least) the following

      partitions will exist:

      - partition 1 = data older than retained date (empty)

      - other partitions = historical data (@RunDate - 1 and earlier)

      - second from last partition = current data (@RunDate)

      - last partition = future data (@RunDate + 1) (empty)     

 

*/

 

AS

 

SET NOCOUNT, XACT_ABORT ON;

 

DECLARE

        @Error int,

        @RowCount bigint,

        @ErrorLine int,

        @Message varchar(255),

        @OldestRetainedDate datetime,

        @PartitionBoundaryDate datetime;

 

SET @Error = 0;

 

BEGIN TRY

 

      IF @RunDate IS NULL

      BEGIN

            --use current date (midnight) if no date specified

            SET @RunDate = DATEADD(day, 0, DATEDIFF(day, '', GETDATE()));

      END

      ELSE

      BEGIN

            --set time to midnight of specified date

            SET @RunDate = DATEADD(day, 0, DATEDIFF(day, '', @RunDate));

      END

     

      --calculate oldest retention date based on @RetentionDays and @RunDate

      SET @OldestRetainedDate = DATEADD(day, @RetentionDays * -1, @RunDate);

 

      SET @Message =

            'Run date = ' +

            + CONVERT(varchar(23), @RunDate, 121)

            + ', Retention days = '

            + CAST(@RetentionDays AS varchar(10))

            + ', Oldest retained data date = '

            + CONVERT(varchar(23), @OldestRetainedDate, 121);

 

      RAISERROR (@Message, 0, 1) WITH NOWAIT;

 

      BEGIN TRAN;

 

      --acquire exclusive table lock to prevent deadlocking

      --with concurrent activity.

      SELECT TOP 1 @error = 0

      FROM dbo.MyPartitionedTable WITH (TABLOCKX, HOLDLOCK);

 

      --make sure we have a boundary for oldest retained period

      IF NOT EXISTS(

            SELECT prv.value

            FROM sys.partition_functions AS pf

            JOIN sys.partition_range_values AS prv ON

                  prv.function_id = pf.function_id

            WHERE

                  pf.name = 'PF_MyPartitionFunction'

                  AND CAST(prv.value AS datetime) = @OldestRetainedDate

            )

      BEGIN

            ALTER PARTITION SCHEME PS_MyPartitionScheme

                    NEXT USED [PRIMARY];

            ALTER PARTITION FUNCTION PF_MyPartitionFunction()

                    SPLIT RANGE(@OldestRetainedDate);

            SET @Message =

                    'Created boundary for oldest retained data ('

                    + CONVERT(varchar(30), @OldestRetainedDate, 121) + ')';

 

            RAISERROR(@Message, 0, 1) WITH NOWAIT;

      END

      ELSE

      BEGIN

            SET @Message =

                    'Oldest retained data boundary already exists ('

                    + CONVERT(varchar(30), @OldestRetainedDate, 121) + ')';

 

            RAISERROR(@Message, 0, 1) WITH NOWAIT;

      END

       

      --get earliest expired boundary

      SET @PartitionBoundaryDate = NULL;

      SELECT

            @PartitionBoundaryDate =

                    MIN(CAST(prv.value AS datetime))