RIP OLE DB

I was very surprised when Microsoft announced deprecation of OLE DB provider for SQL Server data access last week on the Data Access Blog and MSDN Forums Announcement. The next release of SQL Server, code-named “Denali”, will be the last to ship a new SQL Server Native Client OLE DB provider. The SQL Server Native Client OLE DB driver will continue to be supported for 7 years after the Denali release so we have plenty of time to plan accordingly.

The other Microsoft-supplied OLE DB driver for SQL Server, SQLOLEDB, has been deprecated for many years now. The deprecated SQLOLEDB driver (and deprecated SQLSRV32.DLL ODBC driver) is part of the older MDAC package and is currently included in Windows operating systems as part of Windows Data Access Components for backwards compatibility. Windows 7 is the last Windows version that will include a SQL Server OLE DB and ODBC driver out of the box. Microsoft recommends that we use the SQL Server Native Client ODBC driver as the SQL Server data access technology of choice from native code going forward.

What This Means to You

Avoid using OLE DB for new SQL Server application development. Update your technology roadmap to move towards migrating existing SQL Server applications that use the SQLNCLI, SQLNCLI10, SQLNCLI11 or SQLOLEDB OLE DB providers to the SQL Server Native Client ODBC driver.

Note that much is still unknown since current versions of SQL Server rely heavily on OLE DB. Although this is purely speculation on my part, it stands to reason that we will see improved ODBC support across all Microsoft products and SQL Server features that currently rely on OLE DB for relational data access.

New SQL Server Development

Use one of the following SQL Server relational database access technologies for new development:

·         Managed code (e.g. C#, VB.NET, managed C++): Use Sysem.Data SqlClient. SqlClient is part of the .NET framework and is the preferred way to access SQL Server from managed code (C#, VB.NET, managed C++). The only reason I can think why not to use SqlClient from managed code is if an application needs to also support other DBMS products using the same interface without coding an additional abstraction layer. In that case accessing different database products Sysem.Data.Odbc is an alternative.

·         Native code (e.g. unmanaged C++): Use ODBC with the Server Native Client driver. The ODBC call-level interface can be used directly or via the higher-level ADO API. The SQL Server Native Client ODBC driver is included with SQL Server and also available as a separate download. 

Migrating Existing Applications

I sometimes see existing managed applications use ADO (e.g. ADODB.Connection) instead of SqlClient. ADO is a COM-based API primarily intended to be used from native code rather than managed code. Typically, these applications were either converted from VB 6 or the developer used ADO instead of ADO.NET due to unfamiliarity with the ADO.NET object model.  This is a good opportunity to convert such code to use System.Data.SqlClient, which will perform better than OLE DB or ODBC from managed code. 

If you have an ADO application where performance is not a concern or the conversion is not worth the effort, an alternative is to simply change the provider to MSDASQL (OLE DB Provider for ODBC Drivers) and add the SQL Server Native Client ODBC driver specification. This can be done with a simple connection string change and the MSDASQL provider will translate the ADO OLE DB calls to ODBC. For example, to use the SQL Server 2008 SNAC ODBC driver:

Old OLE DB connection string: "Provider=SQLNCLI10.1;Data Source=MyServer;Integrated Security=SSPI"

New ODBC connection string: "Provider=MSDASQL;Driver={SQL Server Native Client 10.0};Server=MyServer;Trusted_Connection=Yes"

 

The same connection string change can be used for any ADO application, including ASP classic, legacy VB 6 or unmanaged C++.

Perhaps the biggest challenge will be native code that uses the OLE DB COM interfaces directly instead of going through higher level APIs like ADO. I’ve seen this most commonly done for performance sensitive applications in C++. The best approach here will be to convert the application to use the ODBC call-level interface directly. This will provide the highest SQL Server data access performance from native code. The difficulty of such a change will depend much on the application object model and design. Ideally, data access libraries are shared and abstracted so that low-level data access code changes only need to be made in one place.

Why SQLOLEDB and SQLNCLI Was Deprecated

If you’ve used SQL Server for a long time like me, you’ve seen a number of APIs come and go (http://blogs.msdn.com/b/data/archive/2006/12/05/data-access-api-of-the-day-part-i.aspx). APIs are largely driven by changes in development and platform technologies that change over time. It is possible for Microsoft to support legacy APIs indefinitely but doing so would waste precious development resources on maintenance instead of adding new features that are important to us. COM-based APIs like OLE DB are complex and it just doesn’t make sense to have many APIs that basically do the same thing. 

So we now have the short list of SQL Server relational data access APIs going forward:

·         SqlClient (managed code)

·         JDBC (Java)

·         ODBC (for native code)

Summary

I’m a big fan of open, cross-platform standards so I’m glad that Microsoft chose ODBC over OLE DB for relational database access. ODBC is an implementation of the SQL call-level interface standard (ISO/IEC 9075-3). In contrast, the COM-based OLE DB SQL Server provider relies on proprietary Microsoft Windows COM technology. The SNAC ODBC driver is a truly native driver and provides the fastest SQL Server database access from native code.

 

Denali CTP3: THROW Statement

Not to mince words, T-SQL error handling has historically sucked. I’m excited that SQL Server “Denali” CTP3 (a.k.a. SQL11) includes a long-awaited THROW statement that I hope to see in the final release. In this post, I’ll dive into how this seemingly minor T-SQL enhancement will make it much easier for T-SQL developers to write robust and bug-free error handling code.
T-SQL Error Handling Ugliness

Unlike compiled application code that halts code execution upon an unhandled exception, a T-SQL might continue code execution afterward. T-SQL developers must include error checking/handling is to ensure code doesn’t continue down the “happy” path oblivious to an error, report the error to the caller, perform any necessary cleanup operations (typically ROLLBACK) and continue/halt execution as desired. The script below shows how one might accomplish this without structured error handling:





--Unstructured error handling example

BEGIN TRAN

SELECT 1/0 AS CauseAnError --report error caller

IF @@ERROR<> 0 GOTO ErrorHandler --detect error

COMMIT

GOTO Done


ErrorHandler:

IF @@TRANCOUNT> 0 ROLLBACK--cleanup after error

RETURN --stop further code execution

Done:

PRINT 'Done'--not executed after error

GO


This script results in the error:





Msg 8134, Level 16, State 1, Line 3

Divide by zero error encountered.


Unstructured error handling like this is especially a pain for multi-statement scripts and stored procedures. One has to include repetitive “IF @@ERROR” check to detect errors after each statement and error-prone unstructured GOTO code. It’s easy to miss error checking/handling bugs in unit testing.

On a positive note, no T-SQL code is necessary to raise the error; SQL Server automatically reports errors to the calling application without any T-SQL code to do so (unless TRY/CATCH is used). This guarantees the calling application is notified of errors during execution.

Two Steps Forward, One Step Back

The introduction of structured error handling (TRY/CATCH) in SQL 2005 is a both a blessing and a curse. The good is that TRY/CATCH avoids the repetitive, error prone and ugly procedural code needed to check @@ERROR after each T-SQL statement and allows one to more easily centralize error handling. The structured error-handling paradigm in T-SQL is more aligned with most application languages.

Consider the equivalent script with TRY/CATCH:





--Structured error handling example

DECLARE

@ErrorNumber int

,@ErrorMessage nvarchar(2048)

,@ErrorSeverity int

,@ErrorState int

,@ErrorLine int;

BEGIN TRY--detect errors

BEGIN TRAN;

SELECT 1/0 AS CauseAnError;

COMMIT;

END TRY

BEGIN CATCH

SELECT

@ErrorNumber =ERROR_NUMBER()

,@ErrorMessage =ERROR_MESSAGE()

,@ErrorSeverity = ERROR_SEVERITY()

,@ErrorState =ERROR_STATE()

,@ErrorLine =ERROR_LINE();

IF @@TRANCOUNT> 0 ROLLBACK; --cleanup after error

RAISERROR('Error %d caught at line %d: %s'--report error to caller

,@ErrorSeverity

,@ErrorState

,@ErrorNumber

,@ErrorLine

,@ErrorMessage);

RETURN;--stop further code execution

END CATCH

PRINT 'Done'; --not executed after error

GO






Msg 50000, Level 16, State 1, Line 21

Error 8134 caught at line 10: Divide by zero error encountered


I really like the way structured error handling catches errors declaratively with centralized error handling. But TRY/CATCH introduces a couple of issues. Foremost is reporting of the error to the caller. A caught error prevents the error message from being returned to the client. When TRY/CATCH is employed, the developer assumes responsibility to notify the application that an error occurred. Failure to do so will result in a silent error undetectable by the calling application, which is seldom desirable. Using TRY/CATCH necessitates that you write a bit of code in the CATCH block to capture, report and/or log error details as well as control code flow after the error.

Another downside of TRY/CATCH before Denali is that you cannot raise the original error because RAISERROR does not allow a system error number to be specified (8134 in this example). Consequently, the divide by zero system error here cannot be raised in the CATCH block; a user-defined error in the 50000+ error number range must be raised instead, obfuscating the original error and line number. So instead of returning error information natively, you must write code to return original error details by some other means, such as in the error message text. This often leads to inconsistencies in the way errors are reported.

THROW to the Rescue

Denali introduces a simple THROW statement. THROW in a CATCH block with no parameters raises the caught error and stops further code execution unless an outer CATCH block exists. This greatly simplifies CATCH block error reporting and control flow code since this THROW behavior is exactly what one typically does after handling a T-SQL error. Furthermore, unlike RAISERROR, THROW retains the original error number, message text, state, severity and line number. This is the biggest T-SQL error handling enhancement since the introduction of TRY/CATCH in SQL Server 2005.

The THROW example below raises the original error and stops further code execution and is less verbose and error-prone than other methods:





--Structured error handling example in Denali CTP3

BEGIN TRY--detect errors

BEGIN TRAN;

SELECT 1/0 AS CauseAnError;

COMMIT;

END TRY

BEGIN CATCH

IF @@TRANCOUNT> 0 ROLLBACK; --cleanup after error

THROW; --report error to caller and stop further code execution

END CATCH

PRINT 'Done'; --not executed after error

GO






Msg 8134, Level 16, State 1, Line 4

Divide by zero error encountered.


There are only a couple of scenarios I can think of not to use THROW in a CATCH block. One is when you need to continue code execution in the same scope after an error. Another is in an outermost catch block when you want to prevent the error from being returned to the client. However, these cases are the exception (no pun intended) rather than the rule.

Summary

THROW is a simple, yet powerful extension to SQL Server error handling. I’ll discuss some other enhancements to the core database engine as outlined in the What’s New section of the SQL Server “Denali” Books Online in future posts as well.

Internal SQL Server Database Version Numbers

A database created by a more recent version of SQL Server cannot be attached or restored to an earlier version. This restriction is simply because an older version cannot know about file format changes that were introduced in the newer release. 
If you attempt to attach a database to an earlier version, you will get SQL Server error 948 with the internal version numbers listed in the error message text. For example, the following error occurs if you try to attach a SQL Server 2008 R2 database to a SQL Server 2008 server:

The database 'MyDatabase' cannot be opened because it is version 665. This server supports version 661 and earlier. A downgrade path is not supported.

Sample text from SQL Server error 948
The cryptic version numbers in the error message refer to the internal database version. These internal version numbers are undocumented but are (at least currently) the same value reported by the DATABASEPROPERTYEX function 'Version' property of the source database. If you are unsure of the source database version, the table below maps the internal version numbers to SQL Server versions so you can determine the minimum version you need for the attach to succeed:

SQL Server Version

Internal Database Version

SQL Server 2008 R2

665

SQL Server 2008

661

SQL Server 2005 SP2+ with vardecimal enabled

612

SQL Server 2005

611

SQL Server 2000

539

SQL Server 7

515

SQL Server versions and internal database versions
Below are the allowable SQL Server upgrade paths for a database attach or restore. The internal database version will be as above after a successful attach or restore.

Target SQL Server Version

Source SQL Server Version

Internal Database Version

SQL Server 2008 R2

SQL Server 2008 R2

665

SQL Server 2008

661

SQL Server 2005 with vardecimal enabled

612

SQL Server 2005

611

SQL Server 2000

539

SQL Server 2008

SQL Server 2008

661

SQL Server 2005 with vardecimal enabled

612

SQL Server 2005

611

SQL Server 2000

539

SQL Server 2005 SP2+

SQL Server 2005 with vardecimal enabled

612

SQL Server 2005

611

SQL Server 2000

539

SQL Server 7

515

SQL Server 2005

SQL Server 2005

611

SQL Server 2000

539

SQL Server 7

515

SQL Server 2000

SQL Server 2000

539

SQL Server 7

515


SQL Server 7

SQL Server 7

515

Database File Versions and Upgrade Paths
As I mentioned earlier, downgrades are not supported. You’ll need to copy objects and data from the newer source database to the older target if you need to downgrade; attach or restore is not an option to copy a database to an earlier version.

SQL Server Connection Strings

This is the first of a series of posts on SQL Server connection strings. I don’t think connection strings are all that complicated but I often see developers have problems because they simply cloned an existing connection string (or found one on the internet) and tweaked it for the task at hand without really understanding what the keywords and values mean. This often results in run-time errors that can be tricky to diagnose. 
In this post, I’ll provide a connection string overview and discuss SqlClient connection strings and examples. I’ll discuss OLE DB and ODBC (used via ADO or ADO.NET) and JDBC in more detail the future articles.
Overview
SQL Server can be accessed using several technologies, each of which has different connection string particulars. Connection strings are provider/driver specific so one first needs to decide on a client API before formulating the proper string can be created. 
All connection strings share the same basic format, name/value pairs separated by semicolons, but the actual connection string keywords may vary by provider. Which keywords are required or optional also vary by provider and providers often share the same keywords (or provide synonyms) to minimize the connection string changes when switching between different providers. Most connection string keywords are optional and need to be specified only when the default is not appropriate. Connection string values should be enclosed in single or double quotes when the value may include a semicolon or equal sign (e.g. Password="a&==b=;1@23")
The purpose of a connection string is to supply a SQL Server provider/driver with the information needed to establish a connection to a SQL Server instance and may also be used to specify other configuration values, such as whether connection pooling is used. At the end of the day, the provider/driver needs to know at least:
·         SQL Server name (or address)
·         Authentication method (Windows or SQL Server)
·         Login credentials (login and password for SQL Server authentication)
SqlClient
One typically uses the .Net Framework Provider for SQL Server (abbreviated to SqlClient here) in managed code and a SQL Server OLE DB provider or ODBC driver from unmanaged code. It is possible to use OLE DB or ODBC for SQL Server data access in managed code but there is seldom a reason to do so since SqlClient offers high-performance access to SQL Server natively.
The authoritative reference for SqlClient connection strings is http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectionstring.aspx. My goal is not to rehash all of the keywords or illustrate the many combinations here but rather show the ones most commonly used along with best practices. I use the primary keywords rather than synonyms or equivalent keywords in the examples.
The SqlConnectionStringBuilder class provides a programmatic way to build connection strings needed by SqlConnection class. The nice thing about SqlConnectionStringBuilder is that it provides IntelliSense and avoids connection string typos. It should always be used when constructing connection strings based in user input (e.g. user id and password prompt). But you still need to know which connection string properties (keywords) you need to set along with the default values. The examples here apply regardless of whether or not you use yjr SqlConnectionStringBuilder class.
SqlClient Connection String Keyword Examples
Unlike other providers, there is no “Provider” or “Driver” connection string keyword in a SqlClient connection string.  The .Net Framework Provider for SQL Server is implicit with a SqlConnection class so it is redundant to also specify the provider.
I’ll start with the minimal keyword(s) needed. The minimal SqlClient connection string need only specify the authentication method.  The example below specifies Windows authentication using “Integrated Security=SSPI”. This connection string will connect the default instance on the local machine using Windows authentication under the current process Windows security credentials. 




Integrated Security=SSPI
Listing 1: Connect to local default instance using Windows authentication
To connect to the local default instance using SQL authentication, just specify the credentials using the “User ID” and “Password” keywords instead of “Integrated Security=SSPI” keyword. SQL authentication is the default when “Integrated Security” or “Trused_Connection” keyword is not specified. Although I commonly see "Persist Security Info=False" also specified (a best practice from a security perspective), that is the default setting and may be omitted. Be aware that you should encrypt connection strings (or passwords in general) stored in configuration files when using SQL authentication.




User ID=MyLogin;Password=MiP@ssw0rd
Listing 2: Connect to local default instance using SQL authentication
One often connects to a remote SQL Server. Along with the authentication method, add the Data Source keyword to specify the desired SQL Server name or network address.




Data Source=SQLSERVERNAME;Integrated Security=SSPI
Listing 3: Connect to default instance on host SQLSERVERNAME using Windows authentication




Data Source=SQLSERVERNAME;User ID=MyLogin;Password=MiP@ssw0rd
Listing 4: Connect to instance on host SQLSERVERNAME using SQL authentication
Note that these same connection strings may be used to connect locally or remotely. Personally, I recommend always specifying the Data Source even when connecting locally. This makes it easy to move the application to another machine using with the same configuration and helps avoid oversights.
It is usually best to let SqlClient determine the appropriate network library to use rather than an explicit specification. SqlClient will figure out the appropriate network library based on the specified Data Source value. When you connect to a local instance using an unqualified name (or the value “(local)”), Shared Memory is used by default. SqlClient will use TCP/IP if a FQDN (e.g. SQLSERVERNAME.MyDOMAIN.COM) or IP address is specified regardless of whether the instance is local or remote. Since TCP/IP is most commonly used nowadays, I’ll focus on TCP/IP in this article and use a FQDN in the subsequent examples to avoid ambiguity.
It is often desirable to specify the initial database context in the connection sting. If omitted, the default database of the authenticated account is used. This is accomplished using either the “Initial Catalog” or “Database” keyword. I suggest always including the “Initial Catalog” keyword.




Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 4: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
Named Instances
The connection strings I’ve shown so far assume the target is a default SQL Server instance listening on port 1433. One can run multiple instances of SQL Server on the same host using the named instance feature. If your target database instance is a named instance, SqlClient will also need to know the instance name or instance port number. The instance name can be specified by appending a backslash and instance name to the Data Source value:




Data Source=SQLSERVERNAME.MYDOMAIN.COM\MYINSTANCE;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 5: Connect to named instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase
As an aside, I often see connectivity problems with named instances due to oversights in the SQL Server configuration. When an instance name is specified, SqlClient interrogates the SQL Server Brower service on the SQL Server host to determine the instance port (or named pipe name). The SQL Server Brower service is disabled by default so you need to enable and start it in order to connect by the instance name. This can be done using the SQL Server Configuration Manager tool. Also, since the SQL Server Brower service communicates over UDP port 1434, that port must be allowed through firewalls.
You can specify a port number instead of instance name to directly to a named instance (or to a default instance listing on a non-standard port). The port may be specified by appending a comma and port number to the data source value. The needed port number can be ascertained from the SQL Server Configuration Manager tool.




Data Source=SQLSERVERNAME.MYDOMAIN.COM,60086;Integrated Security=SSPI;Initial Catalog=MyDatabase
Listing 6: Connect to instance on host SQLSERVERNAME listening on port 60086 using Windows authentication with initial database context of MyDatabase
Additional Keywords

In addition to the “Data Source”, “Initial Catalog” and “Integrated Security” (or “User Id” and “Password”) keywords I’ve discussed so far, I recommend that “Application Name” also be specified. The specified string is helps identify the application when monitoring activity on the database server. This is especially useful when an application server or client hosts multiple applications.




Data Source=SQLSERVERNAME.MYDOMAIN.COM;Integrated Security=SSPI;Initial Catalog=MyDatabase;Application Name=Connection String Example
Listing 7: Connect to default instance on host SQLSERVERNAME using Windows authentication with initial database context of MyDatabase with application name specification
In my opinion, the many other keywords are noise unless the default values are inappropriate for your environment. 
Summary
You can get by nicely in most cases with only the 4 or 5 SqlClient connection string keywords I’ve discussed here. I suggest you establish a connection string standard that includes the “Data Source”, “Initial Catalog”, “Application Name” keywords plus the authentication method, “Integrated Security=SSPI” or “User Id” and “Password”.

Move a Partition to a Different File Group Efficiently

SQL Server table partitioning can reduce storage costs associated with large tables while maintaining performance SLAs.  Table partitioning, available in Enterprise and above SKUs, allows you to keep frequently used current data on fast storage while storing infrequently accessed older data on slower, less expensive storage.  But moving vast amounts of data efficiently as data ages can be a challenge.  This post will discuss alternate techniques to accomplish this task.

Consider the scenario of a table partitioned on a datetime column by month.  Your objective is to keep recent (current and prior month) data on a solid state disk and older data on traditional spinning media.  2 filegroups are used for this table, one with files on a solid state device and the other with files on spinning disks.  The table is partitioned with a RANGE RIGHT partition function (inclusive date boundary) and monthly sliding window maintenance is scheduled to create a partition for the new month and perhaps remove the oldest month.  Every month after the slide, you want to move an older partition (prior month minus 1) from fast to slow storage to make room for new data on the fast file group.

The Simple Method

The easiest way to move a partition from the NewerData file group to the OlderData filegroup is with MERGE and SPLIT.  The example below will move the February partition from the NewerData to the OlderData filegroup:  

Simple maintenance script example:

-- Monthly Partition Move Scipt

-- merge month to be moved into prior month partition

ALTER PARTITION FUNCTION PF_Last12Months()

MERGE RANGE ('20110201');

 

-- set partition scheme next used to the OlderData filegroup

ALTER PARTITION SCHEME PS_Last12Months

NEXT USED OlderData;

 

-- move data from NewData to OlderData filegroup

ALTER PARTITION FUNCTION PF_Last12Months()

SPLIT RANGE ('20110201');

 

The figures below show the partitions before and after this script was run against a 10M row test table (setup script with complete DDL and sample data at the end of this post).  Although this method is quite easy, it can take quite a bit of time with large partitions.  This MERGE command will merge February data into the January partition on the OlderData filegroup, requiring all of February’s data to be moved in the process, and then remove the February partition.  The SPLIT will then create a new February partition on the OlderData filegroup, move February data to the new partition and finally remove the February data from the source partition.  So February data is actually moved twice, once by the MERGE and again by the SPLIT. 

This MERGE/SPLIT process took 52 seconds on my test system with a cold buffer cache but I was only moving 738,780 rows.  Think about the performance impact of this method against a much larger production table partition.  The atomic MERGE and SPLIT are offline operations so the entire table is unavailable while those statements are running.  Also, these operations are resource intensive when a lot of data needs to be moved and/or you have many indexes.

Before maintenance:

Rows

Partition Number

Filegroup

Lower Boundary

Upper Boundary

0

1

PartitioningDemo_OlderData

 

4/1/2010 12:00:00 AM

791,549

2

PartitioningDemo_OlderData

4/1/2010 12:00:00 AM

5/1/2010 12:00:00 AM

817,935

3

PartitioningDemo_OlderData

5/1/2010 12:00:00 AM

6/1/2010 12:00:00 AM

791,550

4

PartitioningDemo_OlderData

6/1/2010 12:00:00 AM

7/1/2010 12:00:00 AM

817,935

5

PartitioningDemo_OlderData

7/1/2010 12:00:00 AM

8/1/2010 12:00:00 AM

817,935

6

PartitioningDemo_OlderData

8/1/2010 12:00:00 AM

9/1/2010 12:00:00 AM

791,550

7

PartitioningDemo_OlderData

9/1/2010 12:00:00 AM

10/1/2010 12:00:00 AM

817,935

8

PartitioningDemo_OlderData

10/1/2010 12:00:00 AM

11/1/2010 12:00:00 AM

791,550

9

PartitioningDemo_OlderData

11/1/2010 12:00:00 AM

12/1/2010 12:00:00 AM

817,935

10

PartitioningDemo_OlderData

12/1/2010 12:00:00 AM

1/1/2011 12:00:00 AM

817,935

11

PartitioningDemo_OlderData

1/1/2011 12:00:00 AM

2/1/2011 12:00:00 AM

738,780

12

PartitioningDemo_NewerData

2/1/2011 12:00:00 AM

3/1/2011 12:00:00 AM

817,935

13

PartitioningDemo_NewerData

3/1/2011 12:00:00 AM

4/1/2011 12:00:00 AM

369,476

14

PartitioningDemo_NewerData

4/1/2011 12:00:00 AM

5/1/2011 12:00:00 AM

0

15

PartitioningDemo_NewerData

5/1/2011 12:00:00 AM

 

 

After maintenance:

Rows

Partition Number

Filegroup

Lower Boundary

Upper Boundary

0

1

PartitioningDemo_OlderData

 

4/1/2010 12:00:00 AM

791,549

2

PartitioningDemo_OlderData

4/1/2010 12:00:00 AM

5/1/2010 12:00:00 AM

817,935

3

PartitioningDemo_OlderData

5/1/2010 12:00:00 AM

6/1/2010 12:00:00 AM

791,550

4

PartitioningDemo_OlderData

6/1/2010 12:00:00 AM

7/1/2010 12:00:00 AM

817,935

5

PartitioningDemo_OlderData

7/1/2010 12:00:00 AM

8/1/2010 12:00:00 AM

817,935

6

PartitioningDemo_OlderData

8/1/2010 12:00:00 AM

9/1/2010 12:00:00 AM

791,550

7

PartitioningDemo_OlderData

9/1/2010 12:00:00 AM

10/1/2010 12:00:00 AM

817,935

8

PartitioningDemo_OlderData

10/1/2010 12:00:00 AM

11/1/2010 12:00:00 AM

791,550

9

PartitioningDemo_OlderData

11/1/2010 12:00:00 AM

12/1/2010 12:00:00 AM

817,935

10

PartitioningDemo_OlderData

12/1/2010 12:00:00 AM

1/1/2011 12:00:00 AM

817,935

11

PartitioningDemo_OlderData

1/1/2011 12:00:00 AM

2/1/2011 12:00:00 AM

738,780

12

PartitioningDemo_OlderData

2/1/2011 12:00:00 AM

3/1/2011 12:00:00 AM

817,935

13

PartitioningDemo_NewerData

3/1/2011 12:00:00 AM

4/1/2011 12:00:00 AM

369,476

14

PartitioningDemo_NewerData

4/1/2011 12:00:00 AM

5/1/2011 12:00:00 AM

0

15

PartitioningDemo_NewerData

5/1/2011 12:00:00 AM

 

 

SWITCH and DROP_EXISTING Method

An alternative to the method above is to employ SWITCH along with the DROP EXISTING option of CREATE INDEX.  As you may know, SWITCH of an aligned partition is a metadata-only operation and is very fast because no physical data movement is required.  Furthermore, CREATE INDEX…WITH DROP_EXISTING = ON avoids sorting when the existing table index is already suitably sorted and is especially appropriate for improving performance of large index rebuilds.  Using these commands, instead of relying on SPLIT and MERGE to move data, will greatly reduce the time needed to move a partition from one filegroup to another.  The maintenance script below reduced the time of the partition move from 52 seconds down to 7 seconds, reducing maintenance time by over 85% compared to the MERGE/SPLIT script above.  

Demo Maintenance Script

-- Monthly Partition Move Scipt

DECLARE @MonthToMove datetime = '20110201';

 

-- create staging table on NewerData filegroup with aligned indexes

IF OBJECT_ID(N'dbo.PartitionMoveDemoStaging') IS NOT NULL

      DROP TABLE dbo.PartitionMoveDemoStaging;

CREATE TABLE dbo.PartitionMoveDemoStaging(

      PartitioningDateTimeColumn datetime NOT NULL

      ,Column1 bigint NOT NULL

) ON PartitioningDemo_NewerData;

 

CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn

      ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)

      ON PartitioningDemo_NewerData;     

 

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1

      ON dbo.PartitionMoveDemoStaging(Column1)

      ON PartitioningDemo_NewerData;     

 

-- switch partition into staging table

ALTER TABLE dbo.PartitionMoveDemo

      SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)

      TO dbo.PartitionMoveDemoStaging;

 

-- remove partition

ALTER PARTITION FUNCTION PF_Last12Months()

      MERGE RANGE (@MonthToMove);

     

-- set next used to OlderData filegroup

ALTER PARTITION SCHEME PS_Last12Months

      NEXT USED PartitioningDemo_OlderData;

 

-- recreate partition on OlderData filegroup

ALTER PARTITION FUNCTION PF_Last12Months()

      SPLIT RANGE (@MonthToMove);

     

-- recreate staging table indexes using the partition scheme

-- this will move the staging table to OlderData filegroup with aligned indexes

CREATE CLUSTERED INDEX cdx_PartitionMoveDemoStaging_PartitioningColumn

      ON dbo.PartitionMoveDemoStaging(PartitioningDateTimeColumn)

      WITH (DROP_EXISTING = ON)

      ON PS_Last12Months(PartitioningDateTimeColumn);

     

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemoStaging_Column1

      ON dbo.PartitionMoveDemoStaging(Column1)

      WITH (DROP_EXISTING = ON)

      ON PS_Last12Months(PartitioningDateTimeColumn);

 

-- switch staging table back into primary table partition

ALTER TABLE dbo.PartitionMoveDemoStaging

      SWITCH PARTITION $PARTITION.PF_Last12Months(@MonthToMove)

      TO dbo.PartitionMoveDemo PARTITION $PARTITION.PF_Last12Months(@MonthToMove);

 

The maintenance steps here are similar to the first method except that the partition is SWITCHed into a staging table before the MERGE and SPLIT.  This way, no data movement is needed during the MERGE or SPLIT.  After the MERGE and SPLIT, staging table indexes are recreated using the same partition scheme as the primary table.  This will move the staging table from the NewerData to the OlderData filegroup and ensure staging table indexes are aligned for the SWITCH.  The DROP_EXISTING = ON option allows the CREATE INDEX to leverage the existing staging table index sequence, thus eliminating the need to sort the index keys.  Finally, the staging table is SWITCHed back into the moved partition.

I hope you find this method useful.  Below is the script I used to create the demo database and objects. 

Demo Setup Script

--create database with monthly filegroups

CREATE DATABASE PartitioningDemo

ON(

      NAME='Primary',

      FILENAME='S:\SolidState\PartitioningDemo.mdf',

      SIZE=10MB),

FILEGROUP NewerData (

      NAME='PartitioningDemo_NewerData',

      FILENAME='S:\SolidState\PartitioningDemo_NewerData.ndf',

      SIZE=400MB,

      FILEGROWTH=10MB),

FILEGROUP OlderData (

      NAME='PartitioningDemo_OlderData',

      FILENAME='D:\SpinningDisks\PartitioningDemo_OlderData.ndf',

      SIZE=600MB,

      FILEGROWTH=10MB)

LOG ON(

      NAME='PartitioningDemo_Log',

      FILENAME='L:\LogFiles\PartitioningDemo_Log.ldf',

      SIZE=10MB,

      FILEGROWTH=10MB);

     

ALTER DATABASE PartitioningDemo

      SET RECOVERY SIMPLE;

GO

 

USE PartitioningDemo;

 

CREATE PARTITION FUNCTION PF_Last12Months( datetime )

AS RANGE RIGHT

FOR VALUES

(               -- older_than_current_minus_12

      '20100401'  -- current_minus_12

      ,'20100501' -- current_minus_11

      ,'20100601' -- current_minus_10

      ,'20100701' -- current_minus_9

      ,'20100801' -- current_minus_8

      ,'20100901' -- current_minus_7

      ,'20101001' -- current_minus_6

      ,'20101101' -- current_minus_5

      ,'20101201' -- current_minus_4

      ,'20110101' -- current_minus_3

      ,'20110201' -- current_minus_2

      ,'20110301' -- current_minus_1

      ,'20110401' -- current

      ,'20110501' -- future

);

 

CREATE PARTITION SCHEME PS_Last12Months

AS PARTITION PF_Last12Months

TO

      (

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      OlderData,

      NewerData, -- minus 2 month (to be moved to OlderData)

      NewerData, -- minus 1 month

      NewerData, -- current month

      NewerData  -- future month+

      );

 

-- create table with 10,000,000 rows

ALTER DATABASE PartitioningDemo

      MODIFY FILEGROUP NewerData DEFAULT;

 

WITH

      t1 AS (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2

                            UNION ALL SELECT 3 UNION ALL SELECT 4

                            UNION ALL SELECT 5 UNION ALL SELECT 6

                              UNION ALL SELECT 7 UNION ALL SELECT 8

                              UNION ALL SELECT 9),

      t2 AS (SELECT a.n

                    FROM t1 a, t1 b, t1 c, t1 d, t1 e, t1 f, t1 g)

SELECT

      ISNULL(

            DATEADD(

                  day

                  , (ROW_NUMBER() OVER(ORDER BY t2.n))/26385, '20100401')

                  , '20100401') AS PartitioningDateTimeColumn

      ,ISNULL((ROW_NUMBER() OVER(ORDER BY t2.n)), 0) AS Column1

INTO dbo.PartitionMoveDemo

FROM t2;

 

-- create indexes partitioned indexes on table

CREATE CLUSTERED INDEX cdx_PartitionMoveDemo_PartitioningColumn

      ON dbo.PartitionMoveDemo(PartitioningDateTimeColumn)

      ON PS_Last12Months(PartitioningDateTimeColumn);

     

CREATE NONCLUSTERED INDEX idx_PartitionMoveDemo_Column1

      ON dbo.PartitionMoveDemo(Column1)

      ON PS_Last12Months(PartitioningDateTimeColumn);

GO

Stairway Series on SQLServerCentral.com

SQLServerCentral.com launched a new Stairway content series today, targeting specific areas of SQL Server.  Each Stairway includes a series of up to 12 levels focused on a specific SQL Server topic.  The goal is to guide DBAs and developers with little or no understanding of a subject through a sequence of tutorials in order to quickly gain the knowledge one needs to use a SQL Server feature confidently in a production environment.  Kalen Delaney, editor of the Stairway series, is one of the most respected experts in the world-wide SQL Server community.

I was flattered when Kalen gave me the opportunity to contribute to the series with a Stairway on Server-side Tracing.  For years I’ve cautioned against using Profiler indiscriminately both here as well as in the MSDN forums and newsgroups.  But it seems many DBAs still don’t differentiate between Profiler and server-side tracing.  I’m hoping this Server-side Tracing Stairway will empower DBAs with the knowledge to choose the right tool for the job.

My apologies for having gone dark for the last several months.   The subject of this post is the primary reason; there are only so many hours in the day L

Calendar Table and Date/Time Functions

I frequently see questions in the forums and newsgroups about how to best query date/time data and perform date manipulation.  Let me first say that a permanent calendar table that materializes commonly used DATEPART values along with time periods you frequently use is invaluable.  I’ve used such a table for over a decade with great success and strongly recommend you implement one on all of your database servers.  I’ve included a sample calendar table (and numbers table) later in this post and you can find other variations of such a table via an internet search.

Removing the Time Portion

A common requirement I have is to remove the time portion from a date/time value.  This is easy in SQL 2008 since you can simply “CAST(SomeDateTimeValue AS date)”.  But the date data type is not available in older SQL Server versions so you need an alternate method.  In SQL 2005 and earlier versions, I recommend the DATEADD…DATEDIFF method below with an arbitrary base date value specified in a format that is independent of the session DATAFORMAT setting:

SELECT CAST(GETDATE() AS date); --SQL 2008 and later

SELECT DATEADD(day, DATEDIFF(day, '19000101', GETDATE()), '19000101'); --SQL 2005 and earlier

 

I often see a variation of the DATEADD…DATEDIFF technique with the integer zero (no quotes) specified as the base date.  Although this may provide the expected results (I’ve done it myself), I caution against it because it relies on implicit conversion from the internal SQL Server integer date/time storage format.  If you want to be concise, a better approach is to specify an empty string for the base date value since the default value is ‘1900-01-01 00:00:00’.  In my opinion, an explicit data value is more intuitive, though.

SELECT DATEADD(day, DATEDIFF(day, '', GETDATE()), '');

 

I also sometimes see code that extracts the year, month and day date parts and concatenates with separators.  However, that method is dependent on session DATEFORMAT settings and slower than other methods.  See Tibor Karaszi’s The ultimate guide to the datetime datatypes article for details.

First and Last Day of Period

Another common task is to determine the first or last day of a given period.  The script below shows how to accomplish this of you don’t have a calendar table with the calculated values available.

DECLARE @Date date = GETDATE();

SELECT 'First day of year' [DateDescription], DATEADD(year, DATEDIFF(year,'19000101',@Date), '19000101') AS [CalendarDate]

UNION ALL

SELECT 'Last day of year', DATEADD(day,-1,DATEADD(year,0,DATEADD(year,DATEDIFF(year,'19000101',@Date)+1,'19000101')))

UNION ALL

SELECT 'First day of month', DATEADD(month, DATEDIFF(month,'19000101',@Date), '19000101')

UNION ALL

SELECT 'Last day of month', DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'19000101',@Date)+1,'19000101')))

UNION ALL

SELECT 'First day week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday ,@Date)-1),DATEDIFF(day,'19000101', @Date))

UNION ALL

SELECT 'Last day of week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday ,@Date)-1)+6,DATEDIFF(day,'19000101', @Date));

 

With a calendar table like the one later in this post:

DECLARE @Date date = GETDATE();

SELECT 'First day of year' [DateDescription], (SELECT FirstDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of year', (SELECT LastDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'First day of month', (SELECT FirstDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of month', (SELECT LastDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'First day week (based on DATEFIRST setting)', (SELECT FirstDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of week (based on DATEFIRST setting)', (SELECT LastDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date);

 

Calendar and Numbers Table

I think auxiliary calendar and number tables are a must-have on every database server.  These objects allow you to easily perform set-based processing in a number of scenarios.  In fact, the calendar table population script below uses a numbers table to populate the calendar table with several thousand rows in under a second.  This is much more efficient that a WHILE loop.

This calendar table population script also updates the table with most US holidays and adjusts business/non-business days accordingly.  In addition to customizing the script for holidays as observed by your organization, you might add fiscal period start/end dates to facilitate querying based on those cycles.  Also consider creating user-defined functions or stored procedures to encapsulate frequently used code that uses the calendar table.  For example, here is a function that returns the date that is a specified number of business days from the date provided:

CREATE FUNCTION dbo.udf_AddBusinessDays

(@Date date, @BusinessDays int)

RETURNS date

AS

BEGIN

      RETURN (

            SELECT TOP (1) CalendarDate AS BusinessDate

            FROM (SELECT TOP (@BusinessDays) CalendarDate

                  FROM dbo.Calendar

                  WHERE

                        CalendarDate > @Date

                        AND BusinessDay = 1

                  ORDER BY CalendarDate) AS BusinessDays

            ORDER BY CalendarDate DESC

      )

END

GO

Script 1: Example calendar table utility function

--auxiliary number table

CREATE TABLE dbo.Numbers(

      Number int NOT NULL

            CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED

      );

     

--load Numbers table with 1,000,000 numbers

WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)

      ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)

      ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b, t2 t2c)

      ,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)

      INSERT INTO dbo.Numbers WITH (TABLOCKX) (

            Number

            )

            SELECT number

            FROM numbers

                                                                                                                                                                                                                        WHERE number < 1000000;                                                                                    

Script 2: Create and populate numbers table

CREATE TABLE dbo.Calendar(

      CalendarDate date NOT NULL

            CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED

      ,CalendarYear int NOT NULL

      ,CalendarMonth int NOT NULL

      ,CalendarDay int NOT NULL

      ,DayOfWeekName varchar(10) NOT NULL

      ,FirstDateOfWeek date NOT NULL

      ,LastDateOfWeek date NOT NULL

      ,FirstDateOfMonth date NOT NULL

      ,LastDateOfMonth date NOT NULL

      ,FirstDateOfQuarter date NOT NULL

      ,LastDateOfQuarter date NOT NULL

      ,FirstDateOfYear date NOT NULL

      ,LastDateOfYear date NOT NULL

      ,BusinessDay bit NOT NULL

      ,NonBusinessDay bit NOT NULL

      ,Weekend bit NOT NULL

      ,Holiday bit NOT NULL

      ,Weekday bit NOT NULL

      ,CalendarDateDescription varchar(50) NULL

);

GO

 

--load dates 2000-01-01 through 2025-12-31

WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)

      ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)

      ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b)

      ,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)

INSERT INTO dbo.Calendar WITH (TABLOCKX) (

      CalendarDate

      ,CalendarYear

      ,CalendarMonth

      ,CalendarDay

      ,DayOfWeekName

      ,FirstDateOfWeek

      ,LastDateOfWeek

      ,FirstDateOfMonth

      ,LastDateOfMonth

      ,FirstDateOfQuarter

      ,LastDateOfQuarter

      ,FirstDateOfYear

      ,LastDateOfYear

      ,BusinessDay

      ,NonBusinessDay

      ,Weekend

      ,Holiday

      ,Weekday

      ,CalendarDateDescription

      )

SELECT

      CalendarDate = DATEADD(day, number, '20000101')

      ,CalendarYear = DATEPART(year, DATEADD(day, number, '20000101'))

      ,CalendarMonth = DATEPART(month, DATEADD(day, number, '20000101'))

      ,CalendarDay = DATEPART(day, DATEADD(day, number, '20000101'))

      ,DayOfWeekName = DATENAME(weekday, DATEADD(day, number, '20000101'))

      ,FirstDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1),DATEADD(day, number, '20000101'))

      ,LastDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1)+6,DATEADD(day, number, '20000101'))

      ,FirstDateOfMonth = DATEADD(month, DATEDIFF(month,'20000101',DATEADD(day, number, '20000101')), '20000101')

      ,LastDateOfMonth = DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'20000101',DATEADD(day, number, '20000101'))+1,'20000101')))

      ,FirstDateOfQuarter = DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101')), '20000101')

      ,LastDateOfQuarter = DATEADD(day, -1, DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))

      ,FirstDateOfYear = DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101')), '20000101')

      ,LastDateOfYear = DATEADD(day,-1,DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))

      --initially set all weekdays as business days

      ,BusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END

      --initially set only weekends as non-business days

      ,NonBusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END

      ,Weekend = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END

      ,Holiday = 0 --initially no holidays

      ,Weekday = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END

      ,CalendarDateDescription = NULL

FROM numbers

WHERE number < 9497;

 

--New Year's Day

UPDATE dbo.calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'New Year''s Day'

WHERE

    CalendarMonth = 1

    AND CalendarDay = 1;

 

--New Year's Day celebrated on Friday, December 31 when January 1 falls on Saturday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'New Year''s Day Celebrated'

WHERE

    CalendarMonth = 12

    AND CalendarDay = 31

    AND DayOfWeekName = 'Friday';

   

--New Year's Day celebrated on Monday, January 2 when January 1 falls on Sunday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'New Year''s Day Celebrated'

WHERE

    CalendarMonth = 1

    AND CalendarDay = 2

    AND DayOfWeekName = 'Monday';   

 

--Martin Luther King Day - 3rd Monday in January

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Martin Luther King Day'

WHERE

    CalendarMonth = 1

    AND DayOfWeekName = 'Monday'

    AND (SELECT COUNT(*)

            FROM dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Monday'

        ) = 3;

 

--President's Day - 3rd Monday in February

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'President''s Day'

WHERE

    CalendarMonth = 2

    AND DayOfWeekName = 'Monday'

    AND (SELECT COUNT(*)

            FROM dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Monday'

        ) = 3;

       

--Easter - first Sunday after the full moon following the vernal (March 21) equinox

UPDATE dbo.Calendar

SET

      Holiday = 1

    ,CalendarDateDescription = 'Easter'

WHERE

    CalendarDate IN(

        '20000423'

        ,'20010415'

        ,'20020331'

        ,'20030420'

        ,'20040411'

        ,'20050327'

        ,'20060416'

        ,'20070408'

        ,'20080323'

        ,'20090412'

        ,'20100404'

        ,'20110424'

        ,'20120408'

        ,'20130331'

        ,'20140420'

        ,'20150405'

        ,'20160427'

        ,'20170416'

        ,'20180401'

        ,'20190421'

        ,'20200412'

        ,'20210404'

        ,'20220417'

        ,'20230409'

        ,'20240331'

        ,'20250420'

    );

 

--Good Friday - 2 days before Easter Sunday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Good Friday'

WHERE

    CalendarDate IN(

        SELECT DATEADD(day, -2, c2.CalendarDate)

        FROM dbo.Calendar c2

        WHERE c2.CalendarDateDescription = 'Easter'

        );

 

--Memorial Day - last Monday in May

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Memorial Day'

WHERE

    CalendarMonth = 5

    AND DayOfWeekName = 'Monday'

    AND CalendarDate IN(

        SELECT MAX(c2.CalendarDate)

        FROM dbo.Calendar c2

        WHERE

            c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = 5

            AND c2.DayOfWeekName = 'Monday'

        );

 

--Independence Day - July 4th

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Independence Day'

WHERE

    CalendarMonth = 7

    AND CalendarDay = 4;

 

--Independence Day celebrated on Friday, July 3 when July 4 falls on a Saturday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'Independence Day Celebrated'

WHERE

    CalendarMonth = 7

    AND CalendarDay = 3

    AND DayOfWeekName = 'Friday';

 

--Independence Day celebrated on Friday, July 3 when July 4 falls on a Saturday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'Independence Day Celebrated'

WHERE

    CalendarMonth = 7

    AND CalendarDay = 5

    AND DayOfWeekName = 'Monday';

       

--Labor Day - first Monday in September

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Labor Day'

WHERE

    CalendarMonth = 9

    AND DayOfWeekName = 'Monday'

    AND CalendarDate IN(

        SELECT MIN(c2.CalendarDate)

        FROM dbo.Calendar c2

        WHERE

            c2.CalendarYear = calendar.CalendarYear

            AND c2.CalendarMonth = 9

            AND c2.DayOfWeekName = 'Monday'

        );

 

--Columbus Day - second Monday in October

UPDATE dbo.Calendar

SET

      Holiday = 1

    ,CalendarDateDescription = 'Columbus Day'

WHERE

    CalendarMonth = 10

    AND DayOfWeekName = 'Monday'

    AND (SELECT COUNT(*)

            FROM dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Monday'

        ) = 2;

 

--Veteran's Day - November 11

UPDATE dbo.Calendar

SET

      Holiday = 1

    ,CalendarDateDescription = 'Veteran''s Day'

WHERE

    CalendarMonth = 11

    AND CalendarDay = 11;

 

--Thanksgiving - fourth Thursday in November

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Thanksgiving'

WHERE

    CalendarMonth = 11

    AND DayOfWeekName = 'Thursday'

 

    AND (SELECT COUNT(*) FROM

            dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Thursday'

        ) = 4;

       

--Day after Thanksgiving - fourth Friday in November

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Day after Thanksgiving'

WHERE

    CalendarMonth = 11

    AND DayOfWeekName = 'Friday'

    AND (SELECT COUNT(*)

            FROM dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Friday'

        ) = 4;       

       

--Christmas Day - December 25th

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Christmas Day'

WHERE

    CalendarMonth = 12

    AND CalendarDay = 25;

 

--Christmas day celebrated on Friday, December 24 when December 25 falls on a Saturday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'Christmas Day Celebrated'

WHERE

    CalendarMonth = 12

    AND CalendarDay = 24

    AND DayOfWeekName = 'Friday';

 

--Christmas day celebrated on Monday, December 24 when December 25 falls on a Sunday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'Christmas Day Celebrated'

WHERE

    CalendarMonth = 12

    AND CalendarDay = 26

    AND DayOfWeekName = 'Monday';

Script 3: Create and populate calendar table and update with holidays

 

Secret of SQL Trace Duration Column

Why would a trace of long-running queries not show all queries that exceeded the specified duration filter?  We have a server-side SQL Trace that includes RPC:Completed and SQL:BatchCompleted events with a filter on Duration >= 100000.  Nearly all of the queries on this busy OLTP server run in under this 100 millisecond threshold so any that appear in the trace are candidates for root cause analysis and/or performance tuning opportunities.

After an application experienced query timeouts, the DBA looked at the trace data to corroborate the problem.  Surprisingly, he found no long-running queries in the trace from the application that experienced the timeouts even though the application’s error log clearly showed detail of the problem (query text, duration, start time, etc.).  The trace did show, however, that there were hundreds of other long-running queries from different applications during the problem timeframe.  We later determined those queries were blocked by a large UPDATE query against a critical table that was inadvertently run during this busy period.

So why didn’t the trace include all of the long-running queries?  The reason is because the SQL Trace event duration doesn’t include the time a request was queued while awaiting a worker thread.  Remember that the server was under considerable stress at the time due to the severe blocking episode.  Most of the worker threads were in use by blocked queries and new requests were queued awaiting a worker to free up (a DMV query on the DAC connection will show this queuing: “SELECT scheduler_id, work_queue_count FROM sys.dm_os_schedulers;”).  Technically, those queued requests had not started.  As worker threads became available, queries were dequeued and completed quickly.  These weren’t included in the trace because the duration was under the 100ms duration filter.  The duration reflected the time it took to actually run the query but didn’t include the time queued waiting for a worker thread.

The important point here is that duration is not end-to-end response time.  Duration of RPC:Completed and SQL:BatchCompleted events doesn’t include time before a worker thread is assigned nor does it include the time required to return the last result buffer to the client.  In other words, duration only includes time after the worker thread is assigned until the last buffer is filled.  But be aware that duration does include the time need to return intermediate result set buffers back to the client, which is a factor when large query results are returned.  Clients that are slow in consuming results sets can increase the duration value reported by the trace “completed” events.

Ad-Hoc Rollup by date/time Interval

I often use aggregate queries to rollup data by an arbitrary date/time interval.  I'll share some techniques that I use to accomplish the task in case you find these useful, using the same table below:

CREATE TABLE dbo.WebStats

(

      RequestTimestamp datetime NOT NULL,

      Page varchar(255) NOT NULL

);

CREATE CLUSTERED INDEX WebStats_cdx ON dbo.WebStats(RequestTimestamp, Page);

 

INSERT INTO dbo.WebStats (RequestTimestamp, Page)

VALUES

      ('2010-01-01T00:00:00', 'Default.aspx')

      ,('2010-01-01T00:00:15', 'Default.aspx')

      ,('2010-01-01T00:01:05', 'Order.aspx')

      ,('2010-01-01T00:01:30', 'Default.aspx')

      ,('2010-01-01T00:01:40', 'OrderStatus.aspx')

      ,('2010-01-01T00:02:05', 'Default.aspx')

      ,('2010-01-01T00:03:05', 'ProductInfo.aspx')

      ,('2010-01-01T00:03:30', 'Default.aspx');

GO

 

Simple Rollup

Without an auxiliary table, a little DATEADD magic can do the trick.  Here's an example that summarizes web page requests by minute for the specified date/time range:

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-02T00:00:00';

 

SELECT

      DATEADD(minute, DATEDIFF(minute, @StartTimestamp, RequestTimestamp), @StartTimestamp) AS Interval,

      COUNT(*) AS PageRequests

FROM dbo.WebStats

GROUP BY

      DATEADD(minute, DATEDIFF(minute, @StartTimestamp, RequestTimestamp), @StartTimestamp)

ORDER BY

      Interval; 

 

Results:

Interval

PageRequests

2010-01-01 00:00:00.000

2

2010-01-01 00:01:00.000

3

2010-01-01 00:02:00.000

1

2010-01-01 00:03:00.000

2

2010-01-01 00:29:00.000

1

2010-01-01 00:31:00.000

1

2010-01-01 00:42:00.000

1

2010-01-01 02:01:00.000

2

2010-01-01 02:03:00.000

2

2010-01-01 02:31:00.000

1

2010-01-01 02:44:00.000

1

2010-01-01 02:49:00.000

1

 

Arbitrary Intervals

The simple rollup method works well for any of the pre-defined units provided by the DATEADD function (year, quarter, month, day, hour, minute, second or week).  However, it lacks the flexibility to roll up to an arbitrary interval like 15 minutes or 30 seconds.  A little DATEADD/DATEDIFF math addresses this gap.  Below is an example of a 30-minute interval rollup using this technique:

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-01T04:00:00'

      ,@IntervalSeconds int = 1800; --30 minutes

SELECT

      DATEADD(second

            ,DATEDIFF(second, @StartTimestamp

            ,RequestTimestamp)

            / @IntervalSeconds * @IntervalSeconds, @StartTimestamp) AS Interval

      ,COUNT(*) AS PageRequests

FROM dbo.WebStats

WHERE

      RequestTimestamp >= @StartTimestamp

      AND RequestTimestamp < @EndTimestamp

GROUP BY

      DATEADD(second

            ,DATEDIFF(second, @StartTimestamp

            ,RequestTimestamp) / @IntervalSeconds * @IntervalSeconds, @StartTimestamp)

ORDER BY

      Interval;

 

Interval

PageRequests

2010-01-01 00:00:00.000

9

2010-01-01 00:30:00.000

2

2010-01-01 02:00:00.000

4

2010-01-01 02:30:00.000

3

 

Missing Intervals

You probably noticed that periods with no activity at all are omitted rather than reporting a zero value.  One method to include the missing intervals is with an outer join to a temporal table containing all the desired intervals.  Ideally, the temporal table would be a permanent one but I've found it impractical to maintain such a table for ad-hoc needs.  Fortunately, a utility numbers CTE is a handy way to generate the needed intervals dynamically.  The example below provides up to 65,536 interval values and can be easily extended as needed.

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-01T04:00:00'

      ,@IntervalSeconds int = 1800; --30 minutes

 

WITH

      T2 AS (SELECT 0 AS Num UNION ALL SELECT 0),

      T4 AS (SELECT 0 AS Num FROM T2 AS A CROSS JOIN T2 AS B),

      T256 AS (SELECT 0 AS Num FROM T4 AS A CROSS JOIN T4 AS B CROSS JOIN T4 AS C CROSS JOIN T4 AS D),

      T65536 AS (SELECT ROW_NUMBER() OVER(ORDER BY A.Num) AS Num FROM T256 AS A CROSS JOIN T256 AS B)

SELECT

      DATEADD(second

            ,(Num-1) * @IntervalSeconds, @StartTimestamp) AS Interval

      ,COUNT(WebStats.RequestTimestamp) AS PageRequests

FROM T65536

LEFT JOIN dbo.WebStats ON

      WebStats.RequestTimestamp >= DATEADD(second, (Num-1) * @IntervalSeconds, @StartTimestamp)

      AND WebStats.RequestTimestamp < DATEADD(second, Num * @IntervalSeconds, @StartTimestamp)

WHERE

      Num <= DATEDIFF(second, @StartTimeStamp, @EndTimestamp) / @IntervalSeconds

GROUP BY

      DATEADD(second

            ,(Num-1) * @IntervalSeconds, @StartTimestamp)

ORDER BY

      Interval;  

 

Interval

PageRequests

2010-01-01 00:00:00.000

9

2010-01-01 00:30:00.000

2

2010-01-01 01:00:00.000

0

2010-01-01 01:30:00.000

0

2010-01-01 02:00:00.000

4

2010-01-01 02:30:00.000

3

2010-01-01 03:00:00.000

0

2010-01-01 03:30:00.000

0

 

Collation Hell (Part 3)

In this final post of my Collation Hell series, I'll discuss techniques to change a SQL Server instance collation along with the collation of all databases and columns.  The objective is to ensure the standard collation is used throughout the entire SQL Server instance.  See part 1 and part 2 of this series for more information on selecting a standard collation and planning such a collation change.

Be aware that a complete collation change is not unlike that of a major version upgrade, except tools to facilitate the change are limitted.  You'll need to build new system databases, change user databases and change every character column to conform to the new collation.  These collation changes can be done using either a side-by-side migration technique or performed in-place.

Changing the Instance Collation

The SQL Server setup REBUILDDATABASE option (see Books Online) is used to create new system databases for an existing instance with the desired collation.  One advantage of using REBUILDDATABASE over a complete reinstall is that post-RTM service packs and patches don't need to be reapplied afterward.  However, all server level objects like logins, linked servers, jobs, etc. need to be recreated after the rebuild so you'll need to script those out beforehand.  User databases and columns will need to be changed separately, which I'll discuss in more detail later.

You can also perform a fresh SQL Server install on another instance for a side-by-side migration.  One of the advantages of this side-by-side migration technique is that fallback is fast and relatively easy.  The side-by-side migration method is attractive if you plan a server hardware and/or SQL version upgrade anyway.  However, like the REBUILDDATABASE, you will need to create server-level objects after the install. 

Changing User Database Collation

Before I get into the details of a database collation change, please vote on Connect feedback item Make it easy to change collation on a database.  Until such a feature us available, we will endure the pain of performing this task manually.

Assuming you have performed due diligence and remediation beforehand (see my collation change planning article), changing the database collation in-place is relatively easy.  A simple ALTER DATABASE will change the collation of all user database system objects as well as the database default collation:

ALTER DATABASE Foo

COLLATE Latin1_General_CI_AS;

But note that this database collation change does not actually change the collation of existing user table columns.  Columns that do not match the database collation must be changed individually to conform, which is why a mass collation change is such a PITA.  You might choose to rebuild the database using a side-by-side method so that both the database and column collations can be changed during the rebuild process.  I generally recommend such a side--by-side method unless you are constrained by storage space.

Changing Column Collation Using ATLER TABLE...ALTER COLUMN

The syntax for changing a column collation is simple; just execute ALTER TABLE...ALTER COLUMN using the same column definition except for new column collation:

ALTER TABLE dbo.Foo ALTER COLUMN

      Bar varchar(50) COLLATE Latin1_General_CI_AS NOT NULL;

The above DDL method appears simple at first glance but there are many caveats that make this method problematic, especially when it must be repeated for many tables, large databases and/or a code page change is involved.  ALTER TABLE...ALTER COLUMN may be acceptable for a isolated change but not necessarily for a mass one.  The major issues are:

·         Each column must be changed individually

You'll need a separate ALTER COLUMN statement for each character column in the database.  A T-SQL script that generates the needed DDL using the catalog views is a must.  See Louis Davidson's Change table collations en masse article for an example and be aware that text columns are problematic.

·         Column references must be dropped

The altered column cannot be referenced by a constraint, index, statistic, computed column or schemabound object.  This means that all of these references must be dropped before the column is altered and recreated afterward.

·         Data are updated with a code page change

ALTER TABLE...ALTER COLUMN is a always a fast metadata-only change with a Unicode column.  The operation is also a metadata-only change for a non-Unicode column, but only if the old and new collations have the same code page/character set. 

When the old and new collations have a different code page/character set, then every row must be updated when a non-Unicode column is changed.  The performance ramifications of such an update are huge, especially with large tables.  A full table scan is required for each ALTER statement and every row in the table will be updated.  Also, since SQL Server internally drops the old column and adds a new one, the internal row size increases considerably.  Be aware that space requirements for modified non-Unicode columns will more than double until the clustered index is (re)built.  To reclaim the space of a heap, you'll need to create and drop a clustered index.  Keep in mind that the ALTER operation is fully logged regardless of the database recovery model so you need to plan log space requirements accordingly.

Because of these considerations, I do not recommend using ALTER TABLE...ALTER COLUMN for a mass collation change, especially when non-Unicode columns are involved and the code page/character set of the collations are different.  Instead, migrate data to a new table with columns of the desired collation.

Changing Column Collation Using a New Table

If you cannot perform a side-by-side migration of the entire database using a side-by-side method due to storage constraints, an alternative to ALTER TABLE...ALTER COLUMN is to create a new table with the desired collation and then copy data from the original table.  I also recommend this method over ALTER TABLE...ALTER COLOMN when migrating to a different code page/character set for the reasons I previously mentioned.

1.       Change the database recovery model to SIMPLE to minimize log space requirements

2.       Drop all constraints, except clustered primary key and clustered unique constraints

3.       Drop all non-clustered indexes to free up disk space for the migration

4.       For each table:

o   Create a new table exactly like the original, except with a different name and new collation for all character columns

o   Create the clustered index and check constraints

o   Load data

·         Use INSERT...SELECT to load the new table.  Be sure to specify a TABLOCKX hint on the INSERT so that the operation is minimally logged.  If the table has an identity column. be sure to SET IDENTITY_INSERT...ON to retain the existing identity values.

o   Drop the old table after successful copy and rename new table to old name

5.       Create non-clustered indexes, constraints, triggers, object permissions, etc.

Summary

I cannot overstate the importance of choosing the right collation during the initial install since it is difficult to change after the fact.  Unfortunately, we often inherit instances and databases of varying collations and must evaluate the effort of the collation change against the benefits of a consistent collation.  If you are considering a collation change, be sure to test beforehand to avoid surprises during and after the migration and have a solid fallback plan.