Calendar Table and Date/Time Functions

I frequently see questions in the forums and newsgroups about how to best query date/time data and perform date manipulation.  Let me first say that a permanent calendar table that materializes commonly used DATEPART values along with time periods you frequently use is invaluable.  I’ve used such a table for over a decade with great success and strongly recommend you implement one on all of your database servers.  I’ve included a sample calendar table (and numbers table) later in this post and you can find other variations of such a table via an internet search.

Removing the Time Portion

A common requirement I have is to remove the time portion from a date/time value.  This is easy in SQL 2008 since you can simply “CAST(SomeDateTimeValue AS date)”.  But the date data type is not available in older SQL Server versions so you need an alternate method.  In SQL 2005 and earlier versions, I recommend the DATEADD…DATEDIFF method below with an arbitrary base date value specified in a format that is independent of the session DATAFORMAT setting:

SELECT CAST(GETDATE() AS date); --SQL 2008 and later

SELECT DATEADD(day, DATEDIFF(day, '19000101', GETDATE()), '19000101'); --SQL 2005 and earlier

 

I often see a variation of the DATEADD…DATEDIFF technique with the integer zero (no quotes) specified as the base date.  Although this may provide the expected results (I’ve done it myself), I caution against it because it relies on implicit conversion from the internal SQL Server integer date/time storage format.  If you want to be concise, a better approach is to specify an empty string for the base date value since the default value is ‘1900-01-01 00:00:00’.  In my opinion, an explicit data value is more intuitive, though.

SELECT DATEADD(day, DATEDIFF(day, '', GETDATE()), '');

 

I also sometimes see code that extracts the year, month and day date parts and concatenates with separators.  However, that method is dependent on session DATEFORMAT settings and slower than other methods.  See Tibor Karaszi’s The ultimate guide to the datetime datatypes article for details.

First and Last Day of Period

Another common task is to determine the first or last day of a given period.  The script below shows how to accomplish this of you don’t have a calendar table with the calculated values available.

DECLARE @Date date = GETDATE();

SELECT 'First day of year' [DateDescription], DATEADD(year, DATEDIFF(year,'19000101',@Date), '19000101') AS [CalendarDate]

UNION ALL

SELECT 'Last day of year', DATEADD(day,-1,DATEADD(year,0,DATEADD(year,DATEDIFF(year,'19000101',@Date)+1,'19000101')))

UNION ALL

SELECT 'First day of month', DATEADD(month, DATEDIFF(month,'19000101',@Date), '19000101')

UNION ALL

SELECT 'Last day of month', DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'19000101',@Date)+1,'19000101')))

UNION ALL

SELECT 'First day week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday ,@Date)-1),DATEDIFF(day,'19000101', @Date))

UNION ALL

SELECT 'Last day of week (based on DATEFIRST setting)', DATEADD(day,-(DATEPART(weekday ,@Date)-1)+6,DATEDIFF(day,'19000101', @Date));

 

With a calendar table like the one later in this post:

DECLARE @Date date = GETDATE();

SELECT 'First day of year' [DateDescription], (SELECT FirstDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of year', (SELECT LastDateOfYear FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'First day of month', (SELECT FirstDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of month', (SELECT LastDateOfMonth FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'First day week (based on DATEFIRST setting)', (SELECT FirstDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date)

UNION ALL

SELECT 'Last day of week (based on DATEFIRST setting)', (SELECT LastDateOfWeek FROM dbo.Calendar WHERE CalendarDate = @Date);

 

Calendar and Numbers Table

I think auxiliary calendar and number tables are a must-have on every database server.  These objects allow you to easily perform set-based processing in a number of scenarios.  In fact, the calendar table population script below uses a numbers table to populate the calendar table with several thousand rows in under a second.  This is much more efficient that a WHILE loop.

This calendar table population script also updates the table with most US holidays and adjusts business/non-business days accordingly.  In addition to customizing the script for holidays as observed by your organization, you might add fiscal period start/end dates to facilitate querying based on those cycles.  Also consider creating user-defined functions or stored procedures to encapsulate frequently used code that uses the calendar table.  For example, here is a function that returns the date that is a specified number of business days from the date provided:

CREATE FUNCTION dbo.udf_AddBusinessDays

(@Date date, @BusinessDays int)

RETURNS date

AS

BEGIN

      RETURN (

            SELECT TOP (1) CalendarDate AS BusinessDate

            FROM (SELECT TOP (@BusinessDays) CalendarDate

                  FROM dbo.Calendar

                  WHERE

                        CalendarDate > @Date

                        AND BusinessDay = 1

                  ORDER BY CalendarDate) AS BusinessDays

            ORDER BY CalendarDate DESC

      )

END

GO

Script 1: Example calendar table utility function

--auxiliary number table

CREATE TABLE dbo.Numbers(

      Number int NOT NULL

            CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED

      );

     

--load Numbers table with 1,000,000 numbers

WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)

      ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)

      ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b, t2 t2c)

      ,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)

      INSERT INTO dbo.Numbers WITH (TABLOCKX) (

            Number

            )

            SELECT number

            FROM numbers

                                                                                                                                                                                                                        WHERE number < 1000000;                                                                                    

Script 2: Create and populate numbers table

CREATE TABLE dbo.Calendar(

      CalendarDate date NOT NULL

            CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED

      ,CalendarYear int NOT NULL

      ,CalendarMonth int NOT NULL

      ,CalendarDay int NOT NULL

      ,DayOfWeekName varchar(10) NOT NULL

      ,FirstDateOfWeek date NOT NULL

      ,LastDateOfWeek date NOT NULL

      ,FirstDateOfMonth date NOT NULL

      ,LastDateOfMonth date NOT NULL

      ,FirstDateOfQuarter date NOT NULL

      ,LastDateOfQuarter date NOT NULL

      ,FirstDateOfYear date NOT NULL

      ,LastDateOfYear date NOT NULL

      ,BusinessDay bit NOT NULL

      ,NonBusinessDay bit NOT NULL

      ,Weekend bit NOT NULL

      ,Holiday bit NOT NULL

      ,Weekday bit NOT NULL

      ,CalendarDateDescription varchar(50) NULL

);

GO

 

--load dates 2000-01-01 through 2025-12-31

WITH t1 AS (SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)

      ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)

      ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b)

      ,numbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)

INSERT INTO dbo.Calendar WITH (TABLOCKX) (

      CalendarDate

      ,CalendarYear

      ,CalendarMonth

      ,CalendarDay

      ,DayOfWeekName

      ,FirstDateOfWeek

      ,LastDateOfWeek

      ,FirstDateOfMonth

      ,LastDateOfMonth

      ,FirstDateOfQuarter

      ,LastDateOfQuarter

      ,FirstDateOfYear

      ,LastDateOfYear

      ,BusinessDay

      ,NonBusinessDay

      ,Weekend

      ,Holiday

      ,Weekday

      ,CalendarDateDescription

      )

SELECT

      CalendarDate = DATEADD(day, number, '20000101')

      ,CalendarYear = DATEPART(year, DATEADD(day, number, '20000101'))

      ,CalendarMonth = DATEPART(month, DATEADD(day, number, '20000101'))

      ,CalendarDay = DATEPART(day, DATEADD(day, number, '20000101'))

      ,DayOfWeekName = DATENAME(weekday, DATEADD(day, number, '20000101'))

      ,FirstDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1),DATEADD(day, number, '20000101'))

      ,LastDateOfWeek = DATEADD(day,-(DATEPART(weekday ,DATEADD(day, number, '20000101'))-1)+6,DATEADD(day, number, '20000101'))

      ,FirstDateOfMonth = DATEADD(month, DATEDIFF(month,'20000101',DATEADD(day, number, '20000101')), '20000101')

      ,LastDateOfMonth = DATEADD(day,-1,DATEADD(month,0,DATEADD(month,DATEDIFF(month,'20000101',DATEADD(day, number, '20000101'))+1,'20000101')))

      ,FirstDateOfQuarter = DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101')), '20000101')

      ,LastDateOfQuarter = DATEADD(day, -1, DATEADD(quarter, DATEDIFF(quarter,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))

      ,FirstDateOfYear = DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101')), '20000101')

      ,LastDateOfYear = DATEADD(day,-1,DATEADD(year, DATEDIFF(year,'20000101',DATEADD(day, number, '20000101'))+1, '20000101'))

      --initially set all weekdays as business days

      ,BusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END

      --initially set only weekends as non-business days

      ,NonBusinessDay = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END

      ,Weekend = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Saturday','Sunday') THEN 1 ELSE 0 END

      ,Holiday = 0 --initially no holidays

      ,Weekday = CASE WHEN DATENAME(weekday, DATEADD(day, number, '20000101')) IN('Monday','Tuesday','Wednesday','Thursday','Friday') THEN 1 ELSE 0 END

      ,CalendarDateDescription = NULL

FROM numbers

WHERE number < 9497;

 

--New Year's Day

UPDATE dbo.calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'New Year''s Day'

WHERE

    CalendarMonth = 1

    AND CalendarDay = 1;

 

--New Year's Day celebrated on Friday, December 31 when January 1 falls on Saturday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'New Year''s Day Celebrated'

WHERE

    CalendarMonth = 12

    AND CalendarDay = 31

    AND DayOfWeekName = 'Friday';

   

--New Year's Day celebrated on Monday, January 2 when January 1 falls on Sunday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'New Year''s Day Celebrated'

WHERE

    CalendarMonth = 1

    AND CalendarDay = 2

    AND DayOfWeekName = 'Monday';   

 

--Martin Luther King Day - 3rd Monday in January

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Martin Luther King Day'

WHERE

    CalendarMonth = 1

    AND DayOfWeekName = 'Monday'

    AND (SELECT COUNT(*)

            FROM dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Monday'

        ) = 3;

 

--President's Day - 3rd Monday in February

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'President''s Day'

WHERE

    CalendarMonth = 2

    AND DayOfWeekName = 'Monday'

    AND (SELECT COUNT(*)

            FROM dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Monday'

        ) = 3;

       

--Easter - first Sunday after the full moon following the vernal (March 21) equinox

UPDATE dbo.Calendar

SET

      Holiday = 1

    ,CalendarDateDescription = 'Easter'

WHERE

    CalendarDate IN(

        '20000423'

        ,'20010415'

        ,'20020331'

        ,'20030420'

        ,'20040411'

        ,'20050327'

        ,'20060416'

        ,'20070408'

        ,'20080323'

        ,'20090412'

        ,'20100404'

        ,'20110424'

        ,'20120408'

        ,'20130331'

        ,'20140420'

        ,'20150405'

        ,'20160427'

        ,'20170416'

        ,'20180401'

        ,'20190421'

        ,'20200412'

        ,'20210404'

        ,'20220417'

        ,'20230409'

        ,'20240331'

        ,'20250420'

    );

 

--Good Friday - 2 days before Easter Sunday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Good Friday'

WHERE

    CalendarDate IN(

        SELECT DATEADD(day, -2, c2.CalendarDate)

        FROM dbo.Calendar c2

        WHERE c2.CalendarDateDescription = 'Easter'

        );

 

--Memorial Day - last Monday in May

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Memorial Day'

WHERE

    CalendarMonth = 5

    AND DayOfWeekName = 'Monday'

    AND CalendarDate IN(

        SELECT MAX(c2.CalendarDate)

        FROM dbo.Calendar c2

        WHERE

            c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = 5

            AND c2.DayOfWeekName = 'Monday'

        );

 

--Independence Day - July 4th

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Independence Day'

WHERE

    CalendarMonth = 7

    AND CalendarDay = 4;

 

--Independence Day celebrated on Friday, July 3 when July 4 falls on a Saturday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'Independence Day Celebrated'

WHERE

    CalendarMonth = 7

    AND CalendarDay = 3

    AND DayOfWeekName = 'Friday';

 

--Independence Day celebrated on Friday, July 3 when July 4 falls on a Saturday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'Independence Day Celebrated'

WHERE

    CalendarMonth = 7

    AND CalendarDay = 5

    AND DayOfWeekName = 'Monday';

       

--Labor Day - first Monday in September

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Labor Day'

WHERE

    CalendarMonth = 9

    AND DayOfWeekName = 'Monday'

    AND CalendarDate IN(

        SELECT MIN(c2.CalendarDate)

        FROM dbo.Calendar c2

        WHERE

            c2.CalendarYear = calendar.CalendarYear

            AND c2.CalendarMonth = 9

            AND c2.DayOfWeekName = 'Monday'

        );

 

--Columbus Day - second Monday in October

UPDATE dbo.Calendar

SET

      Holiday = 1

    ,CalendarDateDescription = 'Columbus Day'

WHERE

    CalendarMonth = 10

    AND DayOfWeekName = 'Monday'

    AND (SELECT COUNT(*)

            FROM dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Monday'

        ) = 2;

 

--Veteran's Day - November 11

UPDATE dbo.Calendar

SET

      Holiday = 1

    ,CalendarDateDescription = 'Veteran''s Day'

WHERE

    CalendarMonth = 11

    AND CalendarDay = 11;

 

--Thanksgiving - fourth Thursday in November

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Thanksgiving'

WHERE

    CalendarMonth = 11

    AND DayOfWeekName = 'Thursday'

 

    AND (SELECT COUNT(*) FROM

            dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Thursday'

        ) = 4;

       

--Day after Thanksgiving - fourth Friday in November

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Day after Thanksgiving'

WHERE

    CalendarMonth = 11

    AND DayOfWeekName = 'Friday'

    AND (SELECT COUNT(*)

            FROM dbo.Calendar c2

        WHERE

            c2.CalendarDate <= Calendar.CalendarDate

            AND c2.CalendarYear = Calendar.CalendarYear

            AND c2.CalendarMonth = Calendar.CalendarMonth

            AND c2.DayOfWeekName = 'Friday'

        ) = 4;       

       

--Christmas Day - December 25th

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

      ,Holiday = 1

    ,CalendarDateDescription = 'Christmas Day'

WHERE

    CalendarMonth = 12

    AND CalendarDay = 25;

 

--Christmas day celebrated on Friday, December 24 when December 25 falls on a Saturday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'Christmas Day Celebrated'

WHERE

    CalendarMonth = 12

    AND CalendarDay = 24

    AND DayOfWeekName = 'Friday';

 

--Christmas day celebrated on Monday, December 24 when December 25 falls on a Sunday

UPDATE dbo.Calendar

SET

      BusinessDay = 0

      ,NonBusinessDay = 1

    ,CalendarDateDescription = 'Christmas Day Celebrated'

WHERE

    CalendarMonth = 12

    AND CalendarDay = 26

    AND DayOfWeekName = 'Monday';

Script 3: Create and populate calendar table and update with holidays

 

Secret of SQL Trace Duration Column

Why would a trace of long-running queries not show all queries that exceeded the specified duration filter?  We have a server-side SQL Trace that includes RPC:Completed and SQL:BatchCompleted events with a filter on Duration >= 100000.  Nearly all of the queries on this busy OLTP server run in under this 100 millisecond threshold so any that appear in the trace are candidates for root cause analysis and/or performance tuning opportunities.

After an application experienced query timeouts, the DBA looked at the trace data to corroborate the problem.  Surprisingly, he found no long-running queries in the trace from the application that experienced the timeouts even though the application’s error log clearly showed detail of the problem (query text, duration, start time, etc.).  The trace did show, however, that there were hundreds of other long-running queries from different applications during the problem timeframe.  We later determined those queries were blocked by a large UPDATE query against a critical table that was inadvertently run during this busy period.

So why didn’t the trace include all of the long-running queries?  The reason is because the SQL Trace event duration doesn’t include the time a request was queued while awaiting a worker thread.  Remember that the server was under considerable stress at the time due to the severe blocking episode.  Most of the worker threads were in use by blocked queries and new requests were queued awaiting a worker to free up (a DMV query on the DAC connection will show this queuing: “SELECT scheduler_id, work_queue_count FROM sys.dm_os_schedulers;”).  Technically, those queued requests had not started.  As worker threads became available, queries were dequeued and completed quickly.  These weren’t included in the trace because the duration was under the 100ms duration filter.  The duration reflected the time it took to actually run the query but didn’t include the time queued waiting for a worker thread.

The important point here is that duration is not end-to-end response time.  Duration of RPC:Completed and SQL:BatchCompleted events doesn’t include time before a worker thread is assigned nor does it include the time required to return the last result buffer to the client.  In other words, duration only includes time after the worker thread is assigned until the last buffer is filled.  But be aware that duration does include the time need to return intermediate result set buffers back to the client, which is a factor when large query results are returned.  Clients that are slow in consuming results sets can increase the duration value reported by the trace “completed” events.

Ad-Hoc Rollup by date/time Interval

I often use aggregate queries to rollup data by an arbitrary date/time interval.  I'll share some techniques that I use to accomplish the task in case you find these useful, using the same table below:

CREATE TABLE dbo.WebStats

(

      RequestTimestamp datetime NOT NULL,

      Page varchar(255) NOT NULL

);

CREATE CLUSTERED INDEX WebStats_cdx ON dbo.WebStats(RequestTimestamp, Page);

 

INSERT INTO dbo.WebStats (RequestTimestamp, Page)

VALUES

      ('2010-01-01T00:00:00', 'Default.aspx')

      ,('2010-01-01T00:00:15', 'Default.aspx')

      ,('2010-01-01T00:01:05', 'Order.aspx')

      ,('2010-01-01T00:01:30', 'Default.aspx')

      ,('2010-01-01T00:01:40', 'OrderStatus.aspx')

      ,('2010-01-01T00:02:05', 'Default.aspx')

      ,('2010-01-01T00:03:05', 'ProductInfo.aspx')

      ,('2010-01-01T00:03:30', 'Default.aspx');

GO

 

Simple Rollup

Without an auxiliary table, a little DATEADD magic can do the trick.  Here's an example that summarizes web page requests by minute for the specified date/time range:

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-02T00:00:00';

 

SELECT

      DATEADD(minute, DATEDIFF(minute, @StartTimestamp, RequestTimestamp), @StartTimestamp) AS Interval,

      COUNT(*) AS PageRequests

FROM dbo.WebStats

GROUP BY

      DATEADD(minute, DATEDIFF(minute, @StartTimestamp, RequestTimestamp), @StartTimestamp)

ORDER BY

      Interval; 

 

Results:

Interval

PageRequests

2010-01-01 00:00:00.000

2

2010-01-01 00:01:00.000

3

2010-01-01 00:02:00.000

1

2010-01-01 00:03:00.000

2

2010-01-01 00:29:00.000

1

2010-01-01 00:31:00.000

1

2010-01-01 00:42:00.000

1

2010-01-01 02:01:00.000

2

2010-01-01 02:03:00.000

2

2010-01-01 02:31:00.000

1

2010-01-01 02:44:00.000

1

2010-01-01 02:49:00.000

1

 

Arbitrary Intervals

The simple rollup method works well for any of the pre-defined units provided by the DATEADD function (year, quarter, month, day, hour, minute, second or week).  However, it lacks the flexibility to roll up to an arbitrary interval like 15 minutes or 30 seconds.  A little DATEADD/DATEDIFF math addresses this gap.  Below is an example of a 30-minute interval rollup using this technique:

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-01T04:00:00'

      ,@IntervalSeconds int = 1800; --30 minutes

SELECT

      DATEADD(second

            ,DATEDIFF(second, @StartTimestamp

            ,RequestTimestamp)

            / @IntervalSeconds * @IntervalSeconds, @StartTimestamp) AS Interval

      ,COUNT(*) AS PageRequests

FROM dbo.WebStats

WHERE

      RequestTimestamp >= @StartTimestamp

      AND RequestTimestamp < @EndTimestamp

GROUP BY

      DATEADD(second

            ,DATEDIFF(second, @StartTimestamp

            ,RequestTimestamp) / @IntervalSeconds * @IntervalSeconds, @StartTimestamp)

ORDER BY

      Interval;

 

Interval

PageRequests

2010-01-01 00:00:00.000

9

2010-01-01 00:30:00.000

2

2010-01-01 02:00:00.000

4

2010-01-01 02:30:00.000

3

 

Missing Intervals

You probably noticed that periods with no activity at all are omitted rather than reporting a zero value.  One method to include the missing intervals is with an outer join to a temporal table containing all the desired intervals.  Ideally, the temporal table would be a permanent one but I've found it impractical to maintain such a table for ad-hoc needs.  Fortunately, a utility numbers CTE is a handy way to generate the needed intervals dynamically.  The example below provides up to 65,536 interval values and can be easily extended as needed.

DECLARE

      @StartTimestamp datetime = '2010-01-01T00:00:00'

      ,@EndTimestamp datetime = '2010-01-01T04:00:00'

      ,@IntervalSeconds int = 1800; --30 minutes

 

WITH

      T2 AS (SELECT 0 AS Num UNION ALL SELECT 0),

      T4 AS (SELECT 0 AS Num FROM T2 AS A CROSS JOIN T2 AS B),

      T256 AS (SELECT 0 AS Num FROM T4 AS A CROSS JOIN T4 AS B CROSS JOIN T4 AS C CROSS JOIN T4 AS D),

      T65536 AS (SELECT ROW_NUMBER() OVER(ORDER BY A.Num) AS Num FROM T256 AS A CROSS JOIN T256 AS B)

SELECT

      DATEADD(second

            ,(Num-1) * @IntervalSeconds, @StartTimestamp) AS Interval

      ,COUNT(WebStats.RequestTimestamp) AS PageRequests

FROM T65536

LEFT JOIN dbo.WebStats ON

      WebStats.RequestTimestamp >= DATEADD(second, (Num-1) * @IntervalSeconds, @StartTimestamp)

      AND WebStats.RequestTimestamp < DATEADD(second, Num * @IntervalSeconds, @StartTimestamp)

WHERE

      Num <= DATEDIFF(second, @StartTimeStamp, @EndTimestamp) / @IntervalSeconds

GROUP BY

      DATEADD(second

            ,(Num-1) * @IntervalSeconds, @StartTimestamp)

ORDER BY

      Interval;  

 

Interval

PageRequests

2010-01-01 00:00:00.000

9

2010-01-01 00:30:00.000

2

2010-01-01 01:00:00.000

0

2010-01-01 01:30:00.000

0

2010-01-01 02:00:00.000

4

2010-01-01 02:30:00.000

3

2010-01-01 03:00:00.000

0

2010-01-01 03:30:00.000

0

 

Collation Hell (Part 3)

In this final post of my Collation Hell series, I'll discuss techniques to change a SQL Server instance collation along with the collation of all databases and columns.  The objective is to ensure the standard collation is used throughout the entire SQL Server instance.  See part 1 and part 2 of this series for more information on selecting a standard collation and planning such a collation change.

Be aware that a complete collation change is not unlike that of a major version upgrade, except tools to facilitate the change are limitted.  You'll need to build new system databases, change user databases and change every character column to conform to the new collation.  These collation changes can be done using either a side-by-side migration technique or performed in-place.

Changing the Instance Collation

The SQL Server setup REBUILDDATABASE option (see Books Online) is used to create new system databases for an existing instance with the desired collation.  One advantage of using REBUILDDATABASE over a complete reinstall is that post-RTM service packs and patches don't need to be reapplied afterward.  However, all server level objects like logins, linked servers, jobs, etc. need to be recreated after the rebuild so you'll need to script those out beforehand.  User databases and columns will need to be changed separately, which I'll discuss in more detail later.

You can also perform a fresh SQL Server install on another instance for a side-by-side migration.  One of the advantages of this side-by-side migration technique is that fallback is fast and relatively easy.  The side-by-side migration method is attractive if you plan a server hardware and/or SQL version upgrade anyway.  However, like the REBUILDDATABASE, you will need to create server-level objects after the install. 

Changing User Database Collation

Before I get into the details of a database collation change, please vote on Connect feedback item Make it easy to change collation on a database.  Until such a feature us available, we will endure the pain of performing this task manually.

Assuming you have performed due diligence and remediation beforehand (see my collation change planning article), changing the database collation in-place is relatively easy.  A simple ALTER DATABASE will change the collation of all user database system objects as well as the database default collation:

ALTER DATABASE Foo

COLLATE Latin1_General_CI_AS;

But note that this database collation change does not actually change the collation of existing user table columns.  Columns that do not match the database collation must be changed individually to conform, which is why a mass collation change is such a PITA.  You might choose to rebuild the database using a side-by-side method so that both the database and column collations can be changed during the rebuild process.  I generally recommend such a side--by-side method unless you are constrained by storage space.

Changing Column Collation Using ATLER TABLE...ALTER COLUMN

The syntax for changing a column collation is simple; just execute ALTER TABLE...ALTER COLUMN using the same column definition except for new column collation:

ALTER TABLE dbo.Foo ALTER COLUMN

      Bar varchar(50) COLLATE Latin1_General_CI_AS NOT NULL;

The above DDL method appears simple at first glance but there are many caveats that make this method problematic, especially when it must be repeated for many tables, large databases and/or a code page change is involved.  ALTER TABLE...ALTER COLUMN may be acceptable for a isolated change but not necessarily for a mass one.  The major issues are:

·         Each column must be changed individually

You'll need a separate ALTER COLUMN statement for each character column in the database.  A T-SQL script that generates the needed DDL using the catalog views is a must.  See Louis Davidson's Change table collations en masse article for an example and be aware that text columns are problematic.

·         Column references must be dropped

The altered column cannot be referenced by a constraint, index, statistic, computed column or schemabound object.  This means that all of these references must be dropped before the column is altered and recreated afterward.

·         Data are updated with a code page change

ALTER TABLE...ALTER COLUMN is a always a fast metadata-only change with a Unicode column.  The operation is also a metadata-only change for a non-Unicode column, but only if the old and new collations have the same code page/character set. 

When the old and new collations have a different code page/character set, then every row must be updated when a non-Unicode column is changed.  The performance ramifications of such an update are huge, especially with large tables.  A full table scan is required for each ALTER statement and every row in the table will be updated.  Also, since SQL Server internally drops the old column and adds a new one, the internal row size increases considerably.  Be aware that space requirements for modified non-Unicode columns will more than double until the clustered index is (re)built.  To reclaim the space of a heap, you'll need to create and drop a clustered index.  Keep in mind that the ALTER operation is fully logged regardless of the database recovery model so you need to plan log space requirements accordingly.

Because of these considerations, I do not recommend using ALTER TABLE...ALTER COLUMN for a mass collation change, especially when non-Unicode columns are involved and the code page/character set of the collations are different.  Instead, migrate data to a new table with columns of the desired collation.

Changing Column Collation Using a New Table

If you cannot perform a side-by-side migration of the entire database using a side-by-side method due to storage constraints, an alternative to ALTER TABLE...ALTER COLUMN is to create a new table with the desired collation and then copy data from the original table.  I also recommend this method over ALTER TABLE...ALTER COLOMN when migrating to a different code page/character set for the reasons I previously mentioned.

1.       Change the database recovery model to SIMPLE to minimize log space requirements

2.       Drop all constraints, except clustered primary key and clustered unique constraints

3.       Drop all non-clustered indexes to free up disk space for the migration

4.       For each table:

o   Create a new table exactly like the original, except with a different name and new collation for all character columns

o   Create the clustered index and check constraints

o   Load data

·         Use INSERT...SELECT to load the new table.  Be sure to specify a TABLOCKX hint on the INSERT so that the operation is minimally logged.  If the table has an identity column. be sure to SET IDENTITY_INSERT...ON to retain the existing identity values.

o   Drop the old table after successful copy and rename new table to old name

5.       Create non-clustered indexes, constraints, triggers, object permissions, etc.

Summary

I cannot overstate the importance of choosing the right collation during the initial install since it is difficult to change after the fact.  Unfortunately, we often inherit instances and databases of varying collations and must evaluate the effort of the collation change against the benefits of a consistent collation.  If you are considering a collation change, be sure to test beforehand to avoid surprises during and after the migration and have a solid fallback plan.

 

Collation Hell (Part 2)

In my last post, I discussed why one should avoid a mixed collation environment and how to choose the right collation for your environment.  This post focuses on planning a collation change.

Should You Change Existing Collations?

Once you choose a standard collation (or at least a preferred one) for your organization, you'll need to decide if the change to existing instances, databases and columns is worth the effort and risk.  Keep in mind that the effort involves not only the actual collation change but also testing along with possible changes to code and data to maintain the desired behavior.  Such a remediation project can be quite significant depending on the old/new collation and scope of the change so you need to weigh the pros and cons to determine if the effort is justified.

Note that changing collations need not be an all-or-none decision; you might choose to convert only some (or none) of your existing instances/databases while enforcing the collation standard for new installations.  You can identify the instances that are causing the most grief and weigh those accordingly.

A number of factors influence the effort and risk of a collation change.  A change to language, sensitivity and/or code page is often more complex than a conversion from a SQL collation to a Windows collation (or Windows to SQL) of the same language and sensitivity.  Let me discuss these scenarios in more detail so that you can better ascertain the effort and risk involved in your environment for planning purposes.

Windows vs. SQL Collation Change

A conversion between a SQL and Windows collation of the same language, sensitivity and code page ought to be fairly straightforward due to the same character set and similar comparison rules.  As with any collation change, there are differences in behavior though.  The main difference here is that Windows collations use word sort behavior so slightly different sorting/comparison behavior will result.  The script below shows such a difference with identical data

--SQL collation: compares greater than

IF 'coop' COLLATE SQL_Latin1_General_CP1_CI_AS < 'co-op' COLLATE SQL_Latin1_General_CP1_CI_AS

      PRINT 'less than'

ELSE IF 'coop' COLLATE SQL_Latin1_General_CP1_CI_AS = 'co-op' COLLATE SQL_Latin1_General_CP1_CI_AS

      PRINT 'equal'

ELSE IF 'coop' COLLATE SQL_Latin1_General_CP1_CI_AS > 'co-op' COLLATE SQL_Latin1_General_CP1_CI_AS

      PRINT 'greater than'

ELSE PRINT 'UNKNOWN'

 

--Windows collation: compares less than

IF 'coop' COLLATE Latin1_General_CI_AS < 'co-op' COLLATE Latin1_General_CI_AS

      PRINT 'less than'

ELSE IF 'coop' COLLATE Latin1_General_CI_AS = 'co-op' COLLATE Latin1_General_CI_AS

      PRINT 'equal'

ELSE IF 'coop' COLLATE Latin1_General_CI_AS > 'co-op' COLLATE Latin1_General_CI_AS

      PRINT 'greater than'

ELSE PRINT 'UNKNOWN'

 

All things being equal, a conversion from/to a Windows collation will likely require few changes, if any, to code and schema (besides the collation change).  On the other hand, converting to a collation of different sensitivity and/or character set is often be more challenging

Sensitivity Change

You might recall that the instance collation determines the sensitivity for variable names and labels while the database collation determines sensitivity of identifiers and literals.  I always match characters exactly in variable names, labels identifiers (including table aliases) regardless of whether I'm using a sensitive or insensitive collation and never use names that differ only by case.  Not only does naming consistency make code cleaner, this practice facilitates moving between collations.  However, it is unlikely that all database developers were so anal in their naming so be aware that you'll probably need to make code or schema changes in order to convert between collations of different sensitivity.

A change from a case-sensitive collation to a case-insensitive one is usually minor, at least from a code perspective.   The same schema/code that runs in a case-sensitive environment will run in a case-insensitive collation as long as you don't encounter names and identifiers in the same scope that differ only by case (e.g. @customerID and @CustomerID).  Such a deliberate practice is uncommon in my experience but these conflicts must be addressed before changing to a case-insensitive collation.

One usually strives to store and query data using a consistent case (especially all upper/lower) under a case-sensitive collation.  If this practice was not followed, data that was unique under a case-sensitive collation will not be regarded as such under case-insensitive rules and prevent unique indexes (including primary key or unique constraints) from begin created.  This might actually be a good thing when the real issue is bad data (i.e. duplicates inadvertently allowed due to inconsistent case).  However, you may need to deviate from the case-insensitive standard at the column level in some situations due to business requirements, such as to enforce uniqueness of case-sensitive part numbers.

Going from a case-insensitive to a case-sensitive or binary collation (which I don't personally recommend) will typically require more changes.  Developers tend to be a bit sloppy with matching case under a case-insensitive collation because there is no requirement to do so.  Don't be surprised if a lot of code and queries must be changed once variables and identifiers become case sensitive.  Furthermore, you may need to update data to a consistent case and also make application changes to ensure data are stored in a consistent case.

The considerations that apply to case sensitivity also apply to other collation sensitivity options (accent, Kana and width).  I wouldn't expect as many issues compared to a change in case sensitivity in most cases, though.

Character Set Change

A change in code page is a non-issue when char/varchar/text data contains only ASCII characters.  If you have a character outside the ASCII range (0-127, 0x00-0x7F), a code page change will present a problem when the character doesn't also exist in the target collation's code page.  Such a character will instead be mapped to an alternate character (e.g. 'À' to 'A' in example below) or the catch-all '?' (e.g. '€' to '?' in example below).  If this mapping is unacceptable, you'll need to change the data type to Unicode (nchar/nvarchar/ntext) or update data to conform to the target code page.

CREATE TABLE dbo.Foo(

      Bar char(1) COLLATE Latin1_General_CI_AS

      );

INSERT INTO dbo.Foo (Bar) VALUES('A');

INSERT INTO dbo.Foo (Bar) VALUES('À');

INSERT INTO dbo.Foo (Bar) VALUES('€');

 

--list values not mapped identically

SELECT Bar AS OriginalValue, Bar COLLATE Japanese_90_BIN AS MappedValue

FROM dbo.Foo

WHERE

      CAST(CAST(Bar AS nvarchar(MAX)) AS varbinary(MAX)) <>

      CAST(CAST(Bar COLLATE Japanese_90_BIN AS nvarchar(MAX)) AS varbinary(MAX));

OriginalValue

MappedValue

À

A

?

 

If you are unsure if you have problem characters, the above script shows one method to identify these.  This script converts the original collation characters to Unicode and then to varbinary and repeats the technique for the target collation.  An inequality of the two values indicates an inexact mapping that may require remediation.

Language Change

I'm sure some of you have inherited different language collations due to mergers and acquisitions or inattention to detail during installation.  Be mindful that the topic of supporting multiple languages/locales is much larger than just collation.  I'm only discussing a collation language change here but if you need to fully support multiple languages in a single database, you must also consider other factors such as a schema that supports multiple translations, currency and UOM conversion and applications that are sensitive to client locale.

You may experience different behavior after a collation language change due to the different sorting and comparison semantics.  The script below illustrates such a difference.  Even if you chose a collation that supports the majority of your users' languages, that collation might be less than ideal for the user minority.  Consider performing some operations in application code instead of SQL Server when the standard collation behavior is unacceptable for the task at hand.

--returns both 'Schröder' and 'Schroeder'

DECLARE @Foo TABLE(

      LastName nvarchar(10) COLLATE German_PhoneBook_CI_AS);

INSERT INTO @Foo VALUES(N'Schröder');

INSERT INTO @Foo VALUES(N'Schroeder');

SELECT LastName FROM @Foo

WHERE LastName LIKE N'%oe%';

GO

--returns only 'Schroeder'

DECLARE @Foo TABLE(

      LastName nvarchar(10) COLLATE Latin1_General_CI_AS);

INSERT INTO @Foo VALUES(N'Schröder');

INSERT INTO @Foo VALUES(N'Schroeder');

SELECT LastName FROM @Foo

WHERE LastName LIKE N'%oe%';

GO

 

Summary

A collation change effort varies considerably depending on the size and complexity of the environment.  Perform due diligence before embarking on a collation change.  I don't want to discourage anyone from changing collations but as much as a mixed collation environment is a pain, a botched remediation project is even worse.  Be sure to plan accordingly.

I'll share different methods to change collations in my last post of this series.

Collation Hell (Part 1)

I inherited a mixed collation environment with more collations than I can count on one hand.  The different collations require workarounds to avoid "cannot resolve collation conflict" errors and those workarounds kill performance due to non-sargable expressions.  Dealing with mixed collations is a real pain so I strongly recommend you standardize on a single collation and deviate only after careful forethought.  Here's a brief overview of collations and some guidance to help you choose the right collation for your organization and new SQL installations.

Collation Overview

A collation determines the rules SQL Server uses to compare and sort character data.  These rules are language/locale aware and may also be sensitive to case, accent, Kana and width.  Collation suffixes identify dictionary rule (in)sensitivity:  _CS (case sensitive), _CI (case insensitive), _AS (accent sensitive), _AI (accent insensitive) and _KS (Kana sensitive).   Binary collations, identified by suffixes _BIN (binary) and _BIN2 (binary-code point), are sensitive in all regards.

A collation determines which characters can be stored in non-Unicode character data types and the bit patterns used for storage.  Char, varchar and text data types can store only 256 different characters due to the single byte limitation.  The first 128 characters (0-127, 0x00-0x7F) are the same for all collations as defined by the ASCII character set and the remaining 128 characters (128-255, 0x80-0xFF) vary according to the code page associated with the collation.  Characters without an associated code point are mapped to an either an alternate character or to the catch-all '?' character.

Collations are grouped into Windows and SQL collations.  Windows collations provide sorting and comparison behavior consistent with applications running on a computer with the corresponding Windows operating system locale.  Windows collation also provide consistent behavior for both Unicode and non-Unicode data types. 

SQL collations use different rules for non-Unicode and Unicode types.  SQL Server collations, identified with the SQL_ collation name prefix, use the character set and sort order settings from older SQL Server versions for non-Unicode types and are provided specifically to maintain compatibility with existing SQL Server installations.  Both SQL and Windows collations use the same rules for Unicode types.

Specifying a Collation

Collation can be specified at the instance, database, column and expression level.  The SQL Server instance collation is determined during SQL Server installation and cannot be changed without a reinstall/rebuild.  It's a good idea to get the collation right the first time unless you need practice re-installing SQL Server.  Keep in mind that the instance collation determines the collation (including case-sensitivity) of Instance-level objects like logins and database names as well as identifiers for variables, GOTO labels and temporary tables.  Passwords are always case-sensitive in SQL 2005 and above, although collation determined password sensitivity in earlier versions.

The database collation is determined when the database is created.  If not specified otherwise, the instance default collation is used as the database collation.  Database-level identifiers like table and column names use the database collation as do literal expressions.  The database collation can be changed at any time but this does not change the collation of existing table columns.

Column collation for character data is specified when the table is created or when the column added to the table.  If not specified otherwise, the database collation is used.  A column's collation can be changed only by altering the column with the new collation or recreating the table with the new collation specified on the column definition.  If you want a column's collation to remain different than the database default collation, you must be careful to explicitly specify the collation whenever the column is altered so that it not inadvertently changed to the database default collation.

Choosing the Right Collation

The default collation that the SQL Server installer chooses is not necessarily the Microsoft recommended one or the one that is best for your environment.  SQL Server setup examines the operating system locale and chooses the default as the oldest available version associated with the locale.  For example, a SQL Server installation in the US will default to SQL_Latin1_General_CP1_CI_AS and the installation default in the UK will be Latin1_General_CI_AS.  In both cases, Microsoft recommends a Windows collation (e.g. Windows Latin1_General_CI_AS) unless one needs to maintain compatibility with existing installations.  More on that shortly.

Language is the most important consideration in choosing a collation for a new installation.  This is one reason why the SQL Server installer chooses the default collation based on the operating system locale.  If all users speak the same language, choose a collation that supports the language/locale.  This will help ensure expected sorting and comparison behavior along with alphabet support for non-Unicode types.  In a multi-language environment, choose a collation with the best overall support for the languages used.

Another major consideration is collation compatibility.  If you have existing SQL installations, consider using the same collation for a new instance if you envision sharing data via replication, SSIS or future server consolidation.  I previously mentioned that Microsoft recommends a Windows collation but it may be better to revert to a SQL collation for compatibility with older instances in your environment that already use the SQL collation.  Compatibility is another reason why the installation default is SQL_Latin1_General_CP1_CI_AS collation in the US.  Unfortunately, this default has the side effect of DBAs unwittingly installing new instances with a SQL collation instead of a Windows collation like Latin1_General_CI_AS even when compatibility isn't needed.

The choice of whether or not to choose a case sensitive collation is a bit subjective.  A case insensitive collation is appropriate when you need to query data regardless of the case of the actual data.  For example, this allows one to easily find customers with a last name of 'Smith' even when data is not stored in proper case.  With a case sensitive collation, it is important that one stores data in a consistent case (not to say that one shouldn't anyway) and this places more burden on application and database developers. 

Collation Performance

Collation performance was a bigger deal back in the days of 486 processors (instead of collation, it was actually character set and sort order back then).  The comparative performance on modern processors is usually insignificant.  SQL collations should provide better performance than Windows collations for non-Unicode types due to simpler comparison rules but the difference is significant only in the most severe circumstances, such as a table scan with LIKE '%Some String%' in the WHERE clause.  See Comparing SQL collations to Windows collations.  Binary collations are said to provide the best performance but the cost of unnatural (non-dictionary) comparisons and sort order is high; most users would expect 'a' to sort before 'B' but that is not the case with binary collations.

I personally don't think performance should even be considered in choosing the proper collation.  One of the reasons I'm living in collation hell is that my predecessors chose binary collations to eke out every bit of performance for our highly transactional OLTP systems.  With the sole exception of a leading wildcard table scan search, I've found no measurable performance difference with our different collations.  The real key to performance is query and index tuning rather than collation.  If performance is important to you, I recommend you perform a performance test with your actual application queries before you choose a collation on based on performance expectations.

Summary

My general recommendation is that you should use a case insensitive Windows collation appropriate for your locale unless you need to maintain compatibility with existing SQL instances or have special considerations.  In my next post, I'll discuss changing collations so that you can avoid a mixed collation environment and show different methods to accomplish the task.

Forced Parameterization: A Turbo Button?

I never had the need to turn on the PARAMETERIZATION FORCED database option until this week.  We pretty much use only stored procedures for our internal applications so the execution plans are almost always in cache and reused.  This practice of using parameterized stored procedure calls, together with attention to detail in query and index tuning, allows us to comfortably handle several thousand requests per second on commodity hardware without taking special measures.

The Perfect Storm

We acquired a third-party application which had to sustain thousands of batch requests per second in order to keep up with our peak demand.  Our first attempt to use the application out-of-the box failed miserably when the 16-core database server quickly hit 100% CPU and stayed there.  An examination of the most frequently run query soon revealed why CPU was so high.  Not only was the moderately complex query not parameterized, each invocation required a full table scan.  The schema (EAV model, missing primary keys and indexes), application code (ad-hoc, non-parameterized queries) and inattention to indexing seemed the perfect storm to guarantee failure. 

Our hands were tied in what the vendor could/would do to address our performance concerns.  We worked with the vendor to optimize indexes and this brought the CPU down to about 65% but the batch requests/sec rate and slow response time was still unacceptable.   We needed to increase performance by at least an order of magnitude to meet SLAs.

The Perfect Fix

I then recalled an experience that SQL Server MVP Adam Machanic shared not long ago:

CPU was 95%+ at peak time (several thousand batch requests/second, via an ASP (classic) front end), and the peak time lasted 8+ hours every day.  The server was one of the big HP boxes -- not sure if it was a Superdome or some other model -- with something like 56 cores and 384 GB of RAM.  The database itself was only 40 or 50 GB, as I recall, so the entire thing was cached.  Long story short, I logged in during peak load, did a quick trace and noticed right away that none of the queries were parameterized.  I decided to throw caution to the wind and just go for it.  Flipped the thing into Forced Parameterization mode and held my breath as I watched the CPU counters *instantly* drop to 7% and stay there. I thought I'd broken the thing, but after checking my trace queries were running through the system same as before, and with the same number of errors (another story entirely <g>). Luckily the head IT guy happened to be watching his dashboard right as I made the change, and after seeing such an extreme result thought I was a god...

 

I knew of PARAMETERIZATION FORCED but never realized how big a difference the option could make until I learned of Adam's experience.  I'm not quite as adventuresome as he is so I restored the production database to a separate environment for some cursory testing.  To my amazement, I watched the rate of my single-threaded test jump from a few dozen batch requests/sec to several hundred immediately after I executed "ALTER DATABASE...SET PARAMETERIZATION FORCED".  CPU dropped by half even with the tenfold increase in throughput. 

The production improvement was even more impressive - the 16 core Dell R900 hasn't exceeded 8% CPU since the change.  Response time is excellent, we have happy users and plenty of CPU headroom to spare.

A Turbo Button?

Despite anecdotal success with PARAMETERIZATION FORCED, I wouldn't turn it on indiscriminately.  When the PARAMETERIZATION FORCED database option is on, all queries are parameterized, including complex ones.  This is good in that compilation costs are avoided due to cache hits.  The bad news is that a single plan might not be appropriate for all possible values of a given query.  Worse overall performance will result when higher execution costs (due to sub-optimal plans) exceed compilation savings so you should understand the query mix before considering the option.

In contrast, SQL Server parameterizes only relatively simple "no brainer" queries in the default PARAMETERIZATION SIMPLE mode.  This behavior promotes reuse of plans for queries that will yield the same plan anyway regardless of the literal values in the query.  Complex queries are not parameterized automatically so that the optimizer can generate the optimal plan for the values of the current query in the event of a cache miss.  The downside with simple parameterization, as Adam and I observed, is that complex queries not already in cache will incur costly compilation costs that are a CPU hog in a high-volume OLTP workload.

There is also middle ground between PARAMETERIZATION SIMPLE and PARAMETERIZATION FORCED.  One can use plans guides with PARAMETERIZATION SIMPLE to avoid compilation for selected queries while other complex queries are compiled as normal.  In my case, a plan guide may have been a better option because the culprit was a single query rather than many different unpredictable ones.

In my opinion, the best solution is to use stored procedures and/or parameterized queries in the first place.  These methods provide the performance benefits of PARAMETERIZATION FORCED and add other security and application development benefits.  Unfortunately, third-party vendors are notorious for not following parameterization Best Practices so DBAs need to keep PARAMETERIZATION FORCED and plan guides in their tool belt.

 

Restore Database Stored Procedure

A user in the SQL Server public newsgroups asked about how to restore a database with many files and rename during the process:

I am restoring a database onto another server with different drive
sizes and mappings.
The thing is, I have over 100 catalogs to restore. I don't want to
have to define each catalog name and its new location Like below:

RESTORE DATABASE Northwinds
FROM DISK = 'C:\db.bak'
WITH MOVE 'Catalog1' TO 'D:\Catalog1'
WITH MOVE 'Catalog2' TO 'D:\Catalog2
WITH MOVE 'Catalog3' TO 'D:\Catalog3'
WITH MOVE 'Catalog4' TO 'D:\Catalog4
WITH MOVE 'Catalog5' TO 'D:\Catalog5'
WITH MOVE 'Catalog6' TO 'D:\Catalog6'
...WITH MOVE 'Catalog100' TO 'D:\Catalog100'

Is it possible to restore the catalgos using a wilcard as such?

RESTORE DATABASE Northwinds
FROM DISK = 'C:\db.bak'
WITH MOVE 'Catalog%' TO 'D:\Catalog%'

 

This reminded me of a stored procedure I wrote several years ago for SQL Server 2000 that would be perfect for such a task.  The proc generates and optionally executes the necessary RESTORE and ALTER commands to make quick work of what is otherwise a long and tedious process if you have many files and databases.  I updated my old proc for SQL Server 2008 and thought I'd share it here.    Below is the proc with documentation and samples in the comments.  I hope you find this useful.

IF OBJECT_ID(N'tempdb..#RestoreDatabase_SQL2008') IS NOT NULL

      DROP PROCEDURE #RestoreDatabase_SQL2008

GO

 

CREATE PROCEDURE #RestoreDatabase_SQL2008

      @BackupFile nvarchar(260),

      @NewDatabaseName sysname = NULL,

      @FileNumber int = 1,

      @DataFolder nvarchar(260) = NULL,

      @LogFolder nvarchar(260) = NULL,

      @ExecuteRestoreImmediately char(1) = 'N',

      @ChangePhysicalFileNames char(1) = 'Y',

      @ChangeLogicalNames char(1) = 'Y',

      @DatabaseOwner sysname = NULL,

      @AdditionalOptions nvarchar(500) = NULL

AS

 

/*

 

This procedure will generate and optionally execute a RESTORE DATABASE

script from the specified disk database backup file.

 

Parameters:

 

      @BackupFile: Required. Specifies fully-qualified path to the disk

            backup file. For remote (network) files, UNC path should

            be specified.  The SQL Server service account will need

            permissions to the file.

 

      @NewDatabaseName: Optional. Specifies the target database name

            for the restore.  If not specified, the database is

            restored using the original database name.

 

      @FileNumber: Optional. Specifies the file number of the desired

            backup set. This is needed only when when the backup file

            contains multiple backup sets. If not specified, a

            default of 1 is used.

 

      @DataFolder: Optional. Specifies the folder for all database data

            files. If not specified, data files are restored using the

            original file names and locations.

 

      @LogFolder: Optional. Specifies the folder for all database log

            files. If not specified, log files are restored to the

            original log file locations.

 

      @ExecuteRestoreImmediately: Optional. Specifies whether or not to

            execute the restore. When, 'Y' is specified, then restore is

            executed immediately.  When 'Y' is specified, the restore script

            is printed but not executed. If not specified, a default of 'N'

            is used.

           

      @ChangePhysicalFileNames: Optional. Indicates that physical file

            names are to be renamed during the restore to match the

            new database name. When 'Y' is specified, the leftmost

            part of the original file name matching the original

            database name is replaced with the new database name. The

            file name is not changed when 'N' is specified or if the

            leftmost part of the file name doesn't match the original

            database name. If not specified, a default of 'Y' is used.

 

      @ChangeLogicalNames: Optional. Indicates that logical file names

            are to be renamed following the restore to match the new

            database name. When 'Y' is specified, the leftmost part

            of the original file name matching the original database

            name is replaced with the new database name. The file name

            is not changed when 'N' is specified or if the leftmost

            part of the file name doesn't match the original database

            name. If not specified, a default of 'Y' is used.

           

      @DatabaseOwner: Optional. Specifies the new database owner

            (authorization) of the restored database.  If not specified, the

            database will be owned by the accunt used to restore the database.

           

      @AdditionalOptions:  Optional.  Specifies options to be added the the

            RESTORE statement WITH clause (e.g. STATS=5, REPLACE).  If not

            specified, only the FILE and MOVE are included.

 

Sample usages:

 

      --restore database with same name and file locations

      EXEC #RestoreDatabase_SQL2008

            @BackupFile = N'C:\Backups\Foo.bak',

            @AdditionalOptions=N'STATS=5, REPLACE';

           

      Results:

      --Backup source: ServerName=MYSERVER, DatabaseName=Foo, BackupFinishDate=2009-06-13 11:20:52.000

      RESTORE DATABASE [MyDatabase]

            FROM DISK=N'C:\Backups\Foo.bak'

            WITH

                  FILE=1, STATS=5, REPLACE

 

      --restore database with new name and change logical and physical names

      EXEC #RestoreDatabase_SQL2008

            @BackupFile = N'C:\Backups\Foo.bak',

            @NewDatabaseName = 'Foo2';

           

      Results:

      --Backup source: ServerName=MYSERVER, DatabaseName=Foo, BackupFinishDate=2009-06-13 11:20:52.000

      RESTORE DATABASE [Foo2]

            FROM DISK=N'C:\Backups\Foo.bak'

            WITH

                  FILE=1,

                        MOVE 'Foo' TO 'C:\DataFolder\Foo2.mdf',

                        MOVE 'Foo_log' TO 'D:\LogFolder\Foo2_log.LDF'

      ALTER DATABASE [Foo2]

                        MODIFY FILE (NAME='Foo', NEWNAME='Foo2');

      ALTER DATABASE [Foo2]

                        MODIFY FILE (NAME='Foo_log', NEWNAME='Foo2_log');

                       

      --restore database to different file folders and change owner after restore:

      EXEC #RestoreDatabase_SQL2008

            @BackupFile = N'C:\Backups\Foo.bak',

            @DataFolder = N'E:\DataFiles',

            @LogFolder = N'F:\LogFiles',

            @DatabaseOwner = 'sa',

            @AdditionalOptions=N'STATS=5;

           

      Results:

      --Backup source: ServerName=MYSERVER, DatabaseName=Foo, BackupFinishDate=2009-06-13 11:20:52.000

      RESTORE DATABASE [Foo]

            FROM DISK=N'C:\Backups\Foo.bak'

            WITH

                  FILE=1,

                        MOVE 'Foo' TO 'E:\DataFiles\Foo.mdf',

                        MOVE 'Foo_log' TO 'F:\LogFiles\Foo_log.LDF'

      ALTER AUTHORIZATION ON DATABASE::[Foo] TO [sa]

*/

 

SET NOCOUNT ON;

 

DECLARE @LogicalName nvarchar(128),

      @PhysicalName nvarchar(260),

      @PhysicalFolderName nvarchar(260),

      @PhysicalFileName nvarchar(260),

      @NewPhysicalName nvarchar(260),

      @NewLogicalName nvarchar(128),

      @OldDatabaseName nvarchar(128),

      @RestoreStatement nvarchar(MAX),

      @Command nvarchar(MAX),

      @ReturnCode int,

      @FileType char(1),

      @ServerName nvarchar(128),

      @BackupFinishDate datetime,

      @Message nvarchar(4000),

      @ChangeLogicalNamesSql nvarchar(MAX),

      @AlterAuthorizationSql nvarchar(MAX),

      @Error int;

 

DECLARE @BackupHeader TABLE

      (

      BackupName nvarchar(128) NULL,

      BackupDescription  nvarchar(255) NULL,

      BackupType smallint NULL,

      ExpirationDate datetime NULL,

      Compressed tinyint NULL,

      Position smallint NULL,

      DeviceType tinyint NULL,

      UserName nvarchar(128) NULL,

      ServerName nvarchar(128) NULL,

      DatabaseName nvarchar(128) NULL,

      DatabaseVersion int NULL,

      DatabaseCreationDate  datetime NULL,

      BackupSize numeric(20,0) NULL,

      FirstLSN numeric(25,0) NULL,

      LastLSN numeric(25,0) NULL,

      CheckpointLSN  numeric(25,0) NULL,

      DatabaseBackupLSN  numeric(25,0) NULL,

      BackupStartDate  datetime NULL,

      BackupFinishDate  datetime NULL,

      SortOrder smallint NULL,

      CodePage smallint NULL,

      UnicodeLocaleId int NULL,

      UnicodeComparisonStyle int NULL,

      CompatibilityLevel  tinyint NULL,

      SoftwareVendorId int NULL,

      SoftwareVersionMajor int NULL,

      SoftwareVersionMinor int NULL,

      SoftwareVersionBuild int NULL,

      MachineName nvarchar(128) NULL,

      Flags int NULL,

      BindingID uniqueidentifier NULL,

      RecoveryForkID uniqueidentifier NULL,

      Collation nvarchar(128) NULL,

      FamilyGUID uniqueidentifier NULL,

      HasBulkLoggedData bit NULL,

      IsSnapshot bit NULL,

      IsReadOnly bit NULL,

      IsSingleUser bit NULL,

      HasBackupChecksums bit NULL,

      IsDamaged bit NULL,

      BeginsLogChain bit NULL,

      HasIncompleteMetaData bit NULL,

      IsForceOffline bit NULL,

      IsCopyOnly bit NULL,

      FirstRecoveryForkID uniqueidentifier NULL,

      ForkPointLSN decimal(25, 0) NULL,

      RecoveryModel nvarchar(60) NULL,

      DifferentialBaseLSN decimal(25, 0) NULL,

      DifferentialBaseGUID uniqueidentifier NULL,

      BackupTypeDescription  nvarchar(60) NULL,

      BackupSetGUID uniqueidentifier NULL,

      CompressedBackupSize binary(8) NULL

);

 

DECLARE @FileList TABLE

      (

      LogicalName nvarchar(128) NOT NULL,

      PhysicalName nvarchar(260) NOT NULL,

      Type char(1) NOT NULL,

      FileGroupName nvarchar(120) NULL,

      Size numeric(20, 0) NOT NULL,

      MaxSize numeric(20, 0) NOT NULL,

      FileID bigint NULL,

      CreateLSN numeric(25,0) NULL,

      DropLSN numeric(25,0) NULL,

      UniqueID uniqueidentifier NULL,

      ReadOnlyLSN numeric(25,0) NULL ,

      ReadWriteLSN numeric(25,0) NULL,

      BackupSizeInBytes bigint NULL,

      SourceBlockSize int NULL,

      FileGroupID int NULL,

      LogGroupGUID uniqueidentifier NULL,

      DifferentialBaseLSN numeric(25,0)NULL,

      DifferentialBaseGUID uniqueidentifier NULL,

      IsReadOnly bit NULL,

      IsPresent bit NULL,

      TDEThumbprint varbinary(32) NULL

 );

 

SET @Error = 0;

 

--add trailing backslash to folder names if not already specified

IF LEFT(REVERSE(@DataFolder), 1) <> '\' SET @DataFolder = @DataFolder + '\';

IF LEFT(REVERSE(@LogFolder), 1) <> '\' SET @LogFolder = @LogFolder + '\';

 

-- get backup header info and display

SET @RestoreStatement = N'RESTORE HEADERONLY

      FROM DISK=N''' + @BackupFile + ''' WITH FILE=' + CAST(@FileNumber as nvarchar(10));

INSERT INTO @BackupHeader

      EXEC('RESTORE HEADERONLY FROM DISK=N''' + @BackupFile + ''' WITH FILE = 1');

SET @Error = @@ERROR;

IF @Error <> 0 GOTO Done;

IF NOT EXISTS(SELECT * FROM @BackupHeader) GOTO Done;

SELECT

      @OldDatabaseName = DatabaseName,

      @ServerName = ServerName,

      @BackupFinishDate = BackupFinishDate

FROM @BackupHeader;

IF @NewDatabaseName IS NULL SET @NewDatabaseName = @OldDatabaseName;

SET @Message = N'--Backup source: ServerName=%s, DatabaseName=%s, BackupFinishDate=' +

      CONVERT(nvarchar(23), @BackupFinishDate, 121);

RAISERROR(@Message, 0, 1, @ServerName, @OldDatabaseName) WITH NOWAIT;

 

-- get filelist info

SET @RestoreStatement = N'RESTORE FILELISTONLY

      FROM DISK=N''' + @BackupFile + ''' WITH FILE=' + CAST(@FileNumber as nvarchar(10));

INSERT INTO @FileList

      EXEC(@RestoreStatement);

SET @Error = @@ERROR;

IF @Error <> 0 GOTO Done;

IF NOT EXISTS(SELECT * FROM @FileList) GOTO Done;

 

-- generate RESTORE DATABASE statement and ALTER DATABASE statements

SET @ChangeLogicalNamesSql = '';

SET @RestoreStatement =

      N'RESTORE DATABASE ' +

      QUOTENAME(@NewDatabaseName) +

      N'

      FROM DISK=N''' +

      @BackupFile + '''' +

      N'

      WITH

            FILE=' +

      CAST(@FileNumber as nvarchar(10))

DECLARE FileList CURSOR LOCAL STATIC READ_ONLY FOR

      SELECT

            Type AS FileTyoe,

            LogicalName,

            --extract folder name from full path

            LEFT(PhysicalName,

                  LEN(LTRIM(RTRIM(PhysicalName))) -

                  CHARINDEX('\',

                  REVERSE(LTRIM(RTRIM(PhysicalName)))) + 1)

                  AS PhysicalFolderName,

            --extract file name from full path

            LTRIM(RTRIM(RIGHT(PhysicalName,

                  CHARINDEX('\',

                  REVERSE(PhysicalName)) - 1))) AS PhysicalFileName

FROM @FileList;

 

OPEN FileList;

 

WHILE 1 = 1

BEGIN

      FETCH NEXT FROM FileList INTO

            @FileType, @LogicalName, @PhysicalFolderName, @PhysicalFileName;

      IF @@FETCH_STATUS = -1 BREAK;

 

      -- build new physical name

      SET @NewPhysicalName =

            CASE @FileType

                  WHEN 'D' THEN

                        COALESCE(@DataFolder, @PhysicalFolderName) +

                        CASE

                              WHEN UPPER(@ChangePhysicalFileNames) IN ('Y', '1') AND

                                    LEFT(@PhysicalFileName, LEN(@OldDatabaseName)) = @OldDatabaseName

                              THEN

                                    @NewDatabaseName + RIGHT(@PhysicalFileName, LEN(@PhysicalFileName) - LEN(@OldDatabaseName))

                              ELSE

                                    @PhysicalFileName

                        END

                  WHEN 'L' THEN

                        COALESCE(@LogFolder, @PhysicalFolderName) +

                        CASE

                              WHEN UPPER(@ChangePhysicalFileNames) IN ('Y', '1') AND

                                    LEFT(@PhysicalFileName, LEN(@OldDatabaseName)) = @OldDatabaseName

                              THEN

                                    @NewDatabaseName + RIGHT(@PhysicalFileName, LEN(@PhysicalFileName) - LEN(@OldDatabaseName))

                              ELSE

                                    @PhysicalFileName

                        END

            END;

 

      -- build new logical name

      SET @NewLogicalName =

            CASE

                  WHEN UPPER(@ChangeLogicalNames) IN ('Y', '1') AND

                        LEFT(@LogicalName, LEN(@OldDatabaseName)) = @OldDatabaseName

                        THEN

                              @NewDatabaseName + RIGHT(@LogicalName, LEN(@LogicalName) - LEN(@OldDatabaseName))

                        ELSE

                              @LogicalName

            END;

           

      -- generate ALTER DATABASE...MODIFY FILE statement if logical file name is different

      IF @NewLogicalName <> @LogicalName

            SET @ChangeLogicalNamesSql = @ChangeLogicalNamesSql + N'ALTER DATABASE ' + QUOTENAME(@NewDatabaseName) + N'

                  MODIFY FILE (NAME=''' + @LogicalName + N''', NEWNAME=''' + @NewLogicalName + N''');

'

 

      -- add MOVE option as needed if folder and/or file names are changed

      IF @PhysicalFolderName + @PhysicalFileName <> @NewPhysicalName

      BEGIN

            SET @RestoreStatement = @RestoreStatement +

                  N',

                  MOVE ''' +

                  @LogicalName +

                  N''' TO ''' +

                  @NewPhysicalName +

                  N'''';

      END;

 

END;

CLOSE FileList;

DEALLOCATE FileList;

 

IF @AdditionalOptions IS NOT NULL

      SET @RestoreStatement =

            @RestoreStatement + N', ' + @AdditionalOptions

           

IF @DatabaseOwner IS NOT NULL

      SET @AlterAuthorizationSql = N'ALTER AUTHORIZATION ON DATABASE::' +

            QUOTENAME(@NewDatabaseName) + N' TO ' + QUOTENAME(@DatabaseOwner)

ELSE

      SET @AlterAuthorizationSql = N''

--execute RESTORE statement

IF UPPER(@ExecuteRestoreImmediately) IN ('Y', '1')

BEGIN

 

      RAISERROR(N'Executing:

%s', 0, 1, @RestoreStatement) WITH NOWAIT

      EXEC (@RestoreStatement);

      SET @Error = @@ERROR;

      IF @Error <> 0 GOTO Done;

 

      --execute ALTER DATABASE statement(s)

      IF @ChangeLogicalNamesSql <> ''

      BEGIN

            RAISERROR(N'Executing:

%s', 0, 1, @ChangeLogicalNamesSql) WITH NOWAIT

            EXEC (@ChangeLogicalNamesSql);

            SET @Error = @@ERROR;

            IF @Error <> 0 GOTO Done;

      END

     

      IF @AlterAuthorizationSql <> ''

      BEGIN

            RAISERROR(N'Executing:

%s', 0, 1, @AlterAuthorizationSql) WITH NOWAIT

            EXEC (@AlterAuthorizationSql);

            SET @Error = @@ERROR;

            IF @Error <> 0 GOTO Done;

      END

     

END

ELSE

BEGIN

      RAISERROR(N'%s', 0, 1, @RestoreStatement) WITH NOWAIT

      IF @ChangeLogicalNamesSql <> ''

      BEGIN

            RAISERROR(N'%s', 0, 1, @ChangeLogicalNamesSql) WITH NOWAIT;

      END

      IF @AlterAuthorizationSql <> ''

      BEGIN

            RAISERROR(N'%s', 0, 1, @AlterAuthorizationSql) WITH NOWAIT;

      END

END;

 

Done:

 

RETURN @Error;

GO

 

 

Database Mail Configuration

I recently had to setup Database Mail on dozens of SQL Server instances.   Rather than perform this tedious task using the SSMS GUI, I developed a script that saved me a lot of time which I'm sharing here.   

My needs were simple so I only needed a single SMTP account and profile.  I decided to make the profile the default public one so that all msdb users would use this profile unless a different sp_send_dbmail @profile value was explicitly specified.  You might want to extend this script if you need other accounts/profiles, such as separate ones for administrative alerts or user reports.

Setup Script

Below is the template script I used for my task.  The sysmail_add_account_sp @username and @password parameters might be required depending on your SMTP server authentication and you will of course need to customize the mail server name and addresses for your environment.

-- Enable Database Mail for this instance

EXECUTE sp_configure 'show advanced', 1;

RECONFIGURE;

EXECUTE sp_configure 'Database Mail XPs',1;

RECONFIGURE;

GO

 

-- Create a Database Mail account

EXECUTE msdb.dbo.sysmail_add_account_sp

    @account_name = 'Primary Account',

    @description = 'Account used by all mail profiles.',

    @email_address = 'myaddress@mydomain.com',

    @replyto_address = 'myaddress@mydomain.com',

    @display_name = 'Database Mail',

    @mailserver_name = 'mail.mydomain.com';

 

-- Create a Database Mail profile

EXECUTE msdb.dbo.sysmail_add_profile_sp

    @profile_name = 'Default Public Profile',

    @description = 'Default public profile for all users';

 

-- Add the account to the profile

EXECUTE msdb.dbo.sysmail_add_profileaccount_sp

    @profile_name = 'Default Public Profile',

    @account_name = 'Primary Account',

    @sequence_number = 1;

 

-- Grant access to the profile to all msdb database users

EXECUTE msdb.dbo.sysmail_add_principalprofile_sp

    @profile_name = 'Default Public Profile',

    @principal_name = 'public',

    @is_default = 1;

GO

 

--send a test email

EXECUTE msdb.dbo.sp_send_dbmail

    @subject = 'Test Database Mail Message',

    @recipients = 'testaddress@mydomain.com',

    @query = 'SELECT @@SERVERNAME';

GO

Not Before Service Pack 1

In case you haven't yet heard, Microsoft SQL Server 2008 service pack 1 was released on April 7.  This milestone is especially significant for those of you who could not previously deploy the latest SQL Server release because your organization has a "not before the first service pack" policy.  I want to go on record as one who believes that such a policy is flawed and has needlessly delayed many organizations from using the new SQL Server 2008 features.

There is nothing magical about the first service pack compared to the initial RTM release with regards to production readiness.  SQL Server releases nowadays are scheduled based quality rather than just hitting a date.  Buggy features will be dropped from a release rather than included and in need of a service pack.  I'm not saying that every SQL Server release is flawless but the number of serious bugs (e.g. corruption or wrong results) are few, thanks to internal testing by Microsoft as well as those in the community that kick the tires with the pre-release CTP bits.

It's understandable that those who are risk-adverse might wait until after the first service pack with the belief that other adopters may have smoothed out the bumps in the road a bit.  I can see how postponing installation in this way might mitigate some of the risk but SP1 is a completely arbitrary milestone that is a hold-over from before SQL 7 was released over a decade ago.  I think a better approach is to adopt new releases based on quality as determined in one's own environment.  Whether the target is a new SQL Server installation or an upgrade of an existing instance, one still needs to perform testing before installing any new version, service pack or patch in production.  It is those test results that should determine production readiness, not the results of SELECT SERVERPROPERTY('ProductLevel').