Dan Guzman Blog

Collation Hell (Part 3)

In this final post of my Collation Hell series, I'll discuss techniques to change a SQL Server instance collation along with the collation of all databases and columns.  The objective is to ensure the standard collation is used throughout the entire SQL Server instance.  See part 1 and part 2 of this series for more information on selecting a standard collation and planning such a collation change.

Be aware that a complete collation change is not unlike that of a major version upgrade, except tools to facilitate the change are limitted.  You'll need to build new system databases, change user databases and change every character column to conform to the new collation.  These collation changes can be done using either a side-by-side migration technique or performed in-place.

Changing the Instance Collation

The SQL Server setup REBUILDDATABASE option (see Books Online) is used to create new system databases for an existing instance with the desired collation.  One advantage of using REBUILDDATABASE over a complete reinstall is that post-RTM service packs and patches don't need to be reapplied afterward.  However, all server level objects like logins, linked servers, jobs, etc. need to be recreated after the rebuild so you'll need to script those out beforehand.  User databases and columns will need to be changed separately, which I'll discuss in more detail later.

You can also perform a fresh SQL Server install on another instance for a side-by-side migration.  One of the advantages of this side-by-side migration technique is that fallback is fast and relatively easy.  The side-by-side migration method is attractive if you plan a server hardware and/or SQL version upgrade anyway.  However, like the REBUILDDATABASE, you will need to create server-level objects after the install. 

Changing User Database Collation

Before I get into the details of a database collation change, please vote on Connect feedback item Make it easy to change collation on a database.  Until such a feature us available, we will endure the pain of performing this task manually.

Assuming you have performed due diligence and remediation beforehand (see my collation change planning article), changing the database collation in-place is relatively easy.  A simple ALTER DATABASE will change the collation of all user database system objects as well as the database default collation:

ALTER DATABASE Foo

COLLATE Latin1_General_CI_AS;

But note that this database collation change does not actually change the collation of existing user table columns.  Columns that do not match the database collation must be changed individually to conform, which is why a mass collation change is such a PITA.  You might choose to rebuild the database using a side-by-side method so that both the database and column collations can be changed during the rebuild process.  I generally recommend such a side--by-side method unless you are constrained by storage space.

Changing Column Collation Using ATLER TABLE...ALTER COLUMN

The syntax for changing a column collation is simple; just execute ALTER TABLE...ALTER COLUMN using the same column definition except for new column collation:

ALTER TABLE dbo.Foo ALTER COLUMN

      Bar varchar(50) COLLATE Latin1_General_CI_AS NOT NULL;

The above DDL method appears simple at first glance but there are many caveats that make this method problematic, especially when it must be repeated for many tables, large databases and/or a code page change is involved.  ALTER TABLE...ALTER COLUMN may be acceptable for a isolated change but not necessarily for a mass one.  The major issues are:

·         Each column must be changed individually

You'll need a separate ALTER COLUMN statement for each character column in the database.  A T-SQL script that generates the needed DDL using the catalog views is a must.  See Louis Davidson's Change table collations en masse article for an example and be aware that text columns are problematic.

·         Column references must be dropped

The altered column cannot be referenced by a constraint, index, statistic, computed column or schemabound object.  This means that all of these references must be dropped before the column is altered and recreated afterward.

·         Data are updated with a code page change

ALTER TABLE...ALTER COLUMN is a always a fast metadata-only change with a Unicode column.  The operation is also a metadata-only change for a non-Unicode column, but only if the old and new collations have the same code page/character set. 

When the old and new collations have a different code page/character set, then every row must be updated when a non-Unicode column is changed.  The performance ramifications of such an update are huge, especially with large tables.  A full table scan is required for each ALTER statement and every row in the table will be updated.  Also, since SQL Server internally drops the old column and adds a new one, the internal row size increases considerably.  Be aware that space requirements for modified non-Unicode columns will more than double until the clustered index is (re)built.  To reclaim the space of a heap, you'll need to create and drop a clustered index.  Keep in mind that the ALTER operation is fully logged regardless of the database recovery model so you need to plan log space requirements accordingly.

Because of these considerations, I do not recommend using ALTER TABLE...ALTER COLUMN for a mass collation change, especially when non-Unicode columns are involved and the code page/character set of the collations are different.  Instead, migrate data to a new table with columns of the desired collation.

Changing Column Collation Using a New Table

If you cannot perform a side-by-side migration of the entire database using a side-by-side method due to storage constraints, an alternative to ALTER TABLE...ALTER COLUMN is to create a new table with the desired collation and then copy data from the original table.  I also recommend this method over ALTER TABLE...ALTER COLOMN when migrating to a different code page/character set for the reasons I previously mentioned.

1.       Change the database recovery model to SIMPLE to minimize log space requirements

2.       Drop all constraints, except clustered primary key and clustered unique constraints

3.       Drop all non-clustered indexes to free up disk space for the migration

4.       For each table:

o   Create a new table exactly like the original, except with a different name and new collation for all character columns

o   Create the clustered index and check constraints

o   Load data

·         Use INSERT...SELECT to load the new table.  Be sure to specify a TABLOCKX hint on the INSERT so that the operation is minimally logged.  If the table has an identity column. be sure to SET IDENTITY_INSERT...ON to retain the existing identity values.

o   Drop the old table after successful copy and rename new table to old name

5.       Create non-clustered indexes, constraints, triggers, object permissions, etc.

Summary

I cannot overstate the importance of choosing the right collation during the initial install since it is difficult to change after the fact.  Unfortunately, we often inherit instances and databases of varying collations and must evaluate the effort of the collation change against the benefits of a consistent collation.  If you are considering a collation change, be sure to test beforehand to avoid surprises during and after the migration and have a solid fallback plan.