|
|
Tuesday, July 06, 2010
Today, let's examine encoding with SQL Server and XML datatype.
DECLARE @Inf XML
SET @Inf = '<?xml version="1.0" encoding="utf-16"?>
<root>
<names>
<name>test</name>
</names>
<names>
<name>test1</name>
</names>
</root>
'
SELECT x.value('name[1]', 'VARCHAR(10)') AS Name
FROM @Inf.nodes('/root/names') AS t(x)
If you try to run the code above, you will get an error message like this
Msg 9402, Level 16, State 1, Line 3
XML parsing: line 1, character 39, unable to switch the encoding
Why is that? If you change the encoding to UTF-8, the code works.
The solution is to know that UTF-16 works like UNICODE, and how do we denote UNICODE strings in SQL Server? Yes, by prefixing the string with N.
So this code works with UTF-16 encoding and you can happily continue to work.
DECLARE @Inf XML
SET @Inf = N'<?xml version="1.0" encoding="utf-16"?>
<root>
<names>
<name>test</name>
</names>
<names>
<name>test1</name>
</names>
</root>
'
SELECT x.value('name[1]', 'VARCHAR(10)') AS Name
FROM @Inf.nodes('/root/names') AS t(x)
Sunday, July 04, 2010
This question has been asked over and over again, and instead of having to redirect to any of my previous answers, I will cover the solution here in my blog.
Consider this sample data
DECLARE @Sample TABLE
(
ID INT,
Data VARCHAR(100)
)
INSERT @Sample
VALUES (1, 'Peso & Performance SQL'),
(1, 'MVP'),
(2, 'Need help <? /> -- '),
(2, 'With XML string concatenation ?')
The "trick" is to use the TYPE directive (to deal with entitization), and then use ".value" function to get the correct value out.
So here is the final query.
SELECT i.ID,
STUFF(f.Content.value('.', 'VARCHAR(MAX)'), 1, 1, '')
FROM (
SELECT ID
FROM @Sample
GROUP BY ID
) AS i
CROSS APPLY (
SELECT ', ' + w.Data
FROM @Sample AS w
WHERE w.ID = i.ID
FOR XML PATH(''),
TYPE
) AS f(Content)
And to deal with characters having ascii values less than 32 (space), you can use this
;WITH cteSource(ID, Content)
AS (
SELECT i.ID,
f.Content.value('.', 'NVARCHAR(MAX)')
FROM (
SELECT ID
FROM @Sample
GROUP BY ID
) AS i
CROSS APPLY (
SELECT CAST(', ' + w.Data AS VARBINARY(MAX))
FROM @Sample AS w
WHERE w.ID = i.ID
FOR XML PATH(''),
TYPE
) AS f(Content)
)
SELECT ID,
STUFF(CAST(Content AS NVARCHAR(MAX)), 1, 2, '') AS Content
FROM (
SELECT ID,
CAST(N'' AS XML).value('xs:base64Binary(sql:column("Content"))', 'VARBINARY(MAX)') AS Content
FROM cteSource
) AS d
Friday, July 02, 2010
I got an email from Mr Celko and he correctly stated that my previous solution was not truly working with sets, so he posted me some solutions and proper sample data.
With this information at my hand, I started to investigate what really is needed to get this Relational Division to work properly with two sets; Dividend and Divisor.
Some of you know me well, and know I am not satisfied with just solving the problem. There have to be some tweaks, and I did that too with this solution. Not only is it only touching the Dividend table once and Divisor table once, you can also set if you want a division with no remainder (which means all records in Divisor should match and not a single record more), or allow a division with remainder (which means all the records should match and maybe more records).
Great? Just set 1 for "No remainder" and 0 for "Allow remainder".
Simple as that. So why does it work? Remember your old algebra? "Divide is the same thing as multiply with the inverse number..."
Now for the sample data (courtesy of Mr Celko)
CREATE TABLE dbo.Dividend
(
group_id INTEGER NOT NULL,
item_name VARCHAR(10) NOT NULL,
PRIMARY KEY (
group_id,
item_name
)
)
INSERT INTO dbo.Dividend
(
group_id,
item_name
)
VALUES (1, 'one'),
(1, 'two'),
(1, 'three'),
(1, 'four'),
(2, 'one'),
(2, 'two'),
(2, 'three'),
(3, 'one'),
(3, 'two')
CREATE TABLE dbo.Divisor
(
item_name VARCHAR(10) NOT NULL PRIMARY KEY
)
INSERT INTO dbo.Divisor
(
item_name
)
VALUES ('one'),
('two'),
('three')
Now for the 4 solutions posted by Mr Celko
-- Celko 1
SELECT D1.group_id
FROM Dividend AS D1
WHERE D1.item_name IN (SELECT item_name FROM Divisor)
GROUP BY D1.group_id
HAVING COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)
AND COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Dividend AS D2 WHERE D2.group_id = D1.group_id)
-- Celko 2
SELECT D1.group_id
FROM Dividend AS D1
WHERE D1.item_name IN (SELECT item_name FROM Divisor)
AND NOT EXISTS (
SELECT *
FROM Dividend AS D2
WHERE D2.group_id = D1.group_id
AND D2.item_name NOT IN (SELECT item_name FROM Divisor)
)
GROUP BY D1.group_id
HAVING COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)
-- Celko 3
SELECT D1.group_id
FROM (
SELECT group_id,
item_name,
COUNT(*) OVER (PARTITION BY group_id) AS cnt
FROM Dividend
) AS D1
WHERE D1.item_name IN (SELECT item_name FROM Divisor)
AND cnt = (SELECT COUNT(*) FROM Divisor)
GROUP BY D1.group_id
HAVING COUNT(D1.cnt) = (SELECT COUNT(*) FROM Divisor)
--Celko 4
;WITH Divisor2
AS (
SELECT group_id,
MIN(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS single,
SUM(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS full_basket
FROM Dividend
)
SELECT D.group_id
FROM Dividend AS D,
Divisor2
WHERE D.group_id = Divisor2.group_id
AND Divisor2.single = 1
AND Divisor2.full_basket = (SELECT COUNT(*) FROM Divisor)
GROUP BY D.group_id
You can copy and paste the code to a query window and run them. Investigate the execution plan and compare the 4 of them.
And now to my solution.
-- Peso 1
SELECT group_id
FROM (
SELECT t.group_id,
SUM(CASE WHEN t.item_name = n.item_name THEN 1 ELSE 0 END) AS cnt,
COUNT(*) AS Items
FROM dbo.Dividend AS t
CROSS JOIN dbo.Divisor AS n
GROUP BY t.group_id,
t.item_name
) AS d
GROUP BY group_id
HAVING SUM(cnt) = MIN(Items)
AND MIN(cnt) >= 1 -- 1 means no remainder, 0 means remainder
After some challenging with MVP Adam Machanic, here is another version
-- Peso v2
SELECT t.group_id
FROM (
SELECT group_id,
COUNT(*) AS cnt
FROM dbo.Dividend
GROUP BY group_id
) AS kc
INNER JOIN (
SELECT COUNT(*) AS cnt
FROM dbo.Divisor
) AS nc ON nc.cnt = kc.cnt
INNER JOIN dbo.Dividend AS t ON t.group_id = kc.group_id
INNER JOIN dbo.Divisor AS n ON n.item_name = t.item_name
GROUP BY t.group_id
HAVING COUNT(*) = MIN(nc.cnt)
Here is an algorithm (exact division) which is really fast, but not 100% accurate due to the implementation of CHECKSUM (see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832).
-- Peso 3
SELECT group_id
FROM (
SELECT group_id,
CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
FROM dbo.Dividend
GROUP BY group_id
) AS t
INNER JOIN (
SELECT CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
FROM dbo.Divisor
) AS n ON n.ca1 = t.ca1
AND n.ca2 = t.ca2
Now copy my solutions and compare them to the other 4.
-- Celko 1
Table 'Dividend'. Scan count 3, logical reads 6.
Table 'Divisor'. Scan count 2, logical reads 20.
-- Celko 2
Table 'Divisor'. Scan count 3, logical reads 32.
Table 'Dividend'. Scan count 4, logical reads 8.
-- Celko 3
Table 'Divisor'. Scan count 3, logical reads 10.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Dividend'. Scan count 1, logical reads 2.
-- Celko 4
Table 'Dividend'. Scan count 2, logical reads 9.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Divisor'. Scan count 1, logical reads 38.
-- Peso 1
Table 'Divisor'. Scan count 1, logical reads 19.
Table 'Dividend'. Scan count 1, logical reads 2.
-- Peso 2
Table 'Divisor'. Scan count 1, logical reads 8.
Table 'Dividend'. Scan count 2, logical reads 4.
-- Peso 3
Table 'Divisor'. Scan count 1, logical reads 2.
Table 'Dividend'. Scan count 1, logical reads 2.
So it seems my solution is cleaner and faster than the previous existing. But the best thing is yet hidden. My solution cares for multi-column division (just expand the CASE and GROUP BY clauses) whereas the previous 4 do not. Well, not easily anyway.
It will involve some replacing for IN with EXISTS, and some string concatenation for the DISTINCT clauses.
//Peso
Wednesday, June 30, 2010
I came across an interesting post on Microsft SQL Server forum this afternoon. It was a question about Relational algebra and the poster wanted to have an efficient query to solve hos problem.
The problem could be solved with relational division, but there is no such operator in SQL Server. Maybe there will be same day.
For a fully working solution, see http://weblogs.sqlteam.com/peterl/archive/2010/07/02/Proper-Relational-Division-With-Sets.aspx
But for now there is no such operator, so we as developers have to find our own ways.
First prepare and populate some sample data
-- Prepare sample data
DECLARE @Sample TABLE
(
ParentID INT NOT NULL,
Keyword VARCHAR(25) NOT NULL,
UNIQUE (ParentID, Keyword)
)
-- Populate sample data
INSERT @Sample
(
ParentID,
Keyword
)
VALUES (1, 'one'),
(1, 'two'),
(1, 'three'),
(1, 'four'),
(2, 'one'),
(2, 'two'),
(2, 'three'),
(3, 'one'),
(3, 'two')
People had already been active and posted some solutions, of which this common query was present.
SELECT s.ParentID
FROM @Sample AS s
WHERE s.Keyword IN ('one', 'two', 'three')
GROUP BY s.ParentID
HAVING COUNT(DISTINCT s.Keyword) = 3
AND COUNT(DISTINCT s.Keyword) = (SELECT COUNT(*) FROM @Sample AS x WHERE x.ParentID = s.ParentID)
and this type of query
SELECT s.ParentID
FROM @Sample AS s
WHERE s.Keyword IN ('one', 'two', 'three')
AND NOT EXISTS (
SELECT *
FROM @Sample AS x
WHERE x.ParentID = s.ParentID
AND x.Keyword NOT IN ('one', 'two', 'three')
)
GROUP BY s.ParentID
HAVING COUNT(DISTINCT s.Keyword) = 3
And even a XML query!
;WITH AggStr
AS (
SELECT ParentId,
(
SELECT CAST(',' AS VARCHAR(MAX)) + c.Keyword
FROM @Sample AS c
WHERE c.ParentID = p.ParentID
ORDER BY c.Keyword
FOR XML PATH('')
) AS c1
FROM (
SELECT DISTINCT
ParentID
FROM @Sample
) AS p
)
SELECT ParentID
FROM AggStr
WHERE c1 = ',one,three,two'
The good thing is that all three produce the same wanted result but the bad thing is the inefficient execution plans. Then one poster did his homework and read about Mr Celko and translated his algorithm to the current problem, and then the query looked like this
SELECT ParentID
FROM (
SELECT ParentID,
Keyword,
COUNT(*) OVER (PARTITION BY ParentID) AS cnt
FROM @Sample
) AS w
WHERE Keyword IN ('one', 'two', 'three')
AND cnt = 3
GROUP BY ParentID
HAVING COUNT(cnt) = 3
With these queries in mind, I thought about the problem and realized the problem did in fact have a much simpler solution.
The query I came up with is the simplest of them all, and just does one pass of the source table. Yes, only one pass just as the first Celko query for relational division, but without the internal worktable.
This is the query I came up with
-- Peso
SELECT ParentID
FROM @Sample
GROUP BY ParentID
HAVING MIN(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 1
AND SUM(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 3
How does the query work? The second aggregation filtering just makes sure all three keywords are present.
But the first aggregation filter? What does it do? To simplify, I just write that it takes care of the modulo part of the relational division. There cannot be a "fractional" part of the relational division, because it means that particular ParentID has more keywords than wanted.
Simple as that.
//Peso
PS. These are the textual execution plans for the four types of queries and then mine.
|--Filter(WHERE:([Expr1003]=CASE WHEN [Expr1007] IS NULL THEN (0) ELSE [Expr1007] END))
|--Nested Loops(Left Outer Join, OUTER REFERENCES:([s].[ParentID]))
|--Filter(WHERE:([Expr1003]=(3)))
| |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0)))
| |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1014]=Count(*)))
| |--Index Scan(OBJECT:(@Sample AS [s]), WHERE:(@Sample.[Keyword] as [s].[Keyword]='one' OR @Sample.[Keyword] as [s].[Keyword]='three' OR @Sample.[Keyword] as [s].[Keyword]='two') ORDERED FORWARD)
|--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1015],0)))
|--Stream Aggregate(DEFINE:([Expr1015]=Count(*)))
|--Index Seek(OBJECT:(@Sample AS [x]), SEEK:([x].[ParentID]=@Sample.[ParentID] as [s].[ParentID]) ORDERED FORWARD)
|--Filter(WHERE:([Expr1007]=(3)))
|--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1010],0)))
|--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1010]=Count(*)))
|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([s].[ParentID]))
|--Index Scan(OBJECT:(@Sample AS [s]), WHERE:(@Sample.[Keyword] as [s].[Keyword]='one' OR @Sample.[Keyword] as [s].[Keyword]='three' OR @Sample.[Keyword] as [s].[Keyword]='two') ORDERED FORWARD)
|--Index Seek(OBJECT:(@Sample AS [x]), SEEK:([x].[ParentID]=@Sample.[ParentID] as [s].[ParentID]), WHERE:(@Sample.[Keyword] as [x].[Keyword]<>'one' AND @Sample.[Keyword] as [x].[Keyword]<>'three' AND @Sample.[Keyword] as [x].[Keyword]<>'two') ORDERED FORWARD)
|--Filter(WHERE:([Expr1008]=N',one,three,two'))
|--Nested Loops(Inner Join, OUTER REFERENCES:([ParentID]))
|--Stream Aggregate(GROUP BY:([ParentID]))
| |--Index Scan(OBJECT:(@sample), ORDERED FORWARD)
|--UDX(([Expr1007], [C].[Keyword]))
|--Compute Scalar(DEFINE:([Expr1007]=CONVERT(varchar(max),',',0)+@sample.[Keyword] as [C].[Keyword]))
|--Index Seek(OBJECT:(@sample AS [C]), SEEK:([C].[ParentID]=[ParentID]) ORDERED FORWARD)
|--Filter(WHERE:([Expr1005]=(3)))
|--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1010],0)))
|--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1010]=Count(*)))
|--Filter(WHERE:(([Keyword]='one' OR [Keyword]='three' OR [Keyword]='two') AND [Expr1004]=(3)))
|--Nested Loops(Inner Join)
|--Table Spool
| |--Segment
| |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD)
|--Nested Loops(Inner Join, WHERE:((1)))
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1009],0)))
| |--Stream Aggregate(DEFINE:([Expr1009]=Count(*)))
| |--Table Spool
|--Table Spool
|--Filter(WHERE:([Expr1004]=(1) AND [Expr1005]=(3)))
|--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1004]=MIN([Expr1006]), [Expr1005]=SUM([Expr1006])))
|--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Keyword]='three' OR [Keyword]='two' OR [Keyword]='one' THEN (1) ELSE (0) END))
|--Index Scan(OBJECT:(@Sample), ORDERED FORWARD)
This time I will show you an algorithm to do the dreaded bin-packaging using recursion and XML.
First, create some sample data like this
-- Prepare sample data
DECLARE @Sample TABLE
(
RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Expense SMALLMONEY NOT NULL
)
-- Populate sample data
INSERT @Sample
(
Expense
)
VALUES (12.51),
(45.63),
(66.35),
(92.66),
(65.46),
(54.01),
(32.23),
(27.16),
(78.92),
(14.58)
Next, we need to create a variable to hold the user's wanted total sum.
-- Prepare user supplied parameter
DECLARE @WantedSUM SMALLMONEY = 111.09
And we also need to create a temporary staging table to hold the valid combinations
-- Prepare temporary staging table
DECLARE @Temp TABLE
(
CombID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
TrackPath XML NOT NULL
)
Now we only have to do the calculations!
Here I am using a special trick to get the unique combination, since the path of records 1>2>3 is the same as 1>3>2, 2>1>3, 2>3>1, 3>1>2 and 3>2>1. See explanation between combination and permutation at Wikipedia here.
To keep track of which records I already have used in the total sum, I simply remove the record id (RowID) from the Hits list.
And, to give the correct answer at the end, I build a XML string with visited RowID's building up the correct sum.
-- Calculate all possible permutations using recursion
;WITH ctePack(Total, Hits, TrackPath)
AS (
SELECT s.Expense AS Total,
(SELECT '#' + CAST(x.RowID AS VARCHAR(MAX)) FROM @Sample AS x WHERE x.RowID <> s.RowID ORDER BY x.RowID FOR XML PATH('')) + '#' AS Hits,
'<ID>' + CAST(s.RowID AS VARCHAR(MAX)) + '</ID>' AS TrackPath
FROM @Sample AS s
WHERE s.Expense <= @WantedSum
UNION ALL
SELECT p.Total + s.Expense,
REPLACE(p.Hits, '#' + CAST(s.RowID AS VARCHAR(MAX)) + '#', '#') AS Hits,
p.TrackPath + '<ID>' + CAST(s.RowID AS VARCHAR(MAX)) + '</ID>' AS TrackPath
FROM @Sample AS s
INNER JOIN ctePack AS p ON p.Hits LIKE '%#' + CAST(s.RowID AS VARCHAR(MAX)) + '#%'
WHERE p.Total + s.Expense <= @WantedSum
)
INSERT @Temp
(
TrackPath
)
SELECT MIN(TrackPath)
FROM ctePack
WHERE Total = @WantedSum
GROUP BY Hits
When the iterations are over, and we have the wanted combniation(s), the task left is to report the records giving us the correct sum.
We also need the records grouped so that we can see which group each expense belong to. In same cases, one and the same record may used in multiple groups.
-- Display the final resultset
SELECT t.CombID,
s.RowID,
s.Expense
FROM @Temp AS t
CROSS APPLY t.TrackPath.nodes('/ID') AS f(n)
INNER JOIN @Sample AS s ON s.RowID = f.n.value('.', 'INT')
ORDER BY t.CombID,
s.RowID
If you don't want to use XML, you can write the recursive cte like this, to get all included records directly.
-- Calculate all possible permutations using recursion
;WITH ctePack(RowID, Expense, Total, Tracker)
AS (
SELECT s.RowID,
s.Expense,
s.Expense AS Total,
(
SELECT '#' + CAST(x.RowID AS VARCHAR(MAX))
FROM @Sample AS x
WHERE x.RowID <> s.RowID
ORDER BY x.RowID
FOR XML PATH('')
) + '#' AS Tracker
FROM @Sample AS s
WHERE s.Expense <= @WantedSum
UNION ALL
SELECT s.RowID,
s.Expense,
p.Total + s.Expense,
REPLACE(p.Tracker, '#' + CAST(s.RowID AS VARCHAR(MAX)) + '#', '#') AS Tracker
FROM @Sample AS s
INNER JOIN ctePack AS p ON p.Tracker LIKE '%#' + CAST(s.RowID AS VARCHAR(MAX)) + '#%'
WHERE p.Total + s.Expense <= @WantedSum
)
SELECT DISTINCT
DENSE_RANK() OVER (ORDER BY Tracker) AS CombID,
RowID,
Expense
FROM ctePack
WHERE Total = @WantedSum
Friday, June 25, 2010
When I try to rename a node name such as a table or column in Management Studio, I cannot use DELETE key to remove previous characters. However, I can use BACKSPACE key.
Please vote here https://connect.microsoft.com/SQLServer/feedback/details/570758/cannot-use-delete-key-in-ssms-and-object-explorer to fix this little, but annoying, issue.
//Peso
Sunday, June 20, 2010
As some of you know, I was awarded Microsoft Most Valuable Professional (MVP) for SQL Server in July last year. Now it's time to see if I get my MVP renewed, or if I lose the award.
I honestly believe it's a good thing MVP status only lasts for one year at a time. Knowledge is fresh. Things that worked in the past may not work any longer due to evolvement, and there are smarter ways to do things now, than before.
Being an MVP is a responsibility. It does mean you have more than average knowledge of SQL Server and how to work out the best from the product. It doesn't mean you know everything about the product. Noone does.
Being an MVP means you have benefit of the community of your peers. I have learned more hardcore things only the last year, than I knew from the previous five years. And sometimes I get lucky to contribute to the MVP community myself.
That's how things work. Me being around knowledgeble people makes me learn, or understand, same things they do. The same thing apply to beginners on the forums. They most often look up to MVP's and learn one or two things from them.
The last year I haven't been around the forum as much as I want to. The reason is that I started (together with SQL server MVP Thomas Ivarsson) a PASS chapter in Sweden. It has taken a lot of time in the beginning and still does take some time to plan and invite interesting guest speakers for our meetings.
We are currently working to get a full day event in place in Sweden on the 10th of September. That's 3 months away but it feels there is not enough time! I guess every planner feels that way.
However, this is in my "line of duty", to continue to help and educate the SQL Server community,
Hopefully I get my MVP status for another year. It means my peers have faith in me and my ability to support the community.
I would like to thank a few people for the collaboration last year, which problably wouldn't have happened without my MVP status:
1) Itzik Ben-Gan for incredible insight and the email correspondence we shared the last year.
2) Joe Celko, for some fun competitions and coding help. I do love to see my [weighted] moving average in your upcoming SQL for smarties book.
3) Phil Factor, who have thought me a lot of driving and encouraging people to participate. And of course, some Speed Phreakery!
4) Tony Davis, who gave my the opportunity to be a technical reviewer.
There are a lot of other people whom I have shared ideas with. You know who you are.
I know a few people who probably will receive the MVP award this quarter. I wish you good luck and god speed!
//Peter
Thursday, June 17, 2010
Hi!
Today I am going to talk about compression in SQL Server 2008. The data warehouse I currently design and develop holds historical data back to 1973. The data warehouse will have an other blog post laster due to it's complexity.
However, the server has 60GB of memory (of which 48 is dedicated to SQL Server service), so all data didn't fit in memory and the SAN is not the fastest one around.
So I decided to give compression a go, since we use Enterprise Edition anyway.
This is the code I use to compress all tables with PAGE compression.
DECLARE @SQL VARCHAR(MAX)
DECLARE curTables CURSOR FOR
SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id))
+ '.' + QUOTENAME(OBJECT_NAME(object_id))
+ ' REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)'
FROM sys.tables
OPEN curTables
FETCH NEXT
FROM curTables
INTO @SQL
WHILE @@FETCH_STATUS = 0
BEGIN
IF @SQL IS NOT NULL
RAISERROR(@SQL, 10, 1) WITH NOWAIT
FETCH NEXT
FROM curTables
INTO @SQL
END
CLOSE curTables
DEALLOCATE curTables
Copy and paste the result to a new code window and execute the statements. One thing I noticed when doing this, is that the database grows with the same size as the table. If the database cannot grow this size, the operation fails.
For me, I first ended up with orphaned connection. Not good.
And this is the code I use to create the index compression statements
DECLARE @SQL VARCHAR(MAX)
DECLARE curIndexes CURSOR FOR
SELECT 'ALTER INDEX ' + QUOTENAME(name)
+ ' ON '
+ QUOTENAME(OBJECT_SCHEMA_NAME(object_id))
+ '.'
+ QUOTENAME(OBJECT_NAME(object_id))
+ ' REBUILD PARTITION = ALL WITH (FILLFACTOR = 100, DATA_COMPRESSION = PAGE)'
FROM sys.indexes
WHERE OBJECTPROPERTY(object_id, 'IsMSShipped') = 0
AND OBJECTPROPERTY(object_id, 'IsTable') = 1
ORDER BY CASE type_desc
WHEN 'CLUSTERED' THEN 1
ELSE 2
END
OPEN curIndexes
FETCH NEXT
FROM curIndexes
INTO @SQL
WHILE @@FETCH_STATUS = 0
BEGIN
IF @SQL IS NOT NULL
RAISERROR(@SQL, 10, 1) WITH NOWAIT
FETCH NEXT
FROM curIndexes
INTO @SQL
END
CLOSE curIndexes
DEALLOCATE curIndexes
When this was done, I noticed that the 90GB database now only was 17GB. And most important, complete database now could reside in memory!
After this I took care of the administrative tasks, backups. Here I copied the code from Management Studio because I didn't want to give too much time for this. The code looks like (notice the compression option).
BACKUP DATABASE [Yoda]
TO DISK = N'D:\Fileshare\Backup\Yoda.bak'
WITH NOFORMAT,
INIT,
NAME = N'Yoda - Full Database Backup',
SKIP,
NOREWIND,
NOUNLOAD,
COMPRESSION,
STATS = 10,
CHECKSUM
GO
DECLARE @BackupSetID INT
SELECT @BackupSetID = Position
FROM msdb..backupset
WHERE database_name = N'Yoda'
AND backup_set_id =(SELECT MAX(backup_set_id) FROM msdb..backupset WHERE database_name = N'Yoda')
IF @BackupSetID IS NULL
RAISERROR(N'Verify failed. Backup information for database ''Yoda'' not found.', 16, 1)
RESTORE VERIFYONLY
FROM DISK = N'D:\Fileshare\Backup\Yoda.bak'
WITH FILE = @BackupSetID,
NOUNLOAD,
NOREWIND
GO
After running the backup, the file size was even more reduced due to the zip-like compression algorithm used in SQL Server 2008. The file size? Only 9 GB.
//Peso
Monday, June 14, 2010
SELECT u.name,
l.dbname
FROM sys.sysusers AS u
INNER JOIN sys.syslogins AS l ON l.sid = u.sid
Tuesday, June 08, 2010
-- Setup user supplied parameters
DECLARE @WantedTable SYSNAME
SET @WantedTable = 'Sales.factSalesDetail'
-- Wanted table is "parent table"
SELECT PARSENAME(@WantedTable, 2) AS ParentSchemaName,
PARSENAME(@WantedTable, 1) AS ParentTableName,
cp.Name AS ParentColumnName,
OBJECT_SCHEMA_NAME(parent_object_id) AS ChildSchemaName,
OBJECT_NAME(parent_object_id) AS ChildTableName,
cc.Name AS ChildColumnName
FROM sys.foreign_key_columns AS fkc
INNER JOIN sys.columns AS cc ON cc.column_id = fkc.parent_column_id
AND cc.object_id = fkc.parent_object_id
INNER JOIN sys.columns AS cp ON cp.column_id = fkc.referenced_column_id
AND cp.object_id = fkc.referenced_object_id
WHERE referenced_object_id = OBJECT_ID(@WantedTable)
-- Wanted table is "child table"
SELECT OBJECT_SCHEMA_NAME(referenced_object_id) AS ParentSchemaName,
OBJECT_NAME(referenced_object_id) AS ParentTableName,
cc.Name AS ParentColumnName,
PARSENAME(@WantedTable, 2) AS ChildSchemaName,
PARSENAME(@WantedTable, 1) AS ChildTableName,
cp.Name AS ChildColumnName
FROM sys.foreign_key_columns AS fkc
INNER JOIN sys.columns AS cp ON cp.column_id = fkc.parent_column_id
AND cp.object_id = fkc.parent_object_id
INNER JOIN sys.columns AS cc ON cc.column_id = fkc.referenced_column_id
AND cc.object_id = fkc.referenced_object_id
WHERE parent_object_id = OBJECT_ID(@WantedTable)
|