Thinking outside the box

Patron Saint of Lost Yaks
posts - 175, comments - 535, trackbacks - 1

My Links

Advertisement

News

Archives

Post Categories

Tuesday, July 06, 2010

More on XML and encoding

Today, let's examine encoding with SQL Server and XML datatype.


DECLARE @Inf XML

SET     @Inf =  '<?xml version="1.0" encoding="utf-16"?>
                 <root>
                    <names>
                         <name>test</name>
                     </names>
                     <names>
                         <name>test1</name>
                     </names>
                 </root>
                '

SELECT  x.value('name[1]', 'VARCHAR(10)') AS Name
FROM    @Inf.nodes('/root/names') AS t(x)


If you try to run the code above, you will get an error message like this


Msg 9402, Level 16, State 1, Line 3
XML parsing: line 1, character 39, unable to switch the encoding


Why is that? If you change the encoding to UTF-8, the code works.
The solution is to know that UTF-16 works like UNICODE, and how do we denote UNICODE strings in SQL Server? Yes, by prefixing the string with N.

So this code works with UTF-16 encoding and you can happily continue to work.

DECLARE @Inf XML

SET     @Inf = N'<?xml version="1.0" encoding="utf-16"?>
                 <root>
                    <names>
                         <name>test</name>
                     </names>
                     <names>
                         <name>test1</name>
                     </names>
                 </root>
                '

SELECT  x.value('name[1]', 'VARCHAR(10)') AS Name
FROM    @Inf.nodes('/root/names') AS t(x)

posted @ Tuesday, July 06, 2010 9:40 AM | Feedback (0) | Filed Under [ SQL Server 2008 SQL Server 2005 ]

Sunday, July 04, 2010

String concatenation and entitization

This question has been asked over and over again, and instead of having to redirect to any of my previous answers, I will cover the solution here in my blog.

Consider this sample data


DECLARE @Sample TABLE
        (
            ID INT,
            Data VARCHAR(100)
        )
 
INSERT  @Sample
VALUES  (1, 'Peso & Performance SQL'),
        (1, 'MVP'),
        (2, 'Need help <? /> -- '),
        (2, 'With XML string concatenation ?')


The "trick" is to use the TYPE directive (to deal with entitization), and then use ".value" function to get the correct value out.
So here is the final query.


SELECT      i.ID,
            STUFF(f.Content.value('.', 'VARCHAR(MAX)'), 1, 1, '')
FROM        (
                SELECT      ID
                FROM        @Sample
                GROUP BY    ID
            ) AS i
CROSS APPLY (
                SELECT  ', ' + w.Data
                FROM    @Sample AS w
                WHERE   w.ID = i.ID
                FOR XML PATH(''),
                        TYPE
            ) AS f(Content)
       
 And to deal with characters having ascii values less than 32 (space), you can use this


;WITH cteSource(ID, Content)
AS (
        SELECT      i.ID,
                    f.Content.value('.', 'NVARCHAR(MAX)')
        FROM        (
                        SELECT      ID
                        FROM        @Sample
                        GROUP BY    ID
                    ) AS i
    CROSS APPLY     (
                        SELECT  CAST(', ' + w.Data AS VARBINARY(MAX))
                        FROM    @Sample AS w
                        WHERE   w.ID = i.ID
                        FOR XML PATH(''),
                                TYPE
                    ) AS f(Content)
)
SELECT  ID,
        STUFF(CAST(Content AS NVARCHAR(MAX)), 1, 2, '') AS Content
FROM    (
            SELECT  ID,
                    CAST(N'' AS XML).value('xs:base64Binary(sql:column("Content"))', 'VARBINARY(MAX)') AS Content   
            FROM    cteSource
        ) AS d
 
 

posted @ Sunday, July 04, 2010 11:59 PM | Feedback (0) | Filed Under [ Optimization SQL Server 2008 Algorithms SQL Server 2005 ]

Friday, July 02, 2010

Proper Relational Division With Sets

I got an email from Mr Celko and he correctly stated that my previous solution was not truly working with sets, so he posted me some solutions and proper sample data.
With this information at my hand, I started to investigate what really is needed to get this Relational Division to work properly with two sets; Dividend and Divisor.

Some of you know me well, and know I am not satisfied with just solving the problem. There have to be some tweaks, and I did that too with this solution. Not only is it only touching the Dividend table once and Divisor table once, you can also set if you want a division with no remainder (which means all records in Divisor should match and not a single record more), or allow a division with remainder (which means all the records should match and maybe more records).
Great? Just set 1 for "No remainder" and 0 for "Allow remainder".

Simple as that. So why does it work? Remember your old algebra? "Divide is the same thing as multiply with the inverse number..."


Now for the sample data (courtesy of Mr Celko)


CREATE TABLE    dbo.Dividend
                (
                    group_id INTEGER NOT NULL,
                    item_name VARCHAR(10) NOT NULL,
                    PRIMARY KEY (
                                    group_id,
                                    item_name
                                )
                )

INSERT INTO     dbo.Dividend
                (
                    group_id,
                    item_name
                )
VALUES          (1, 'one'),
                (1, 'two'),
                (1, 'three'),
                (1, 'four'),
                (2, 'one'),
                (2, 'two'),
                (2, 'three'),
                (3, 'one'),
                (3, 'two')

CREATE TABLE    dbo.Divisor
                (
                    item_name VARCHAR(10) NOT NULL PRIMARY KEY
                )

INSERT INTO     dbo.Divisor
                (
                    item_name
                )
VALUES          ('one'),
                ('two'),
                ('three')


Now for the 4 solutions posted by Mr Celko


-- Celko 1
SELECT      D1.group_id
FROM        Dividend AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
GROUP BY    D1.group_id
HAVING      COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)
            AND COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Dividend AS D2 WHERE D2.group_id = D1.group_id)

-- Celko 2
SELECT      D1.group_id
FROM        Dividend AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
            AND NOT EXISTS  (
                                SELECT  *
                                FROM    Dividend AS D2
                                WHERE   D2.group_id = D1.group_id
                                        AND D2.item_name NOT IN (SELECT item_name FROM Divisor)
                            )
GROUP BY    D1.group_id
HAVING      COUNT(DISTINCT D1.item_name) = (SELECT COUNT(*) FROM Divisor)

-- Celko 3
SELECT      D1.group_id
FROM        (
                SELECT  group_id,
                        item_name,
                        COUNT(*) OVER (PARTITION BY group_id) AS cnt
                FROM    Dividend
            ) AS D1
WHERE       D1.item_name IN (SELECT item_name FROM Divisor)
            AND cnt = (SELECT COUNT(*) FROM Divisor)
GROUP BY    D1.group_id
HAVING      COUNT(D1.cnt) = (SELECT COUNT(*) FROM Divisor)

--Celko 4
;WITH Divisor2
AS (
        SELECT  group_id,
                MIN(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS single,
                SUM(CASE WHEN item_name IN (SELECT item_name FROM Divisor) THEN 1 ELSE 0 END) OVER(PARTITION BY group_id) AS full_basket
        FROM    Dividend
)
SELECT      D.group_id
FROM        Dividend AS D,
            Divisor2
WHERE       D.group_id = Divisor2.group_id
            AND Divisor2.single = 1
            AND Divisor2.full_basket = (SELECT COUNT(*) FROM Divisor)
GROUP BY    D.group_id


You can copy and paste the code to a query window and run them. Investigate the execution plan and compare the 4 of them.
And now to my solution.


-- Peso 1
SELECT      group_id
FROM        (
                SELECT      t.group_id,
                            SUM(CASE WHEN t.item_name = n.item_name THEN 1 ELSE 0 END) AS cnt,
                            COUNT(*) AS Items
                FROM        dbo.Dividend AS t
                CROSS JOIN  dbo.Divisor AS n
                GROUP BY    t.group_id,
                            t.item_name
            ) AS d
GROUP BY    group_id
HAVING      SUM(cnt) = MIN(Items)
            AND MIN(cnt) >= 1    -- 1 means no remainder, 0 means remainder


After some challenging with MVP Adam Machanic, here is another version


-- Peso v2
SELECT      t.group_id
FROM        (
                SELECT      group_id,
                            COUNT(*) AS cnt
                FROM        dbo.Dividend
                GROUP BY    group_id
            ) AS kc
INNER JOIN  (
                SELECT  COUNT(*) AS cnt
                FROM    dbo.Divisor
            ) AS nc ON nc.cnt = kc.cnt
INNER JOIN  dbo.Dividend AS t ON t.group_id = kc.group_id
INNER JOIN  dbo.Divisor AS n ON n.item_name = t.item_name
GROUP BY    t.group_id
HAVING      COUNT(*) = MIN(nc.cnt)


Here is an algorithm (exact division) which is really fast, but not 100% accurate due to the implementation of CHECKSUM (see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832).


-- Peso 3
SELECT
      group_id
FROM        (
                SELECT      group_id,
                            CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
                            CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
                FROM        dbo.Dividend
                GROUP BY    group_id
            ) AS t
INNER JOIN  (
                SELECT  CHECKSUM_AGG(CHECKSUM(item_name)) AS ca1,
                        CHECKSUM_AGG(CHECKSUM(REVERSE(item_name))) AS ca2
                FROM    dbo.Divisor
            ) AS n ON n.ca1 = t.ca1
                AND n.ca2 = t.ca2


Now copy my solutions and compare them to the other 4.


-- Celko 1
Table 'Dividend'. Scan count 3, logical reads 6.
Table 'Divisor'. Scan count 2, logical reads 20.

-- Celko 2
Table 'Divisor'. Scan count 3, logical reads 32.
Table 'Dividend'. Scan count 4, logical reads 8.

-- Celko 3
Table 'Divisor'. Scan count 3, logical reads 10.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Dividend'. Scan count 1, logical reads 2.

-- Celko 4
Table 'Dividend'. Scan count 2, logical reads 9.
Table 'Worktable'. Scan count 3, logical reads 31.
Table 'Divisor'. Scan count 1, logical reads 38.

-- Peso 1
Table 'Divisor'. Scan count 1, logical reads 19.
Table 'Dividend'. Scan count 1, logical reads 2.

-- Peso 2
Table 'Divisor'. Scan count 1, logical reads 8.
Table 'Dividend'. Scan count 2, logical reads 4.

-- Peso 3

Table 'Divisor'. Scan count 1, logical reads 2.
Table 'Dividend'. Scan count 1, logical reads 2.

So it seems my solution is cleaner and faster than the previous existing. But the best thing is yet hidden. My solution cares for multi-column division (just expand the CASE and GROUP BY clauses) whereas the previous 4 do not. Well, not easily anyway.
It will involve some replacing for IN with EXISTS, and some string concatenation for the DISTINCT clauses.


//Peso

posted @ Friday, July 02, 2010 12:48 AM | Feedback (36) | Filed Under [ Optimization SQL Server 2008 Algorithms SQL Server 2005 SQL Server 2000 ]

Wednesday, June 30, 2010

Relational division

I came across an interesting post on Microsft SQL Server forum this afternoon. It was a question about Relational algebra and the poster wanted to have an efficient query to solve hos problem.
The problem could be solved with relational division, but there is no such operator in SQL Server. Maybe there will be same day.
For a fully working solution, see http://weblogs.sqlteam.com/peterl/archive/2010/07/02/Proper-Relational-Division-With-Sets.aspx

But for now there is no such operator, so we as developers have to find our own ways.
First prepare and populate some sample data


-- Prepare sample data
DECLARE @Sample TABLE
        (
            ParentID INT NOT NULL,
            Keyword VARCHAR(25) NOT NULL,
            UNIQUE (ParentID, Keyword)
        )

-- Populate sample data
INSERT  @Sample
        (
            ParentID,
            Keyword
        )
VALUES  (1, 'one'),
        (1, 'two'),
        (1, 'three'),
        (1, 'four'),
        (2, 'one'),
        (2, 'two'),
        (2, 'three'),
        (3, 'one'),
        (3, 'two')


People had already been active and posted some solutions, of which this common query was present.


SELECT      s.ParentID
FROM        @Sample AS s
WHERE       s.Keyword IN ('one', 'two', 'three')
GROUP BY    s.ParentID
HAVING      COUNT(DISTINCT s.Keyword) = 3
            AND COUNT(DISTINCT s.Keyword) = (SELECT COUNT(*) FROM @Sample AS x WHERE x.ParentID = s.ParentID)


and this type of query


SELECT      s.ParentID
FROM        @Sample AS s
WHERE       s.Keyword IN ('one', 'two', 'three') 
            AND NOT EXISTS (
                            SELECT  *
                            FROM    @Sample AS x
                            WHERE   x.ParentID = s.ParentID
                                    AND x.Keyword NOT IN ('one', 'two', 'three')
                           )
GROUP BY    s.ParentID
HAVING      COUNT(DISTINCT s.Keyword) = 3


And even a XML query!


;WITH AggStr
AS (
        SELECT  ParentId,
                (
                    SELECT      CAST(',' AS VARCHAR(MAX)) + c.Keyword
                    FROM        @Sample AS c
                    WHERE       c.ParentID = p.ParentID
                    ORDER BY    c.Keyword
                    FOR XML     PATH('')
                ) AS c1
        FROM    (
                    SELECT  DISTINCT
                            ParentID
                    FROM    @Sample
                ) AS p
)
SELECT  ParentID
FROM    AggStr
WHERE   c1 = ',one,three,two'


The good thing is that all three produce the same wanted result but the bad thing is the inefficient execution plans. Then one poster did his homework and read about Mr Celko and translated his algorithm to the current problem, and then the query looked like this


SELECT      ParentID
FROM        (
                SELECT  ParentID,
                        Keyword,
                        COUNT(*) OVER (PARTITION BY ParentID) AS cnt
                FROM    @Sample
            ) AS w
WHERE       Keyword IN ('one', 'two', 'three')
            AND cnt = 3
GROUP BY    ParentID
HAVING      COUNT(cnt) = 3


With these queries in mind, I thought about the problem and realized the problem did in fact have a much simpler solution.
The query I came up with is the simplest of them all, and just does one pass of the source table. Yes, only one pass just as the first Celko query for relational division, but without the internal worktable.
This is the query I came up with


-- Peso
SELECT      ParentID
FROM        @Sample
GROUP BY    ParentID
HAVING      MIN(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 1
            AND SUM(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 3


How does the query work? The second aggregation filtering just makes sure all three keywords are present.
But the first aggregation filter? What does it do? To simplify, I just write that it takes care of the modulo part of the relational division. There cannot be a "fractional" part of the relational division, because it means that particular ParentID has more keywords than wanted.

Simple as that.

//Peso


PS. These are the textual execution plans for the four types of queries and then mine.


  |--Filter(WHERE:([Expr1003]=CASE WHEN [Expr1007] IS NULL THEN (0) ELSE [Expr1007] END))
       |--Nested Loops(Left Outer Join, OUTER REFERENCES:([s].[ParentID]))
            |--Filter(WHERE:([Expr1003]=(3)))
            |    |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0)))
            |         |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1014]=Count(*)))
            |              |--Index Scan(OBJECT:(@Sample AS [s]),  WHERE:(@Sample.[Keyword] as [s].[Keyword]='one' OR @Sample.[Keyword] as [s].[Keyword]='three' OR @Sample.[Keyword] as [s].[Keyword]='two') ORDERED FORWARD)
            |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1015],0)))
                 |--Stream Aggregate(DEFINE:([Expr1015]=Count(*)))
                      |--Index Seek(OBJECT:(@Sample AS [x]), SEEK:([x].[ParentID]=@Sample.[ParentID] as [s].[ParentID]) ORDERED FORWARD)


  |--Filter(WHERE:([Expr1007]=(3)))
       |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1010],0)))
            |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1010]=Count(*)))
                 |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([s].[ParentID]))
                      |--Index Scan(OBJECT:(@Sample AS [s]),  WHERE:(@Sample.[Keyword] as [s].[Keyword]='one' OR @Sample.[Keyword] as [s].[Keyword]='three' OR @Sample.[Keyword] as [s].[Keyword]='two') ORDERED FORWARD)
                      |--Index Seek(OBJECT:(@Sample AS [x]), SEEK:([x].[ParentID]=@Sample.[ParentID] as [s].[ParentID]),  WHERE:(@Sample.[Keyword] as [x].[Keyword]<>'one' AND @Sample.[Keyword] as [x].[Keyword]<>'three' AND @Sample.[Keyword] as [x].[Keyword]<>'two') ORDERED FORWARD)


 |--Filter(WHERE:([Expr1008]=N',one,three,two'))
       |--Nested Loops(Inner Join, OUTER REFERENCES:([ParentID]))
            |--Stream Aggregate(GROUP BY:([ParentID]))
            |    |--Index Scan(OBJECT:(@sample), ORDERED FORWARD)
            |--UDX(([Expr1007], [C].[Keyword]))
                 |--Compute Scalar(DEFINE:([Expr1007]=CONVERT(varchar(max),',',0)+@sample.[Keyword] as [C].[Keyword]))
                      |--Index Seek(OBJECT:(@sample AS [C]), SEEK:([C].[ParentID]=[ParentID]) ORDERED FORWARD)


   |--Filter(WHERE:([Expr1005]=(3)))
       |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1010],0)))
            |--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1010]=Count(*)))
                 |--Filter(WHERE:(([Keyword]='one' OR [Keyword]='three' OR [Keyword]='two') AND [Expr1004]=(3)))
                      |--Nested Loops(Inner Join)
                           |--Table Spool
                           |    |--Segment
                           |         |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD)
                           |--Nested Loops(Inner Join, WHERE:((1)))
                                |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1009],0)))
                                |    |--Stream Aggregate(DEFINE:([Expr1009]=Count(*)))
                                |         |--Table Spool
                                |--Table Spool


  |--Filter(WHERE:([Expr1004]=(1) AND [Expr1005]=(3)))
       |--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1004]=MIN([Expr1006]), [Expr1005]=SUM([Expr1006])))
            |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Keyword]='three' OR [Keyword]='two' OR [Keyword]='one' THEN (1) ELSE (0) END))
                 |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD)

 

posted @ Wednesday, June 30, 2010 9:41 PM | Feedback (1) | Filed Under [ Optimization SQL Server 2008 Algorithms SQL Server 2005 SQL Server 2000 ]

Another bin-packaging algorithm using recursion and XML

This time I will show you an algorithm to do the dreaded bin-packaging using recursion and XML.
First, create some sample data like this

-- Prepare sample data
DECLARE @Sample TABLE
        (
            RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
            Expense SMALLMONEY NOT NULL
        )
 
-- Populate sample data
INSERT  @Sample
        (
            Expense
        )
VALUES  (12.51),
        (45.63),
        (66.35),
        (92.66),
        (65.46),
        (54.01),
        (32.23),
        (27.16),
        (78.92),
        (14.58)
 
Next, we need to create a variable to hold the user's wanted total sum.

-- Prepare user supplied parameter
DECLARE @WantedSUM SMALLMONEY = 111.09

And we also need to create a temporary staging table to hold the valid combinations
 
-- Prepare temporary staging table
DECLARE @Temp TABLE
        (
            CombID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
            TrackPath XML NOT NULL
        )

Now we only have to do the calculations!
Here I am using a special trick to get the unique combination, since the path of records 1>2>3 is the same as 1>3>2, 2>1>3, 2>3>1, 3>1>2 and 3>2>1. See explanation between combination and permutation at Wikipedia here.
To keep track of which records I already have used in the total sum, I simply remove the record id (RowID) from the Hits list.
And, to give the correct answer at the end, I build a XML string with visited RowID's building up the correct sum.
 
-- Calculate all possible permutations using recursion
;WITH ctePack(Total, Hits, TrackPath)
AS (
        SELECT  s.Expense AS Total,
                (SELECT '#' + CAST(x.RowID AS VARCHAR(MAX)) FROM @Sample AS x WHERE x.RowID <> s.RowID ORDER BY x.RowID FOR XML PATH('')) + '#' AS Hits,
                '<ID>' + CAST(s.RowID AS VARCHAR(MAX)) + '</ID>' AS TrackPath
        FROM    @Sample AS s
        WHERE   s.Expense <= @WantedSum
 
        UNION ALL
 
        SELECT      p.Total + s.Expense,
                    REPLACE(p.Hits, '#' + CAST(s.RowID AS VARCHAR(MAX)) + '#', '#') AS Hits,
                    p.TrackPath + '<ID>' + CAST(s.RowID AS VARCHAR(MAX)) + '</ID>' AS TrackPath
        FROM        @Sample AS s
        INNER JOIN  ctePack AS p ON p.Hits LIKE '%#' + CAST(s.RowID AS VARCHAR(MAX)) + '#%'
        WHERE       p.Total + s.Expense <= @WantedSum
)
INSERT      @Temp
            (
                TrackPath
            )
SELECT      MIN(TrackPath)
FROM        ctePack
WHERE       Total = @WantedSum
GROUP BY    Hits

When the iterations are over, and we have the wanted combniation(s), the task left is to report the records giving us the correct sum.
We also need the records grouped so that we can see which group each expense belong to. In same cases, one and the same record may used in multiple groups.
 
-- Display the final resultset
SELECT      t.CombID,
            s.RowID,
            s.Expense
FROM        @Temp AS t
CROSS APPLY t.TrackPath.nodes('/ID') AS f(n)
INNER JOIN  @Sample AS s ON s.RowID = f.n.value('.', 'INT')
ORDER BY    t.CombID,
            s.RowID

If you don't want to use XML, you can write the recursive cte like this, to get all included records directly.

-- Calculate all possible permutations using recursion
;WITH ctePack(RowID, Expense, Total, Tracker)
AS (
        SELECT  s.RowID,
                s.Expense,
                s.Expense AS Total,
                (
                    SELECT      '#' + CAST(x.RowID AS VARCHAR(MAX))
                    FROM        @Sample AS x
                    WHERE       x.RowID <> s.RowID
                    ORDER BY    x.RowID
                    FOR
XML     PATH('')
                ) + '#' AS Tracker
        FROM    @Sample AS s
        WHERE   s.Expense <= @WantedSum

        UNION ALL

        SELECT      s.RowID,
                    s.Expense,
                    p.Total + s.Expense,
                    REPLACE(p.Tracker, '#' + CAST(s.RowID AS VARCHAR(MAX)) + '#', '#') AS Tracker
        FROM        @Sample AS s
        INNER JOIN  ctePack AS p ON p.Tracker LIKE '%#' + CAST(s.RowID AS VARCHAR(MAX)) + '#%'
        WHERE       p.Total + s.Expense <= @WantedSum
)
SELECT  DISTINCT
        DENSE_RANK() OVER (ORDER BY Tracker) AS CombID,
        RowID,       
        Expense
FROM    ctePack
WHERE   Total = @WantedSum

posted @ Wednesday, June 30, 2010 1:44 PM | Feedback (0) | Filed Under [ Optimization SQL Server 2008 Algorithms SQL Server 2005 ]

Friday, June 25, 2010

Bug in SQL Server Management Studio

When I try to rename a node name such as a table or column in Management Studio, I cannot use DELETE key to remove previous characters. However, I can use BACKSPACE key.

Please vote here https://connect.microsoft.com/SQLServer/feedback/details/570758/cannot-use-delete-key-in-ssms-and-object-explorer to fix this little, but annoying, issue.

//Peso

posted @ Friday, June 25, 2010 10:13 AM | Feedback (0) | Filed Under [ SQL Server 2008 Administration ]

Sunday, June 20, 2010

Ten days left

As some of you know, I was awarded Microsoft Most Valuable Professional (MVP) for SQL Server in July last year. Now it's time to see if I get my MVP renewed, or if I lose the award.
I honestly believe it's a good thing MVP status only lasts for one year at a time. Knowledge is fresh. Things that worked in the past may not work any longer due to evolvement, and there are smarter ways to do things now, than before.

Being an MVP is a responsibility. It does mean you have more than average knowledge of SQL Server and how to work out the best from the product. It doesn't mean you know everything about the product. Noone does.
Being an MVP means you have benefit of the community of your peers. I have learned more hardcore things only the last year, than I knew from the previous five years. And sometimes I get lucky to contribute to the MVP community myself.

That's how things work. Me being around knowledgeble people makes me learn, or understand, same things they do. The same thing apply to beginners on the forums. They most often look up to MVP's and learn one or two things from them.

The last year I haven't been around the forum as much as I want to. The reason is that I started (together with SQL server MVP Thomas Ivarsson) a PASS chapter in Sweden. It has taken a lot of time in the beginning and still does take some time to plan and invite interesting guest speakers for our meetings.

We are currently working to get a full day event in place in Sweden on the 10th of September. That's 3 months away but it feels there is not enough time! I guess every planner feels that way.
However, this is in my "line of duty", to continue to help and educate the SQL Server community,

Hopefully I get my MVP status for another year. It means my peers have faith in me and my ability to support the community.
I would like to thank a few people for the collaboration last year, which problably wouldn't have happened without my MVP status:

1) Itzik Ben-Gan for incredible insight and the email correspondence we shared the last year.
2) Joe Celko, for some fun competitions and coding help. I do love to see my [weighted] moving average in your upcoming SQL for smarties book.
3) Phil Factor, who have thought me a lot of driving and encouraging people to participate.  And of course, some Speed Phreakery!
4) Tony Davis, who gave my the opportunity to be a technical reviewer.

There are a lot of other people whom I have shared ideas with. You know who you are.

I know a few people who probably will receive the MVP award this quarter. I wish you good luck and god speed!

//Peter

posted @ Sunday, June 20, 2010 9:21 AM | Feedback (0) | Filed Under [ Miscellaneous ]

Thursday, June 17, 2010

SQL Server 2008 Compression

Hi!

Today I am going to talk about compression in SQL Server 2008. The data warehouse I currently design and develop holds historical data back to 1973. The data warehouse will have an other blog post laster due to it's complexity.

However, the server has 60GB of memory (of which 48 is dedicated to SQL Server service), so all data didn't fit in memory and the SAN is not the fastest one around.
So I decided to give compression a go, since we use Enterprise Edition anyway.

This is the code I use to compress all tables with PAGE compression.


DECLARE @SQL VARCHAR(MAX)
 
DECLARE curTables CURSOR FOR
            SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id))
                    + '.' + QUOTENAME(OBJECT_NAME(object_id))
                    + ' REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)'
            FROM    sys.tables
 
OPEN    curTables
 
FETCH   NEXT
FROM    curTables
INTO    @SQL
 
WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @SQL IS NOT NULL
            RAISERROR(@SQL, 10, 1) WITH NOWAIT
 
        FETCH   NEXT
        FROM    curTables
        INTO    @SQL
    END
 
CLOSE       curTables
DEALLOCATE  curTables


Copy and paste the result to a new code window and execute the statements. One thing I noticed when doing this, is that the database grows with the same size as the table. If the database cannot grow this size, the operation fails.
For me, I first ended up with orphaned connection. Not good.

And this is the code I use to create the index compression statements


DECLARE @SQL VARCHAR(MAX)
 
DECLARE curIndexes CURSOR FOR
            SELECT      'ALTER INDEX ' + QUOTENAME(name)
                        + ' ON '
                        + QUOTENAME(OBJECT_SCHEMA_NAME(object_id))
                        + '.'
                        + QUOTENAME(OBJECT_NAME(object_id))
                        + ' REBUILD PARTITION = ALL WITH (FILLFACTOR = 100, DATA_COMPRESSION = PAGE)'
            FROM        sys.indexes
            WHERE       OBJECTPROPERTY(object_id, 'IsMSShipped') = 0
                        AND OBJECTPROPERTY(object_id, 'IsTable') = 1
            ORDER BY    CASE type_desc
                            WHEN 'CLUSTERED' THEN 1
                            ELSE 2
                        END
 
OPEN    curIndexes
 
FETCH   NEXT
FROM    curIndexes
INTO    @SQL
 
WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @SQL IS NOT NULL
            RAISERROR(@SQL, 10, 1) WITH NOWAIT
 
        FETCH   NEXT
        FROM    curIndexes
        INTO    @SQL
    END
 
CLOSE       curIndexes
DEALLOCATE  curIndexes


When this was done, I noticed that the 90GB database now only was 17GB. And most important, complete database now could reside in memory!

After this I took care of the administrative tasks, backups. Here I copied the code from Management Studio because I didn't want to give too much time for this. The code looks like (notice the compression option).


BACKUP DATABASE [Yoda]
TO              DISK = N'D:\Fileshare\Backup\Yoda.bak'
WITH            NOFORMAT,
                INIT,
                NAME = N'Yoda - Full Database Backup',
                SKIP,
                NOREWIND,
                NOUNLOAD,
                COMPRESSION,
                STATS = 10,
                CHECKSUM
GO
 
DECLARE @BackupSetID INT
 
SELECT  @BackupSetID = Position
FROM    msdb..backupset
WHERE   database_name = N'Yoda'
        AND backup_set_id =(SELECT MAX(backup_set_id) FROM msdb..backupset WHERE database_name = N'Yoda')
 
IF @BackupSetID IS NULL
    RAISERROR(N'Verify failed. Backup information for database ''Yoda'' not found.', 16, 1)
 
RESTORE VERIFYONLY
FROM    DISK = N'D:\Fileshare\Backup\Yoda.bak'
WITH    FILE = @BackupSetID,
        NOUNLOAD,
        NOREWIND
GO


After running the backup, the file size was even more reduced due to the zip-like compression algorithm used in SQL Server 2008. The file size? Only 9 GB.

//Peso

posted @ Thursday, June 17, 2010 2:16 PM | Feedback (2) | Filed Under [ Optimization SQL Server 2008 Administration ]

Monday, June 14, 2010

Simple script to get logins and default database names

SELECT     u.name,
           l.dbname
FROM       sys.sysusers AS u
INNER JOIN sys.syslogins AS l ON l.sid = u.sid

posted @ Monday, June 14, 2010 1:27 PM | Feedback (0) | Filed Under [ SQL Server 2008 Administration SQL Server 2005 ]

Tuesday, June 08, 2010

Simple script to get referenced table and their column names

-- Setup user supplied parameters
DECLARE @WantedTable SYSNAME
 
SET     @WantedTable = 'Sales.factSalesDetail'
 
-- Wanted table is "parent table"
SELECT      PARSENAME(@WantedTable, 2) AS ParentSchemaName,
            PARSENAME(@WantedTable, 1) AS ParentTableName,
            cp.Name AS ParentColumnName,
            OBJECT_SCHEMA_NAME(parent_object_id) AS ChildSchemaName,
            OBJECT_NAME(parent_object_id) AS ChildTableName,
            cc.Name AS ChildColumnName
FROM        sys.foreign_key_columns AS fkc
INNER JOIN  sys.columns AS cc ON cc.column_id = fkc.parent_column_id
                AND cc.object_id = fkc.parent_object_id
INNER JOIN  sys.columns AS cp ON cp.column_id = fkc.referenced_column_id
                AND cp.object_id = fkc.referenced_object_id
WHERE       referenced_object_id = OBJECT_ID(@WantedTable)
 
-- Wanted table is "child table"
SELECT      OBJECT_SCHEMA_NAME(referenced_object_id) AS ParentSchemaName,
            OBJECT_NAME(referenced_object_id) AS ParentTableName,
            cc.Name AS ParentColumnName,
            PARSENAME(@WantedTable, 2) AS ChildSchemaName,
            PARSENAME(@WantedTable, 1) AS ChildTableName,
            cp.Name AS ChildColumnName
FROM        sys.foreign_key_columns AS fkc
INNER JOIN  sys.columns AS cp ON cp.column_id = fkc.parent_column_id
                AND cp.object_id = fkc.parent_object_id
INNER JOIN  sys.columns AS cc ON cc.column_id = fkc.referenced_column_id
                AND cc.object_id = fkc.referenced_object_id
WHERE       parent_object_id = OBJECT_ID(@WantedTable)
 

posted @ Tuesday, June 08, 2010 4:02 PM | Feedback (0) | Filed Under [ SQL Server 2008 Administration SQL Server 2005 SQL Server 2000 ]

Powered by:
Powered By Subtext Powered By ASP.NET