Posts
38
Comments
50
Trackbacks
26
Thursday, July 31, 2008
Next version of DTLoggedExec: Dataflow Profiling in sight!

It's been a long time since I've been able to work consistenly on my DTLoggedExec tool. Fortunately in these last days I've been able to spare some free time to continue the work, and now, within a few days in September 2008 I should be able realease a new version of it.

This new version will contain a new (and imho very very important) feature: DTLoggedExec will be able to profile DataFlow executions, generating a CVS-like file where all the informations on how many rows have been processed by each dataflow component will be available.

You can see an example of this file here:

http://www.davidemauri.it/files/Test-Package4.dtsprofile.txt

I suggest to download it so that you can start to play and see how many things you can do with all that informations.

Here is the Package used as a sample: http://www.davidemauri.it/files/Test-Package4.zip

Of course with such file it's very very very very easy to import everything in a SSIS Performance Database that you can create for that purpose, so that you can do query like this one:

Query 

With that you can aggregate data and monitor how your package is performing on daily basis.

These are the data currently avaiable in the .dtsprofile file:

DataFlowId, 
DataFlowName, 
ExecutionNumber,
TransformationId, 
TransformationName,
SourceId, 
SourceName,
DestinationId, 
DestinationName,
RowCount

DTLoggedExec is capable of handling correctly complex packages situations, if you have a DataFlow inside a loop, it detects it and populate the "ExecutionNumber" value accordingly, so that you can profile dataflow execution data for each iteration. Of course it also provides a timestamp so that you can calculate how many rows are processed per minute (o per second), creating a graph where you can see the trendline of your package execution times agains the number of processed rows (for example).

Basically you can do now some serious performance trendline analysis of packages without having to instrument the package itself. All the data you need will be extracted by DTLoggedExec itself, right from the SSIS Engine.

I now really feel that DTLoggedExec is really what DTExec should have been right from the start.

Of course DTLoggedExec will work also with SQL Server 2008 :-)

posted @ Thursday, July 31, 2008 11:42 AM | Feedback (0)
Wednesday, June 18, 2008
Using SQL Server PowerShell Snapin with other Powershell shells

SQL Server 2008 has a strong integration with PowerShell but actually it also provide a custom shell, SQLPS, which is somehow limited for me. Well actually is limited more in general speaking, since it is called "minishell".

"SQLPS.exe is a Minishell (also called “custom shell”). It is a form of pre-packaging of Powershell functionality, and it is available to anyone who wants to do this (make-shell). It is regular Powershell, albeit with limitations that the Powershell team decided to impose on it – it is a ‘closed’ shell, which doesn’t allow adding other snapins.

We are shipping SQLPS to make life of our DBA’s a whole lot easier. If they need to have quick access to the SQL providers, assemblies, cmdlets, default security settings, everything is there. We could have possibly done this through a startup script but not everything can be accomplished this way. We are changing the default security settings, without affecting the settings for overall Powershell. Minishells have their own settings."

http://blogs.msdn.com/mwories/archive/2008/06/14/SQL2008_5F00_Powershell.aspx

Of course it give PowerShell access right out of the box and requires no configuration, but I need something more developer friendly when I have to write complex powershell scripts.

More in generale, when I use PowerShell I'd like to use my preferred IDE (actually PowerGUI). Is it possible to configure any shell other than SQLPS to have the same behaviour of that one? Fortunately yes!

Is just a matter of adding the snapins so that we can use the SQL: drives

# Manual procedure
Get-PSSnapin -registered
Add-PSSnapin SqlServerCmdletSnapin
Add-PSSnapin SqlServerProviderSnapin

we can also automatize everything:

# Verbose command
Get-PSSnapin -Registered | Where-Object { $_.Name -like "SqlServer*" } | ForEach-Object { Write-Host "Adding Snapin " + $_.Name; Add-PSSnapin $_.Name }

# Contracted version:
# gsnp -Registered  | ? { $_.Name -like "SqlServer*" } | % { "Adding Snapin " + $_.Name; asnp $_.Name }

then we also need to make sure that the default formating when dealing with SQL Server object is the one used by SQLPS. Again all we need to do is just register formating and format type data:

# Load Formatting Types Data
Update-TypeData "C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLProvider.Types.ps1xml"
Update-FormatData "C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLProvider.Format.ps1xml"

That's it! We can now enjoy SQL Server through our preferred PowerShell shell! SQL PSDrive, CmdLets and everything is ready to be used :-)

posted @ Wednesday, June 18, 2008 11:32 AM | Feedback (0)
Wednesday, June 27, 2007
SQL Server Best Practices and Performance Tuning for Oracle Enterprise Applications Workshop

In the last days Frank McBath send me an interesting email regarding a free workshop that will be delivered in Zurich on the 10 and 11 of July.

The workshop is really intersting, as you may see by yourself:

"There are many courses in the market that cover either the application or the database, but very little that focuses on how the two work together. The goal of this workshop is provide insight into that area-- where the database meets the enterprise application. With examples from Siebel, PeopleSoft, JD Edwards and SAP."

Info and registration here:

http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032342993&Culture=de-CH

Don't miss it if you have that chance to go.

Other schedules and location can be found here:

http://www.microsoft-oracle.com/Pages/default.aspx

posted @ Wednesday, June 27, 2007 3:20 PM | Feedback (0)
Tuesday, June 05, 2007
DTLoggedExec v 0.1.3.2 Released

After a while I've realeased a new version of DTLoggedExec, the DTExec replacement:

  • Compiled to support 32bit and 64bit platforms (32bit and 64bit executables are available in the zipped file)
  • Updated the ConsoleLogProvider to add a more detailed logging of OnError events. Now all properties, along with related connection properties of the erroneous task are logged. This is IDEAL for post-mortem debugging
  • Corrected a little bug that prevented the abilility to load packages from SQL Server using SQL Authentication
  • Display loaded package version
  • Added Help File
  • Added Samples Packages
  • Added Usage Samples
  • More info and download on CodePlex!

     

    posted @ Tuesday, June 05, 2007 9:27 PM | Feedback (0)
    Sunday, November 12, 2006
    Useful index scripts

    At this address

    http://www.davidemauri.it/SqlScripts.aspx

    i've released four scripts that simplify the usage of SQL Server 2005 DMVs to discover, analyze and understand indexes usage and health.

    They are nothing exceptional, juts a shortcut to common (but long to write)
    queries.

    I use them a lot, maybe they're of some use to other DBAs :-)

    posted @ Sunday, November 12, 2006 6:07 PM | Feedback (1)
    Monday, November 06, 2006
    DTLoggedExec v 0.1.2.0 released!

    Finally I've realeased a new version of my DTLoggedExec tool.

    A lots of improvement from the first public version:

    -Switched to a fully pluggable architecture to support 3rd party plugins
    -Added "LogEvent" command line switch to choose which events should be logged
    -Added support for Expression logging
    -Added "Package Name" and "Container ID" values in the console log

    Download it from CodePlex!

    posted @ Monday, November 06, 2006 5:34 PM | Feedback (1)
    Wednesday, July 05, 2006
    DTLoggedExec can now log even DTS Expressions!

    Today I worked again on DTLoggedExec. I've been able to add the ability to log properties with DTS Expressions associated. This allows the log to show and report properties values in additions to the already present ablity to log variables values.

    I'm sure you'll find this feature really helpful if you're using DTS Expression (as you should do) to add flexibility to your SSIS Packages.

    The DTLoggedExec that will contain this new feature will be the version 0.1.1.0.

    posted @ Wednesday, July 05, 2006 3:39 PM | Feedback (0)
    Wednesday, June 28, 2006
    DTLoggedExec page online

    I've created a dedicated page for my DTLoggedExec tool.

    http://www.davidemauri.it/dtloggedexec.aspx

    By the end of the month (June) i'll put online the full source code. The code and the application are licensed under Creative Commons Attribution-NonCommercial-NoDerivs 2.5 Italy License.

    The "NoDerives" licensing limitation just means that if you want to contribute to the project you have to join codeplex and/or contact me so that I can add you to official authors list.

    The "NonCommercial" licensing limitation just means that you cannot take the code and the executable and sell it. Of course, if you want to use it in your production eviroment, that's fine.

    posted @ Wednesday, June 28, 2006 9:21 AM | Feedback (2)
    Saturday, June 03, 2006
    SSIS Logging: DTLoggedExec news

    I've been able to work a little on my DTLoggedExec project in this last week. You can find a log execution result here:

    http://www.davidemauri.it/files/DTLoggedExec_test_output.txt

    DTLoggedExec now supports some of the original DTExec options and is able to log even variables values, as you may see from the link above. DTLoggedExec will also support a plugin architecture so that creating a custom log provider will be as easy as 1-2-3. Currently developed log provider are:

    • NullLogProvider
    • SqlServerLogProvider
    • ConsoleLogProvider

    The Null log provider will just trash every information it gets. I'm using it to benchmark the impact of other providers on package execution. SqlServerProvider will send all information to a sysdtslog902 table. Note that the table is NOT the original sysdtslog90, since DTLoggedExec can log A LOT more information than the orginala tool. The ConsoleLogProvider will just send data to the console. This is the log provider I used to create the test_output.txt file that I put on my website.

    I've also a great news. I've decided to release the code under the Creative Commons license, and I'll release the project on CodePlex (I've just requested to open a new project for that) for the end of this month.

    Of course any comments and feedback is appreciated!

    posted @ Saturday, June 03, 2006 5:53 PM | Feedback (0)
    Wednesday, May 17, 2006
    SSIS Logging: DTLoggedExec

    After using a SSIS for quite a lot I decided to develop a DTExec on my own. The reasons can be found in the feature set that I have I mind and I'd like to implement asap:

    1. Ability to switch on/off logging from the execution utility, not only from inside the package (and thus only using Visual Studio)
    2. Avoid to use native log provider and add a personalized concept of logging provider where you can log a LOT more of information. Above all I'm interested in logging variables values for each trappable event. This will make things a lot more easier when you have to understand why a package didn't run correctly.
    3. Ability to save information on task hosts relationships, so that you can understand clearly if a task is inside a container, or if a package is called from another package. All this information must go to the log store so that you can see it in a graphical format each time you need it.
    4. Ability to specify to log ANY of the properties you have in package and package's containers. In this way you can see, for example, how an expression has modified a property.

    After a little development and after having solved some problems (mostly due to lack of information in documentation), I've got something that is useful. I've wrote a console application called DTLoggedExec that behaves like DTExec (though is not yet 1:1 compatible with its command options). For now it "just" allows you to decide what kind of Log Provider to use and to log variables values.

    Log providers ARE NOT the log providers shipped with SSIS, but are new classes written from the scratch; they allow the DTLoggedExec to support a plugin architecure.

    This is a sample screenshot where you can see the variables logged to the console.

    As soon as I'll have other news I'll post it ASAP.

    Just one thing. I've not jet decided if I'll realese this tool for free or not, so, for now, no executable and no sources are avaiable (Mostly beacause the product, though already used in some production enviroment, is an early alpha version).

    posted @ Wednesday, May 17, 2006 6:18 PM | Feedback (0)
    Thursday, April 20, 2006
    Partitioning: Two useful query

    To obtain all information on how a table has been partitioned I usually use two views:

    The first just list all the partitened objects, showing the used partitioned scheme and function:

    CREATE view [sys2].[partitioned_objects]
    as
    select
    distinct
       
    p.object_id,
       
    index_name = i.name,
       index_type_desc = i.type_desc,
       partition_scheme = ps.name,
       data_space_id = ps.data_space_id,
       function_name = pf.name,
       function_id = ps.function_id
    from 
       sys.partitions p
    inner join
       sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id
    inner join
       sys.data_spaces ds on i.data_space_id = ds.data_space_id
    inner join
       sys.partition_schemes ps on ds.data_space_id = ps.data_space_id
    inner join
       sys.partition_functions pf on ps.function_id = pf.function_id

    That sys2.partitioned_object views is very useful to see how a table has been partitioned:

    select * from sys2.partitioned_objects where object_id = object_id('table_name')

    with the above query will list table and related indexes partition information.

    Once you know what partition scheme and function the table and related indeexes uses for partitioning you may also what to see how many partition actually exists, what range values are used and so on.

    This view will help you:

    create view [sys2].[partitioned_objects_range_values]
    as
    select
     
       
    p.object_id,
       p.index_id,
       p.partition_number,
       p.rows,
       index_name = i.name,
       index_type_desc = i.type_desc,
       i.data_space_id,
       pf.function_id,
       pf.type_desc,
       pf.boundary_value_on_right,
       destination_data_space_id = dds.destination_id,
       prv.parameter_id,
       prv.value
    from
     
       sys.partitions p
    inner join
       sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id
    inner join
       sys.data_spaces ds on i.data_space_id = ds.data_space_id
    inner join
       sys.partition_schemes ps on ds.data_space_id = ps.data_space_id
    inner join
       sys.partition_functions pf on ps.function_id = pf.function_id
    inner join
       sys.destination_data_spaces dds on dds.partition_scheme_id = ds.data_space_id and p.partition_number = dds.destination_id
    left outer join
       sys.partition_range_values prv on prv.function_id = ps.function_id and p.partition_number = prv.boundary_id

    To use it, just use as the one showed before:

    select * from sys2.partitioned_objects_range_values where object_id = object_id('table_name')

    The query will produce a list with all partitions and all range values for the object 'table_name'

    I've put my "system" views into a schema named sys2, so be sure to create it or change that create view statement to create views in your own schema.

    posted @ Thursday, April 20, 2006 3:50 PM | Feedback (0)
    Sunday, April 02, 2006
    Logging With SSIS: Beware of Parent Package Variable configuration!

    In my previous post I said that "logging begins BEFORE validation and BEFORE package configuration". After talking with Mr. Kirk Haselden (who has been very kind and helped me to better understand how SSIS really works. Thanks a lot Kirk!) I found a clue and a workaround.

    The problem, basically, is related to the fact that the configuration "Parent Package Variable" behaves differently from all other configurations. In fact, for the Parent Package Variable configuration, events are fired in the following way:

    Logging starts
    Package is validated
    Parent Package Variable Configurations are loaded
    Expression are applied
    Package is run

    which is a different from all other configuration where the configuration is the first event. So, if you need to make the logging connection manager configurable at run time you MUST avoid using Parent Package Variables and you should use the Enviroment Variables instead.

    posted @ Sunday, April 02, 2006 10:50 AM | Feedback (2)
    Monday, March 27, 2006
    Logging With SSIS: 3 things to know

    I'm using SSIS in a big project and as any big project wants, I have to implement a quite extensive logging mechanism. "No problem" you may say. We'll, at least is what I said when I began to take care of logging. Unfortunately things are not *so* easy. To be realistic they're not easy at all.

    Let me explain. I have a *lots* of SSIS Packages and many of them just encapsulate reusable logic; as such they are used as sub-packages in other packages. Things are quite complicated so I have a "Matrioska" situation where Packages call Packages that call Packages and so on.

    Here comes the first problems:

    1) Native logging doesn't seems to work for packages up from the third nesting level if you leave the default logging mode to "UseParentSettings". Be aware with "UseParentSettings" logging mode: you have to configure the package that uses this option the SAME AS if it has LoggingMode set to "True". So you have to configure a connection for logging and to specify and enable the log provider.

    2) With native package logging you can't understand whether a task is contained in a bigger container or not. This is not a vital problem but it would have help to create more precise reports.

    Now, let's also say that you have a package that must be executed against multiple databases. For example you have the database Customer1, Customer2 and so on, all with the same schema but with different data. You're surely going to create una SISS Package that has the ability to configure itself automatically so that it will run targeting different database basing on a very cool SSIS Configuration file. Since the package is REALLY big and create a LOT of logging you'll also think to create a different log database for any customer. So you'll have LOG_Customer1, LOG_Customer2 and so on. And here's the problem

    3) If you want to log using a SQL Server Log Provider you CAN'T make the Sql Server connection dynamic (with the aid of an Expression) since logging begins BEFORE validation and BEFORE package configuration. This means that the logging with initally log on the database you have specified in the ConnectionString property of your Sql Server Connection, and only after package configuration has taken place will switch to log on your dynamicaly configured database.
    The problem is that the database you configure in the ConnectionString statically may not exists, since you know that the value will be corrected at run-time. Unfurtunately this won't help if you need to run this package from another package since logging to a non-existent database will raise an error. The error is strange enough because won't make you inner package execution fail, but will only fail the outer package, leaving you with a "Executed successfully with error" situation.

    To solve these problems i've started to create a DTExec on my own where I can control logging manually and from OUTSIDE the package.

    As soon I have other information or solution i'll post them asap, in the meantime if someone has faced the same problems i'd like to hear his experience.

    posted @ Monday, March 27, 2006 10:26 PM | Feedback (2)
    Friday, March 17, 2006
    A better NTILE implementation

    The implementation of the NTILE(n) windowing function is a little bit slow and requires a temp worktable that generates A LOT of I/O. Probably that because, as the BOL says, if you have a number of rows that is not divisible by your "n" bucket value, the function has to make the "Larger groups come before smaller groups in the order specified by the OVER clause".

    If you're using NTILE for statistical purposes and so you don't care about having larger groups before smaller one, mostly because the difference among the groups population tipically will be of only one unit, you can implement NTILE with the following script:

    SELECT 
       CustomerKey,
       CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
    FROM 
       
    DimCustomer

    where @n is a variable that contains your bucket value. For example, if you want a NTILE(5) your @n value will be 5.

    You can make some test using the AdventureWorksDW database:

    DECLARE @n INT;
    SET @n = 5;

    WITH CTE AS (
       SELECT 
          CustomerKey,
          NTILE(@n) OVER (ORDER BY YearlyIncome ASC) AS NTileStd
       FROM 
          DimCustomer
    )
    SELECT
       COUNT(*),
       NTileStd
    FROM
       CTE
    GROUP BY
       NTileStd
    ORDER BY
       2;

    WITH CTE AS (
       SELECT 
          CustomerKey,
          CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
       FROM 
       DimCustomer
    )
    SELECT
       COUNT(*),
       NTileFast
    FROM
       CTE
    GROUP BY
       NTileFast
    ORDER BY
       2;

    You'll notice that the first will make 38490 I/O (!!!) where the second one will only make 1036 I/O, which is 37 time LESS!!!!

    I have discovered this behaviour with my collegue Marco Russo using a milions rows table and as you may image 37 times less I/O DOES the difference! :-)

    posted @ Friday, March 17, 2006 6:04 PM | Feedback (7)
    Thursday, January 19, 2006
    SQL Pass 2006 Europe

    I'm proud to announce that I will participate to the SQL Pass 2006 Europe conference as a speaker!

    Along with my collegue Alessandro I'll speak about advanced Reporting Services development. Here's the session abstract:

    "In this session the Reporting Services will be used beyond their visible limits, creating a solution that allows people to execute arbitrary T-SQL code to create their reports. This will be done using external assemblies, leveraging the power of .NET and the flexibility of Reporting Services."

    You can see all other sessions here:

    http://www.sqlpass.org/events/europe/2006/sessions.cfm

    See you there!

    posted @ Thursday, January 19, 2006 4:42 PM | Feedback (0)
    News