What's new in DTS 2005...that was the title of tonight's local SQL Server User Group meeting. We're a bit spoiled here in the Pacific Northwest, with our local SQL Server User Group meeting on the main Microsoft campus in Redmond. It's not too tough to get speakers straight from The Company. Tonight it was Donald Farmer, Group Program Manager, SQL Server - DTS. As he put it, the short answer to the question is: Everything.
Obviously I can't cover in one blog post everything from an hour-plus presentation (not to mention the demos), but I'll hit some of the points that stood out for me.
- DTS will be given a new name in a couple of weeks. Donald didn't say what, and in fact said he didn't know himself; that's probably being handled by the marketing group. Just taking a stab in the dark, I would guess something like ETL Services. That seems to fit the current mode of naming things related to SQL Server (Analysis Services, Reporting Services, ...). But that's just a WAG on my part.
- Once data is in a pipeline (dataflow) it is just a rowset, and you can now use rowset operations on it such as merge joins and unions (UNION ALL). You can also split data flows into different paths based on conditions, so you could split data on certain criteria, massage each path differently, and merge it back together.
- The new DTS Designer, which is a Visual Studio IDE type environment called BI Studio, now has a debugger, including breakpoints, the ability to inspect variables, etc. There are also Data Viewers that let you see the data displayed in various ways (grid, histogram, scatter plot, etc.).
- For loops and For Each loops. Imagine “For Each file in this folder”. Or “For Each Database on this Server”. Or “For Each Table in this Database”. I assume, although he didn't say so, that these could be nested.
- Error Flows: What happens when an error occurs? Now you can, for example, redirect the row that generated the error into a separate flow. Maybe that flow just logs it into a table. Maybe it does some other processing and later merges it back into the main flow. It's up to you, but now errors don't have to just cause a task to fail, although that setting is still an option, too.
- Very fancy package configuration files where you can set a gazillion different settings, including global variables and other content that varies between servers, such as connection strings.
- Checkpoint Restart: You can define checkpoints along the way, and if the package gets hung up for some reason, it can be restarted from the last checkpoint rather than all the way back from the beginning. Of course there are also transactions and something called sequences. I didn't get the full picture of those, but they sound like a way to group tasks so that they act like one transaction.
- Data Sampling Task: Set a percentage, such as 10%, and the tasks after it work on just that sample of the data. Could be good for getting a feel for the validity of the data before processing the whole set.
- Raw File Source: This is data written out in DTS's “raw” format. The demo showed extracting data from an original data source, which in theory could be a complex and time-consuming process. Two flows were then generated: one that stored everything in a raw file, and another that ran through a Data Sampling Task. Later, if the sample passes your standards, another portion of the package can rapidly read the raw file via the Raw File Source to retrieve the full data set for processing, without having to repeat the costly extraction.
- There is a Package Migration Wizard to migrate DTS 2000 packages to DTS 2005, but as you would expect, anything that is really complex should be rewritten manually to use the most efficient features now available.
- Database Maintenance Plans are now DTS packages. I think this is a good thing. Maintenance Plans were a bit too much of a black box previously.
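To make the conditional split and merge idea above concrete, here is a rough sketch in plain Python. This mimics what the DTS designer does graphically; the rows, field names, and "currency conversion" are made up for illustration, and none of this is an actual DTS API.

```python
# Illustrative sketch only: a conditional split into two paths,
# different massaging per path, then a merge back together
# (UNION ALL semantics: simple concatenation, no de-duplication).
rows = [
    {"region": "US", "amount": 120.0},
    {"region": "EU", "amount": 80.0},
    {"region": "US", "amount": 50.0},
]

# Conditional split: route each row down one of two paths.
us_path = [r for r in rows if r["region"] == "US"]
other_path = [r for r in rows if r["region"] != "US"]

# Massage each path differently (hypothetical currency conversion).
other_path = [{**r, "amount": r["amount"] * 1.25} for r in other_path]

# Merge the paths back into one flow.
merged = us_path + other_path
```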
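The “For Each file in this folder” loop mentioned above might look something like this if you wrote it by hand; the folder and CSV files here are invented for the example, and the nesting is my assumption about how the containers could combine.

```python
# Illustrative sketch of "For Each file in this folder" with a
# nested per-row loop -- not DTS syntax.
import os
import tempfile

# Set up a throwaway folder with two small CSV files.
folder = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    with open(os.path.join(folder, name), "w") as f:
        f.write("x,y\n1,2\n")

rows_seen = []
for entry in sorted(os.listdir(folder)):    # For Each file in the folder...
    with open(os.path.join(folder, entry)) as f:
        next(f)                             # skip the header row
        for line in f:                      # ...nested loop over its rows
            rows_seen.append((entry, line.strip()))
```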
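The error-flow idea, redirecting a bad row into a separate flow rather than failing the whole task, reduces to something like this sketch. The "transform" here is just an integer parse I picked for illustration.

```python
# Illustrative sketch of an error flow: a row that fails the
# transform is redirected to its own flow instead of killing the task.
raw_rows = ["10", "20", "not-a-number", "40"]

main_flow, error_flow = [], []
for value in raw_rows:
    try:
        main_flow.append(int(value))   # the "transform" step
    except ValueError:
        error_flow.append(value)       # redirect the offending row

# error_flow could now be logged to a table, cleaned up, and merged
# back into main_flow later -- or simply reported.
```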
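And checkpoint restart is, at heart, persisting which steps completed so a rerun can skip them. A minimal sketch, assuming a simple JSON file of completed step names — my own invention, not how DTS actually stores its checkpoint information:

```python
# Illustrative checkpoint-restart sketch: record completed steps to a
# file so a rerun resumes after the last checkpoint.
import json
import os

CHECKPOINT = "demo_checkpoint.json"
steps = ["extract", "transform", "load"]

done = set()
if os.path.exists(CHECKPOINT):              # restarting after a failure?
    done = set(json.load(open(CHECKPOINT)))

for step in steps:
    if step in done:
        continue                            # already finished; skip it
    # ... the real work for this step would go here ...
    done.add(step)
    with open(CHECKPOINT, "w") as f:
        json.dump(sorted(done), f)          # checkpoint after each step

os.remove(CHECKPOINT)                       # clean finish: no restart needed
```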
One other thing that Donald pointed out was that many DBAs use DTS not for moving data, but for establishing workflows. That makes sense for some maintenance things you'd want to do. I can see it even more with the new features. Also, Donald mentioned that some features such as Text Mining may be Enterprise level only, but they are looking to put a lot of power in the lower level versions. I think everything identified above will be standard.
I haven't done a lot of complex tasks in DTS up to this point, but damn this is looking good! Oh, and of course you can still do scripting tasks that now use VB.NET. Overall, I'd say they are on a good path to hit their goal of becoming truly an Enterprise-class ETL system, which is also why I think they'll pick the name above, to emphasize ETL.
posted @ Wednesday, September 08, 2004 11:14 PM