Wednesday, August 06, 2008
#
My friends over at SQL Server Magazine posted this today...SQL Server 2008 RTM!
http://www.sqlmag.com/Article/ArticleID/99953/99953.html
Microsoft continues to up the 'ante' in the Business Intelligence space, cheers.
Update:
Official Press Release:
http://www.microsoft.com/Presspass/press/2008/aug08/08-06SQLServer2008PR.mspx
Friday, August 01, 2008
#
One of my favorite things to do each month is write SQL Server Magazine's Essential BI column. I write two posts for the column each month. About a month ago I started to focus on SharePoint for Business Intelligence (BI) topics. For those of you who don't know, SharePoint is Microsoft's BI portal platform.
If you have ever seen a Microsoft BI slide deck then you probably know of the ‘hamburger slide’. The ‘hamburger slide’ is a common slide which shows that SharePoint is the outer-most layer in the Microsoft BI stack. Below is the slide, notice how SharePoint Server is at the top with its various BI capabilities in the tabs:

My last two posts for the Essential BI column has been on SharePoint for BI topics. First, I wrote about SharePoint's overall key BI features including integration points with the other pieces of Microsoft BI software. Next, I created a piece on how you can use Windows SharePoint Services (WSS) 3.0 for BI. A lot of companies have access to WSS 3.0 via Windows Server licenses but do not have the larger Microsoft Office SharePoint Portal Server (MOSS) edition.
From here I have some additional topics, the first of which I have just begun to write:
· MOSS BI: Data Connection Libraries
· MOSS BI: Native KPIs/Dashboards/Filter WebParts
· MOSS BI: Enterprise Search for BI/Business Data Catalog (BDC)
*Excel Services was recently written about quite well by Paul Vaughn in this month’s edition of SQL Server Magazine (August 2008).
Where would like to see additional content on SharePoint BI? What SharePoint BI topics are important to you that I have not yet covered? Send me an email at dcomingore@scalabilityexperts.com or feel free to leave a comment on this blog.
Monday, July 21, 2008
#
A common question I hear from my clients is “how much storage do we need?” Now wait a second, I thought storage was cheap these days right? Wrong, consumer storage is cheap; enterprise storage is still not what I would classify as cheap. Add to this fact that Enterprise Data Warehouses (EDWs) commonly break into the several terabytes range and the matter of planning storage capacities becomes greater. Consulting firms that simply ignore this common client question are missing a value-add scenario. Your clients need to know how much space to plan for and if you simply say “roughly X TB” you are creating two net results. The first is that the client may purchase too much storage or may not purchase enough (very bad). Second, the client will have no idea as to how much storage to plan for in the coming years. Do both yourself and the client a favor; research the storage requirements of the BI solution you have proposed!
Estimating Data Warehouse Storage
To estimate a data warehouse’s storage requirements have a dimensional model designed in any number of tools including Visio, Erwin, or Excel. The main point is to have the actual data types required and their storage affecting settings such as variable length string columns designated. When estimating your data warehouse storage requirements make sure you assume the data types have their max allowed values being used for every record. The thinking is that it’s better to overestimate then under when it comes to storage (to a degree).
Now, obtain from the client how many records they believe will be processed and stored in the data warehoused based upon business requirements. Note how the storage requirements of fact tables are directly related to their grain definition. If your fact tables must store data by day you will certainly require more storage than if you only need to store the various measures by week. This is why it’s very important to have a solid dimensional model phase completed prior to entering other data warehousing tasks such as storage estimates. At this point you should instantiate the dimensional model in a SQL Server relational database.
Populate your schema with a sample percentage of the client’s estimates. For example, if the client has said that you can expect one million records per month and you must retain three years worth of data then you can expect to hold a total of around 36 million records. Based upon the estimates populate your fact tables with 1, 5, 10, or even 25 percent of the total records expected to be retained. Rebuild your fact table’s indexes. Finally, obtain the data and index spaced used by your fact tables and multiply those values according to the percentage of sampling you used.
Estimating Analysis Services Cube Storage
Clients not only need to know how much storage to obtain for the relational dimensional model but also the cubes (multidimensional objects). Estimating cube storage requirements is a more difficult task; however the same basic logic applies. Design and build your sample cubes as close as you can to how they will be built in production. The most important settings for estimating cube storage are the partition aggregations & storage mode settings. In a development environment you will probably only have a single partition, so there is an error of margin when estimating cube storage requirements since you will likely have multiple partitions with various storage settings in a production environment. Finally, perform a full process on the Analysis Services database once you are done creating the sample cubes and their related settings.
ROLAP & HOLAP w/0% Aggregations incur zero storage overhead excluding the cube’s metadata. If you apply aggregations to the HOLAP storage mode the aggregations will occur in the partition files (cube), thus consuming space. If you apply aggregations to the ROLAP storage mode, indexed views are created (and populated) in the underlying relational database (which means ROLAP with aggregations does in fact incur some storage overhead). MOLAP stores both grain data as well as aggregations in the partition files (cube). Finally, remote partitions can be used for placing less frequently access data onto cheaper disks.
With your sample cubes built and processed you can begin the cube estimation process. Obtain the sample cubes’ partition(s) estimated size in Management studio. Next, multiply the partition storage estimates according to the percentage of sampling you used (just like we did in the fact table estimates). Note that this method assumes both the storage mode and the aggregations are consistent throughout each cube (which is fairly unlikely in production environments). I spoke about the storage mode margin of error earlier but I should also mention that the % aggregations effect cube partition storage estimates as well. In the real world aggregation designs change, however for estimation purposes it’s easiest to assume a static setting of X%.
By using some variation of the methods mentioned above you will have a solid foundation for providing accurate estimates of storage required for both dimensional data warehouses as well as their associated cubes.
Monday, July 07, 2008
#
As a follow up to some recent feedback (actually a question) here is some more in depth information on becoming proficient in the Microsoft Data Warehousing/Business Intelligence stack.
Get the terms straight
Business Intelligence or BI is an umbrella term for various business processes, tools, and methodologies which support improved organizational decision making. Data Warehousing (and thus data warehouses) on the other hand are organizational repositories designed to facilitate highly scalable reporting & analytics. DW and BI go together like PB & J (ah the good ol' days). So is Data Warehousing synonymous with BI...NO! Data Warehousing is a common solution (due to its many advantages) implemented to enable BI in an organization....so when you see DW/BI you will now understand that a) DW/BI doesn’t mean they represent the same thing and b) that DW and BI merely complement each other very well thus the term's existence.
Now, understand before ever cracking open a BI book (or scouring the web all day for free information, who would do such a thing?) that REPORTING <> BI. The BI industry is plagued with terminology problems! So, DW <> BI and Reporting <> BI. And like DW, Reporting is merely one component of the larger field of BI.
Personal Microsoft Business Intelligence Professional Learning Plan
Understand before I dive into this that the words to follow are from a DW/BI architect/engineer and not a fulltime trainer. Actually, I wrote a seperate post on the pros/cons of fulltime trainers prior on this blog here. So with that disclaimer out of the way here are the topics in chronological order I would propose to anyone wishing to learn how to implement DW/BI systems on the Microsoft platform.
- Foundations (some of these topics overlap to various degrees)
- Data Warehousing (DW) Fundamentals (Normalized DW Vs Denormalized DW)
- Extract, Transform, and Load (ETL) Fundamentals
- Dimensional Modeling Fundamentals
- OnLine Analytical Processing (OLAP) Fundamentals
- Data Mining Fundamentals (I have more to learn here as well...in fact most do)
- Brief History of DSS (decision support systems) *You could argue you don't need this knowledge and in so far as getting the "job done" you don't.
- DW/BI Methodologies
- Kimball (www.kimballgroup.com) is pretty much the defacto standard for HOW to build an enterprise data warehouse on the Microsoft Platform. I personally have not read others work but do know of them and by all means use what ever methodology fits you and your DW/BI projects best.
- Microsoft DW/BI Tools (now to the technology)
- SQL Server 2005 (including Analysis/Reporting/Integration Services)
- Microsoft Office 2007 Suite (namely SharePoint, PerformancePoint & Excel)
Best "WWW Stops" (in no particular order)
What you see is what you get
So thats my list (and best "stops") for now. Those of you new to the Microsoft platform may laugh that there are only a handful of products to learn but be forewarned that both SQL Server and the Office products are VERY LARGE offerings. SharePoint & SQL Server in particular can (and often do) fill bookshelves worth of valuable content.
I hope that helps everyone out there who is considering learning more and/or making a career in the MSFT DW/BI space. There is a lot to learn, however you will find that having a solid background in databases (and a bit of programming) goes a long way in learning the actual tools. It also seems to be true according to the TDWI (www.tdwi.org) that most DW/BI Professionals in general (not just those using the Microsoft platform) have lived a "past life" in another part of IT (usually dev/dba). Most people do not go directly into the field of DW/BI (which I personally 100% support)!
Thursday, July 03, 2008
#
Hope everyone is doing well out there. A good friend of mine across the pond mentioned that he has started to read a lot of my content across the blogs, my SQL Magazine Articles/BI Column, etc. His feedback was excellent and relates to a general perception issue with the field of BI. "They are highly technology based" was his comment in so many words...
First a few clarifications are in order, writing for SQL Magazine means you will be discussing the actual data platform called SQL Server at some length. This blog and other places though I could write more BI neurtral content. And thus I have created a new post category called "Business Intelligence Nuggets" (which this post is the first in).
The perception issue is that Business Intelligence is all about technology! Business Intelligence is 100% about business, THE TOOLS ARE SIMPLY ENABLERS. A well thought out dimensional model will work well on any database platform. A solid ETL design will run great on any ETL platform it is implemented in. The larger point is, a great BI architect will be good (and thus his designs) on any platform or toolset.
There are layers to any BI solution nomatter how or what it is constructed of. First is the Business and it's logic/requirements. Second, is design and methodologies...the words Kimball & Inmon come to mind. Third, are the tools...in my case SQL Server, Analysis Services, Reporting Services, and Integration Services.
Tools (SQL Server etc.)
Design (Design Patterns, Best Practices, & Methodologies)
Business Requirements (What needs to be measured?)
Sunday, June 15, 2008
#

Well, the new SQL Server 2008 build Release Candidate 0 (RC0) has been available for a bit now and I have had some time with the new build. The word "build" brings up a good point actually, before we go any further I want to review the last few SQL Server 2008 version numbers. You can retrieve your SQL Server 2008's database instance's version number using either @@Version or the serverproperty function, for more information see one of Euan's old posts here .The last few SQL Server 2008 versions are as follows:
- November 2007 CTP: 10.0.1075.23
- February 2008 CTP: 10.0.1300.13
- June 2008 RC0: 10.0.1442.32
So what is brand new in the RC0 release? Well, thus far I am not finding a lot but I believe this is because the product is now in RC status and thus much closer to the ship date! However, there are some new items I have come across and there are probably more. If you happen to know of a SQL Server 2008 RC0 feature that is not in my list please comment the post and I will add it.
*SQL Server 2008 Books Online (BOL) RC0 unfortunately is not explicit about what the new features are in the RC0 release. There were some components in the past SQL Server 2008 CTPs that were documented in this manner and I wish BOL would remain consistent in that documentation convention while the products are in CTP/RC status.
- Marketing
- We have a new logo for the 2008 product (see above), you can find more background information here.
- Tools
- In Management Studio, when we connect to ANY SQL Server server we are given an option to specify additional connection string parameters.
- Enhanced installation routine which also requires a new version of Windows Installer
- Enhanced installation Center utility
- New SSIS Connections Project which serves as an alternative to the Import/Export Wizard
- Report Designer Preview morphed into what is now known as Report Builder 2.0 and is part of the accompanying Feature Pack.
- New Activity Monitor (per Graz on SQLTeam, thanks!)
3. Samples
- Updated RC0 Sample Databases (www.codeplex.com)

Tuesday, June 10, 2008
#
Friday, June 06, 2008
#
With the impending arrival of Microsoft
SQL Server 2008 come several new enhancements. These enhancements can be categorized as new capabilities, old constraints removed, and improvements in scalability. In the SQL Server community (including this blog) you can find numerous amounts of information on the new features found in
SQL Server 2008…however you seldom see much on deprecated features. Deprecated features are abilities of the new release which are either being removed completely or are being replaced by a future build of the software product. Discontinued features on the other hand are software features removed in the new release of the product.
Before I dive into the ‘list’ I wanted to mention that
SQL Server 2008 includes a performance counter which will increment each time you use a deprecated feature. The counter is called
SQL Server:Deprecated Features. Plan accordingly prior to upgrading to
SQL Server 2008 and of course leverage the SQL Server Upgrade Advisor which is bundled with the product as well as Scalability Expert’s SQL Server Upgrade Assistant which can be found at
http://www.scalabilityexperts.com/default.asp?action=article&ID=43 .
Here is the deprecated & discontinued list by component, MSDN URLs:
SQL Server 2008 Reporting Services Discontinued Features
Tuesday, May 27, 2008
#

Before I get into this new
SQL Server 2008 feature I want to briefly review why we have a new command-line in the Windows platform…In the current Windows ‘landscape’ we have a variety of technologies that can be used to programmatically administer various Windows data stores:
1. MS-DOS and Batch Files
2. Windows Script Host
3. VBScript/Jscript & COM Object Models
4. CLR Languages & Managed Object Models
This current Windows programmatic administrative environment is a mess to say the least and when you need to incorporate multiple data stores into a single ‘script’ it can be more challenging. What is needed is a single environment for administrators to easily manage and manipulate the various data stores.
Enter Windows PSH…with PSH we have an extensible, unified, command-line environment for programmatic administration of the Windows platform! Through the development and use of PSH ‘Providers’, both Microsoft and the larger community can continually adapt PSH to more repositories on the Windows platform! And like most new Windows components, the new command-line is built on top of the .Net Framework and the underlying CLR.
I will not delve into the details of PSH, please see the Resources section of this blog post to find the product specific documentation. You should become familiar with concepts such as cmdlets, providers, and the associated PSH scripting language. There is plenty of good content already out there on these topics.
I read somewhere (on a blog), someone labeled the SQL Server 2008 PSH Integration as a ‘sleeper feature’ and I have to agree. While in the PSH marketing material, the technology is being touted as one of the quickest adopted technologies, in the real-world I’ve yet to come across its real-world use. I am sure there are many organizations currently using PSH; however I do doubt the current marketing statements about its overall community adoption. Very few folks I work with know anything beyond the product’s name, little alone what SQL Server 2008 provides in the way of PSH and thus this blog post.
So what does SQL Server 2008 provide us with in the way of ‘PSH Integration’? Plenty (for a V1 feature)…
1. Full Relational Engine & Policy Navigation & Manipulation (this is done via the SMO/Policy Managed classes and is implemented via PSH providers)
2. Object Explorer in Management Studio allows you to start a new PSH session using the current context of the relational engine hierarchy.
3. A new SQL Server Agent PSH native job step
4. A new cmdlet for invoking TSQL or XQuery statements
So what does the new feature not provide…Replication/SSIS/SSAS Navigation/Manipulation! There is however a community built SSAS 2005|2008 PSH provider, the link can be found in the resources section of this post. Essentially, we have functionality that is exposed via the SMO and new policy classes and thus functionality entailed in RMO, the DTS/SSIS managed classes, and the AMO object model are not yet exposed via a PSH provider (at least not out-of-the-box).
PSH & SQL Server 2008 PSH Integration Resources:
Tuesday, May 20, 2008
#
Finally, we have a Data Profiling Utility w/SQL Server !!!
With SQL Server Integration Services (SSIS) 2008 you will have access to the new Data Profiler task. The SSIS 2008 Data Profiler Task is an easy and effective method to profile your target source systems (tables & views) prior to building ETL solutions consuming their data. With the Data Profiler you can perform a variety of available 'profiles' against a SQL Server 2000 or later database using an ADO.Net connection. The output of the choosen profiles can be saved to an external XML file for viewing using a new standalone utility called the Data Profiler Viewer (C:\Program Files\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe) or to an SSIS variable. If you wish to manually review the profile results then you will want to save the output to an XML file using a file connection. If you intend to build programmatic control flow based upon the results of the profiles then you will need to save the results to a variable and then poll the contents of that variable.
Available Profiles & Profile Types
There are two types of profiles: single-column and multi-column|multi-table.
Single-Column Profiles:
- Column Length Distribution
- Column Pattern
- Column NULL Ratio
- Column Statistics
- Column Value Distribution
Multi-Column|Table Profiles:
- Candidate Key
- Functional Dependency
- Value Inclusion
Configuring the Data Profiler Task
To configure the Data Profiler is actually pretty simple. First, create a new ADO.Net connection manager to a SQL Server 2000+ instance. Next, if you wish to save your results to an XML file you will need to create a new File connection manager as well. Once you have your connection manager created go ahead and drag-and-drop an instance of the Data Profiler Task onto the control flow designer surface within a SSIS package using Business Intelligence Development Studio (BIDS). Right-click the task and select the Edit context menu option. On the General page you will designate the profile output and can optionally elect 'Quick Profiles' via clicking the button. Quick profiles allow you to select a variety of profiles using default options against the same table or view.

Next, in the Profile Request page you can configure individual profiles and their corresponding settings. These settings are fairly self explanatory, note that each profile has it's own unique settings which are displayed in the pane labeled Request Properties. Each row represents a new profile to be processed.

Finally, in the Expressions page you can optionally assign run-time formulas for any of the supported task properties.
Conclusion & Shortcomings
For a 'first stab' at a data profiling solution, the Data Profiler Task is pretty good. With the new task you can fairly quickly analyze various data quality and metadata information about your client's source systems. While I do praise the SSIS team (and Microsoft Research) for their efforts there are a few shortcomings you will quickly notice:
- Only supports ADO.Net & SQL Server 2000+ later
- No option to extend or create new profiles