In-Depth

Build Your Own 'Skunk Works' Business Intelligence Project with SQL

Using SQL Server, you can build a BI platform without getting your organization's top brass involved.

If you're in the process of planning an enterprise-wide data warehouse, proper planning is a prerequisite. But what if you're just getting started and you've got a tight, almost non-existent budget? Or what if your department just needs a smaller, scaled-down business intelligence (BI) platform? Even with limited resources, you can still build one out that's fully functional. And, as a Microsoft shop, you're probably in a great position to start a skunk works BI project using the SQL Server Business Intelligence Toolkit.

But wait: Data warehouses are big-ticket enterprise initiatives, right? And don't they all require high levels of scalability? It's true that the reviewers often focus on data-warehouse tools in terms of scalability, and for large, enterprise-class data warehouses, that's critical. But with all of the discussion focused on the enterprise, the notion that a department might need -- yet alone benefit from -- BI has mostly gone away. The irony of this is that many, perhaps most, operational decisions are made at the divisional or departmental level using department-specific information that would never be included in an enterprise-wide data warehouse.

Individuals in these smaller groups need intelligence to manage day-to-day operations. Scalability may be desired, but ease of use, rapid deployment, low cost of ownership and tool integration are far more important in smaller settings. So, while it's debatable as to whether SQL Server can scale to the enterprise level, in the areas that matter most to divisions, the SQL Server Business Intelligence Toolkit is just what department-level IT professionals ordered.

As with any large, critical systems-development project, planning and coordination are paramount. We're not suggesting that those undertaking major BI projects skip those critical planning steps. The aim is to outline how division and department managers can create skunk works projects rapidly and easily with the integrated SQL Server Toolkit.

If you're wondering what a skunk works project is, it's a project that takes place below the corporate level and sometimes under the corporate radar (see "The Problem with Under-the-Radar Applications,"). The term originated at Lockheed Martin Corp. in the 1940s, according to the company's Web site, during design and production of a top-secret jet fighter. An engineer called the operation "skunk works" after a fictional comic-book forest, and the name stuck. Skunk Works is now a registered trademark of Lockheed Martin.

Tools and Licenses
Large or small, every BI project has a similar set of technology requirements: a database engine to hold the data warehouse or data mart; an extraction, transformation and loading (ETL) tool to move and clean the data; Web-based reporting tools to provide easy access to the business community; and, if possible, a multi-dimensional cube database that provides enhanced ability to analyze data. Because this is a skunk works project, we not only need all of these tools, we need ones that set up easily, are fully integrated with each other and are easy to use.

One of the criticisms directed at Microsoft is that SQL Server is so easy to use that IT pros can't take it seriously. Fortunately for our skunk works project, though, we can take full advantage of SQL Server's ease of use to get proof-of-concepts, prototypes and smaller projects up and running in short order.

When you think of SQL Server, you generally think of the relational database engine. It's the foundation of a BI project, holding all of the key business data in a data warehouse or data mart. But a BI project requires more than a relational data store, and that's where SQL Server's lesser-known siblings fill out the tools we need to complete our project.

You need to get data into a data warehouse, and the current ETL toolkit provided with SQL Server is the SQL Server Integrated Services (SSIS) toolkit. Having made its debut in 2005, it has proven to be a huge success, far outpacing its predecessor, Data Transformation Services (DTS). IT pros can design and prototype SSIS projects in short order, often in a matter of hours rather than days. And when the time comes to expand them from prototype to production-ready, it's possible to do that by extending -- rather than re-writing -- the SSIS package. SSIS includes a fairly solid set of tools to enhance error trapping, auditing and deployment.

With the data store and ETL tools in place, the next tool we'll need is a Web-based reporting package. For our initial skunk works project, all we want is a simple Web portal. But longer-term, as the project moves from proof-of-concept to a must-have, we'd like the ability to incorporate our Web reports into a fully customized Web site or into the company's SharePoint portal. Fortunately for us, one of the SQL Server components includes just such a reporting tool that can grow from a skunk works project to a full production platform.


[Click on image for larger view.]

SQL Server Reporting Services (SSRS) has evolved since it first came out as an add-on to SQL Server 2000. Detractors will say it's not as good as (fill in some enterprise reporting tool), but they're missing the point. We're focused on rapidly building a skunk works project, and in that respect, SSRS is first class. The interface is logical, separating data sourcing, report layout and review. And, as is expected, our reports can be parameterized, allowing users to control how and what they see in them. An unheralded but powerful feature is the capability to drive reports off of parameterized stored procedures.

Like SSIS, SSRS is fully integrated into Visual Studio, making for an easy transition for existing Visual Studio developers. SSRS provides for all of the reporting basics, such as Web access, user and group security, export capabilities to Excel and PDF, report caching and notifications. For our skunk works project, the default SSRS report Web site is sufficient. However, as user demand grows from curiosity to necessity, SSRS can grow with our needs. One of the benefits of SSRS being integrated into Visual Studio is that we have the ability to create a fully customized Web interface if we find that the default format is insufficient. Alternatively, we can integrate our reports into SharePoint.

Finally, if your skunk works project demands a dimensional data store, SQL Server has that as well. SQL Server Analysis Services (SSAS) first appeared under a different name with the release of SQL Server 7 back in 1999. It had limitations, but offered cube-based, dimensional data stores to the masses. As part of the SQL Server family, SSAS is fully integrated with all of the other tools, providing for fast, efficient design, development and deployment of BI projects.

With the tools identified, licenses are the next hurdle in our process of building, deploying and supporting a BI platform. If you're a Microsoft shop, you may already have some or all of the licenses you need, and the licenses you don't have may not cost all that much. For costs and licenses, you've got several areas to consider. We'll need developer licenses, user or client-access licenses (CALs) and production-server licenses.

For developer licenses, Microsoft has gone out of its way to make getting a licensed SQL Server developer seat inexpensive. A standalone full SQL Server development license costs $50. If you already have one of the MSDN licenses, it likely includes access to the full set of SQL Server dev tools.

For CALs, you may also already be covered. If you have an enterprise agreement with Microsoft for your desktops, it's not uncommon for each desktop license to already include a SQL Server CAL for every person covered under the agreement. Because BI projects are internally focused, there's a good chance that the CALs you need are already in place. Just be sure to confirm the license arrangements with your Microsoft contract manager.

With the development and CAL licenses in place, all you need for final production is one or more server licenses. If your CALs are set, you just need a server license. This is a skunk works project, so stay with the lower-cost Standard Edition unless you really need the Enterprise version options.


[Click on image for larger view.]

Business Focus
As with any development project, getting a well-defined scope is critical to success. Skunk works projects are no different, but it's imperative that we start the project with a limited focus aligned to a business need close at hand.

If a project doesn't come to mind quickly, brainstorm with a few close associates in your group. If you still don't have a business need that can benefit from improved access to information, consider expanding and working with a close business associate in another group. Just be sure to keep a clear understanding of expectations. As you push out from your group, tolerance of blemishes decreases exponentially.

Data
Data is at the heart of BI, and using information to make better decisions is good business. So, is BI just the latest name for reporting? The official answer is no, but the two do have a lot in common. Reporting lets IT departments collect, organize and provide access to information, but BI systems take it up a notch. The main point is that we have a business to run: We have information in all corners, and the better we can collect, organize and use data, the better we can make decisions. So, is that reporting? Or is that BI? Does it really matter?

A well-known but little-discussed topic within the BI community is that moving, cleaning and otherwise preparing data for use within a data warehouse takes up as much as 90 percent of the work in a BI project. For our skunk works project, we want to keep that work to a minimum.

For large, enterprise-scale BI projects, matching, migrating and transforming data from multiple data sources is paramount. If we're lucky, data is just in different data-type formats and is mostly complete. But more often in large enterprise projects, the business information is not only structurally different, it's conceptually different.

Consider this example: The sales department splits New York between the Northeast Region and the Mid-Atlantic Region, using telephone area and exchange codes. Using postal codes, the marketing department splits New York between the Greater New York City Metropolitan Area and the Central Atlantic Area. If you don't already have a mapping between postal codes and telephone area and exchange codes, you might want to get started. For our skunk works project, these are the types of transformations we really want to hold off on until later phases. So, in addition to keeping the scope limited and close to our group, we're going to work hard to use data sources that require only minimal cleaning and transformation.

Undoubtedly we'll have to do some data cleaning, transformation and organization, but we'll follow a few basic rules to keep this to a minimum. First off, start with single sources of data to initially minimize data-matching nightmares. Second, work with complete or substantially complete data sets. Finally, stick with transformation logic that's easy to follow and easy to explain.

Design and Planning
One of the benefits of skunk works projects is the ability to be agile and keep documentation to a minimum. We still need documentation that falls into a few categories. We'll need some high-level design layouts, process-flow documents and basis-of-decision documents.

We're all familiar with design meetings using whiteboards, diagrams and other visual tools. And, if you're a Unified Modeling Language shop, use-case diagrams work well. No matter where the process design starts, in the end, the code and the SSIS packages represent the final state of the process and serve as a good place to document the "what" and the "how" of the project. What often gets lost is the "why" or the "basis of decision." Regardless of where you document the design and the process, be sure to document why the system works the way it does and why transformations are being made.

Waterfall, Spiral and Skunk Works
Today's BI projects usually follow a formal, defined process, reminiscent of the waterfall approaches used for legacy software projects. For large, enterprise BI projects, it's reasonable to say that the design rigors of a modified waterfall approach could actually lower the total cost of the ETL process. For our project, though, we've made business and design approaches that help us minimize the need for full-scale waterfall planning.

We've kept the business focus tight, managed our scope and put ourselves in a position to focus on the problem at hand, with only a minimal amount of documentation and bureaucracy involved. The other key advantage we have going for us is that we're using the SQL Server Business Intelligence Toolkit. As with any construction project, the right tools are paramount. Ease of use and tool integration let us use an agile approach to designing, building and deploying our project.

The ETL Black Hole
ETL accounts for a huge cost in BI initiatives -- not just in initial design but also with ongoing data maintenance. Imagine a situation in which it's necessary to implement complicated transformations in order to merge two very different data-source systems in the pursuit of the "one truth." This is great, but what happens when one or more executives decide to contest the facts from the data warehouse? Or what if their source systems are in disagreement? If the transformations are straightforward, tracing and documenting the variance is simple. But if complicated transformations are not well-documented and traceable, the credibility of the system and the data starts to come into question. You'll need a full audit trail, and you need one that can detail every decision made during the transformation.

Because we've narrowed our skunk works project's focus and data sets, we're set to avoid most of the potential ETL challenges found on a larger BI project. But for ETL, that's only the data flow component. Full-scale BI systems have fully automated ETL processes with the requisite exception handling. With our skunk works-project mindset, we're going to flip this around. Full automation and exception handling will have to wait. Once the project gains acceptance, we can return to extend and enhance our initial ETL processes. SSIS makes this easy.

Data-warehouse projects under development today are targeting real-time or near-real- time data repositories. With the Internet, 24x7 operation is becoming the norm, not the exception. A few years ago, daily refreshes were typical, and before that monthly refreshes were all that were required. Skunk works and prototype projects don't require a fast-frequency data refresh, at least not initially. As part of our agile approach, the initial focus is on organizing and presenting the data. If the project progresses well, we can revisit our ETL processing cycle. Real time may be part of the scope in a future revision, but for our initial skunk works project, our refresh frequency is ad hoc.


[Click on image for larger view.]

Hardware
One of the latest trends in data warehousing is the advent of specialized, big-ticket database hardware platforms, also known as data-warehouse appliances. Bring up a discussion on data-warehouse servers, and the expectation is that -- just to get started -- you need a six- or seven-figure budget just for the hardware. That's true if you're looking at a large, enterprise-scale system. But we're focused on an operational division- or departmental-level platform. To get started, all we need is a modest server -- and it doesn't have to be current. Sure, the latest and greatest hardware would be great, but this is a skunk works project focused on speed and agility.

Fortunately, just about every company I've worked with had at least one room filled with cast-off equipment.

Getting permission to use this stuff has never been difficult. Most likely, it's all been written off and is just sitting there until someone has time to get rid of it. For our initial project, that's all we need to get started. But can we use old equipment? Absolutely. If we can get started with existing cast-off equipment, we get more benefit than you might realize. If we take a thrifty approach, there will be little pushback and some potential upside when senior management gets wind of the project. If management looks upon the project favorably, we'll get points for recycling equipment. If the project doesn't work out well, no budget resources went toward new hardware. When the project moves forward and starts to gain business users and support, that's the time to do the full hardware analysis. At that point, you have a proven tool, and you can ask for the really good stuff.

With virtual servers gaining ground, virtualizing your systems offers another option for implementing your data-warehouse platform. If your development environment is set up to use virtual servers, getting a new virtual server set up may be a painless endeavor. It can offer an affordable option for creating a scalable solution. Also, consider grabbing some older hardware and setting up your own virtual environment. The main point is: Don't let hardware be an impediment to moving forward on your skunk works project.

Staffing, Training and Resources
When it's time to select the team, keep it small and informal. The ideal team has three team members or fewer. Once you get more than three people, skunk works projects start to take on a more formal approach, which is not what you want at this time. A good team is self-selecting, with staff members who have expressed interest in BI and SQL Server tools. Alternatively, you could hire a consultant. If you do decide to bring one in, keep the emphasis on jump-starting the project and transferring knowledge.

For a skunk works project, you want consultants who are self-starters -- and self-starters are not the type to sit in a classroom. In their minds, the skunk works project is the training -- but they may still need some kind of introduction to the tools, and Microsoft has already provided for this need. They don't get much attention, but the tutorials Microsoft provides for the SQL Server Business Intelligence Toolkit are the best place to start. The tutorials address all of the fundamentals for a successful BI project.

One of the thought leaders in BI and data warehouses is Ralph Kimball. Fortunately, two members of his staff, Joy Mundy and Warren Thornthwaite, have written an excellent book on how to build a data warehouse using SQL Server. "The Microsoft Data Warehouse Toolkit" (Wiley, 2006), as the title implies, focuses exclusively on designing and building data warehouses with SQL Server BI tools. If you can get only one book, this is the one to get.

With the SQL Server Business Intelligence Toolkit it has never been easier to get a BI project off the ground. Skunk works can offer a rapid, agile approach to building and evolving functional BI platforms that solve real business-information needs.