Everyman's Guide to Data Warehousing

Essential questions to be answered before going ahead with a data warehouse project.

You have read the articles, attended all of the conferences, and are now charged to start your own data warehousing project and attain that 401% ROI you have heard everyone is getting. In addition, you have gone to the CFO or the vice president of marketing and gained funding for your enterprise data warehouse. Congratulations ... now it is time for the real work to begin.

And you had better be successful, because this project will not be cheap and all eyes will be upon it. You will have to evaluate and purchase Extraction, Transformation and Load (ETL) software, OLAP software, all forms of hardware, middleware, and hire expensive consultants. And if that is not difficult enough, an enterprise data warehouse also involves integrating data from a corporate perspective. Therefore, you will have to interact with the most senior people in each of your company's divisions and guide them to consensus. And if you do not deliver on that 401% ROI, and appease all of those division heads, your supervisor just might hand you a one-way ticket to the company's Siberian office.

But before your decision support system (DSS) project even begins, take a moment to examine these essential questions. If you answer "No" or "I'm not sure" to any of the following, you might want to do some more homework before heading down the data warehousing trail.

1. What are the specific strategic business objectives (drivers) that the warehouse is suppose to achieve?
The overriding reason many decision support projects fail is not that the projects were technically unfeasible. On the contrary, many of the technological challenges of data warehousing have proven answers. The most common cause of failure is that the warehouse did not meet the business objectives of the organization. I have seen many companies do a fine job of technically building a data warehouse; unfortunately, the system did not solve any of their business needs. Warehouses that do not satisfy business user's needs are not accessed and eventually die.

2. What are the specific, calculable measurements that will be used to evaluate the ROI of the DSS system in meeting your company's business objectives?
Clear business objectives are measurable. This activity is critical, because once the data warehouse project is completed, the management team will have to justify the expenditure. Moreover, it is important to understand that a data warehouse is not a project, it is a process. Data warehouses are organic in nature. They grow very fast and in directions you never anticipated. Most warehouses double in size and in the number of users in their first year of production. Once a cost justification can be quantified for the initial release, the process for gaining funding for follow-up releases is greatly simplified.

3. Are the key users of the data warehouse identified and committed to the success of the project?
The users always dictate the success or failure of the warehouse. They therefore need to be heavily involved throughout the data warehousing project. Taking it a step further, users need to have a personal stake in the success of the project. It is amazing how quickly problems vanish when everyone has a vested interest in the project. Also, make it a point to educate users on the fundamentals and processes of data warehousing. Teach them its benefits, as well as its limitations. This will aid significantly in managing their expectations. A good rule of thumb is that if you have gone more than two weeks without talking to your users, then it is time to set up a meeting. Keep in mind that many times these people are the ones picking up the tab on these projects.

4. Is the organization trying to build a +1 Terabyte, "do-all, be-all" data warehouse on its first iteration?
Data warehousing projects stretch an organization in ways unlike operational systems projects do. From a political perspective, an enterprise data warehouse requires consent and commitment from all of the key departments within a corporation. In addition, the learning curve of a decision support project team is seldom understood or planned for. There will be a new and dizzying array of software tools (ETL, OLAP, portal, meta data, data cleansing and data mining) that will require tool-specific training.

By adding massive amounts of data into the equation the points of failure increase significantly. Moreover, large volumes of data will push the envelope of the RDBMS, middleware and hardware, and could force developers into using parallel development techniques if a massively parallel processing (MPP) architecture is needed. Keep in mind that the answer to many of these challenges comes in the form of a hefty price tag. As a result, adding the dimension of size is just too painful and costly for most enterprises to attempt during the first iteration.

Data warehouses are best built in an iterative fashion. Do not misunderstand; this is not to recommend that a company should not build a fully functional, multiple Terabyte, +7 subject area, Web-enabled, end-to-end, enterprise data warehouse with a complete meta data interface. It simply means that the highest probability for success comes from implementing a decision support system in a phased approach. By using the first iteration as an opportunity to train the corporation, it will set the stage for bigger and better future implementations.

5. Does the decision support pro-ject have support from executive management?
Any large-scale project, whether it is a data warehouse or implementing that hot new customer relationship management (CRM) system, needs executive management on board. Moreover, their involvement is imperative in breaking down the barriers and "ivory towers" in companies. Their position allows them to rally the various departments within a corporation behind the project. Any substantial project lacking executive management participation has a high probability of failure.

6. Does the organization have a clear understanding of the concepts and tools involved in data warehousing?
If you do not have a data warehouse built, then the answer to this question will most likely be "No." As a result, training and education will be required. But keep in mind that training is required at many levels. Initial education is necessary to convey the concepts of a data warehouse, data mart, operational data store, star schema design and meta data. Data acquisition developers will also probably need to be trained on a transformation tool (Informix DataStage, for example), while data warehouse access developers will require significant training in an OLAP tool (for example, BusinessObjects and Cognos PowerPlay). Data administration developers will need to be trained on a tool that will integrate all of the company's meta data into one repository (Platinum Repository, for example). And, more than likely, there will be a Web component used to access the data warehouse and the meta data repository. Depending on your organization, additional training and outside consulting could be needed for each of these areas. Keep in mind that these are only the data warehousing-specific training issues. There still needs to be an understanding of the hardware, middleware, desktop, RDBMS and coding language (COBOL, C++ and so on) of the transformation tool.

7. Is there a highly experienced project manager and data warehouse architect, with experience in building warehouses, who will actively participate throughout the project?
Data warehousing projects are fundamentally different from operational projects. Operational projects are necessary in order to operate the day-to-day business of the company. Decision support projects are critical for making strategic decisions about your organization. In addition, data warehouses grow at an alarming rate during the first few years of production. An experienced data warehouse project leader understands these facts and keeps the vision of the project in concert with the real-world reality of decision support. In addition, the data warehouse architect must design a scalable, robust and maintainable architecture that can accommodate the expanding and changing decision support requirements.

These fundamental challenges require highly experienced, senior-level individuals. These positions can be filled via in-house resources or by consultants. If consultants are used to fill these roles, it is imperative that they are highly skilled at knowledge transfer; in-house employees should be assigned to shadow consultants for both of these roles.

8. Has an experienced consultant been brought in to do a readiness assessment of the organization?
This step is very important because an experienced hand can identify problem areas in the organization that can be dealt with early in the DSS project's life cycle. But identifying that person is another issue. Be wary of consultants without real-world, hands-on experience. It is one thing to be able to write or speak about data warehousing; it is another matter entirely to have the experience needed to navigate through the political quagmires and to know what it takes to physically build a data warehouse.

About the Author

David Marco is the author of Building and Managing the Meta Data Repository: A Full Life-Cycle Guide from John Wiley & Sons. He is founder and president of Enterprise Warehousing Solutions Inc. (EWS), a Chicago-based system integrator. He can be reached at 708-233-6330 or via E-mail at [email protected].