In-Depth

Book excerpt: The business case for accurate data

This article is excerpted from Chapter 6 of Data Quality: The Accuracy Dimension by Jack E. Olson. Used with the permission of the author and Morgan Kaufmann Publishers.


It has historically been difficult for data quality advocates to get corporations to make significant commitments and spend adequate money to create and maintain an effective data quality assurance function. The business case has always been evident, but quantifying it, determining the opportunity for improvement, and comparing it to other corporate opportunities have always been difficult, and attempts to do so have often been unsuccessful.

Data quality has too often been one of those topics that everyone knows is important but that just does not generate enough enthusiasm to get something done about it. Most CIOs will tell you they should be doing more. However, they then push it down the priority list and give it little support.

There are several reasons for the difficulties that have plagued data quality efforts. It is a tough topic to build a business case for: many of the cost factors are purely speculative until the work to correct the underlying problems has been done. This means you have to spend the money to find out whether it was worthwhile.

This article outlines some of the factors and thought processes that come into play in trying to get commitments and funding for data quality activities. Each company addresses funding in its own unique way. It is not possible to lay out a justification path that works for all of them. It is helpful to understand the nature of the beast in order to build a workable business case strategy appropriate to your own corporation.


The value of accurate data

You cannot place a direct value on accurate data. It is the cost of inaccurate data that builds the business case. It is much like advertising. You do not know what impact it will have on sales and you do not know what impact it has had on sales, but you do know that if you don’t do it you lose a lot of sales. Inaccurate data can cost corporations millions of dollars per year; it can cause corporations to go out of business. If you have perfectly accurate data, you do not know what you saved. You only know that if you do not have it the costs will be significant.

Firms have clearly been willing to accept poor data quality. Its prevalence across almost all companies attests that corporations can survive and flourish with poor data quality. It is not considered a life-and-death issue, even though for some companies it can be. It is an issue of improving the efficiency and effectiveness of the corporation, making it better able to compete and better able to survive tough times. However, most executives would not believe that data quality issues pose a serious threat to their corporation compared to other, more obviously real and serious threats.

Data quality is a maintenance function. Like HR, accounting, payroll, and facilities management, information systems provide a basic function that is a must for an effective organization to do its business. The efficiency with which you perform maintenance functions determines how effective you can be at your primary business functions. Information systems are moving up the ladder in importance and are a critical business component today. Data quality is the issue within information systems that can make them effective or make them a hindrance to getting things done. It is a sound management practice to want the best and most efficient information management systems.

The money spent on data quality activities is aimed at eliminating costs currently being incurred by the corporation and removing the risk of costs that can potentially occur due to inaccurate data. Following is the rationale for an aggressive data quality assurance program:

* Without attention to data quality assurance, data will have inaccuracies.

* These inaccuracies can be costly.

* Improving data accuracy requires an active data quality assurance program.

* Maintaining accuracy requires an active data quality assurance program.

A business case is the difference between what is gained and what it costs. Business cases are adopted only if this difference is believed to be positive. They are also adopted only if they survive comparison with other business cases competing for resources; corporations cannot pursue all positive business cases. The trade-offs involve whether a proposal is required (such as complying with HIPAA), how well it aligns with corporate vision and core competencies, and the risks involved in achieving the objectives it lays out. In other words, you are always competing for resources and attention.
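To make the comparison concrete, here is a minimal sketch in Python of the net-value test described above: each proposal survives only if its gain exceeds its cost, and the surviving proposals still compete with one another for funding. The proposal names and dollar figures are purely hypothetical and not drawn from the book.

    # Hypothetical proposals competing for the same resources; the names
    # and figures are illustrative only.
    proposals = {
        "data quality assurance program": {"gain": 1_200_000, "cost": 450_000},
        "CRM rollout": {"gain": 2_000_000, "cost": 1_600_000},
        "warehouse consolidation": {"gain": 800_000, "cost": 700_000},
    }

    # Net value = what is gained minus what it costs.
    ranked = sorted(
        ((name, p["gain"] - p["cost"]) for name, p in proposals.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    for name, net in ranked:
        verdict = "fundable" if net > 0 else "rejected"
        print(f"{name}: net value ${net:,} ({verdict})")

Even when every net value is positive, only the proposals near the top of the ranking are likely to be funded; the rest lose the competition for resources and attention.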

It would be nice if the world were this objective. In reality, projects often get funded for political and emotional reasons as well. The popularity of a certain topic in the broader industry community often influences decisions. This explains why data warehousing projects got so much attention in the early 1990s and CRM projects are getting so much attention now. Most of these very expensive projects had very thin business cases built for them before they were approved. The primary approval driver was intuition.

To determine value, you look for costs to the corporation that result from inaccurate data. The assumption is that if you eliminate the inaccuracies, the costs go away and that becomes the value to the corporation. It is generally not a one-time value because most of the observed problems repeat on a regular basis.

Typical Costs: Typical costs developed for the business case are the costs of rework, lost customers, late reporting, and wrong decisions. All of these costs were incurred in the past and cannot be recouped. The business case assumes that these costs will repeat in the future, and therefore you can estimate future savings by projecting the past costs forward.
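Under that assumption, the estimation step amounts to simple arithmetic. The following Python sketch projects past annual costs forward over a planning horizon; all figures are hypothetical, and a real case would also discount future years and strip out costs not actually caused by bad data.

    # Hypothetical past annual costs attributed to inaccurate data.
    past_annual_costs = {
        "rework": 300_000,
        "lost customers": 500_000,
        "late reporting": 80_000,
        "wrong decisions": 250_000,
    }

    planning_horizon_years = 3

    # Assumes each cost repeats at the same rate every year.
    estimated_savings = sum(past_annual_costs.values()) * planning_horizon_years
    print(f"Estimated savings over {planning_horizon_years} years: "
          f"${estimated_savings:,}")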

It is difficult to identify all of the costs of inaccurate data up front. Those listed previously are visible and obvious. But even these have to be examined to ensure they are the result of inaccurate data and not other causes.

Other costs probably exist but have not yet surfaced. Many times, bad data drives a process that would not be necessary with good data, but no one associates the process with the bad data. Other times, wrong decisions are made because of bad data, but the error is never discovered or, if it is, its cause is never traced back to the data. Another problem with most data quality business cases is that they do not identify the really big costs to the corporation.

Wasted Project Costs: One of the big costs is the enormously wasteful practice of implementing large projects that depend on existing databases without first understanding the extent to which the data and the metadata in them are bad. Every corporation has stories of cancelled or significantly delayed projects caused by excessive thrashing over data issues. These issues arise when data is moved and the data or metadata turns out to be inaccurate. For some reason, we never want to count these costs as part of the business case for data quality assurance activities. They are huge and visible.

It is not unusual to see wasted costs in the tens of millions of dollars for projects that never seem to get done or are cancelled before they reach their objective. Inaccurate data and inaccurate metadata contribute heavily to these failures and are often the factors that torpedo these projects.

Costs of Slow Response to New Needs: The really big cost is the penalty the corporation suffers because these projects take too long to complete or are completed with unsatisfactory outcomes. For example, a corporation trying to reach a new business model through such a project is forced to continue on the less desirable business model because of its inability to move forward. This may be a temporary penalty, such as waiting for the project to complete, or a permanent penalty, such as the project getting cancelled.

For example, merging the information systems of two businesses after an acquisition in one year rather than three is a huge advantage. Getting a CRM system completed and serving the sales and marketing organizations in one year is much better than working on it for three years and then cancelling it.

There would be enormous savings if a corporation were able to respond to the need for new business models faster than any competitor, at lower cost than any competitor, and more accurately than any competitor. Clearly, a corporation that has perfectly accurate data and perfectly accurate metadata can implement a new business idea faster and cheaper than one that does not.

A corporation with significantly inaccurate data and metadata can either plow through the project, risking failure and long delays, or perform the data quality improvement work at the front end. In either case, the corporation with accuracy up front has gained a significant advantage on those that do not have it.

As you can see, all value lies in reduced costs. It is also clear that many of these costs are not known, are hidden, or are subjective. Some are being incurred on a daily basis, some lurk in the background and may hit you in the future, and some are obstacles to moving your business models forward. The biggest component of cost is the costs that have not yet been incurred.

This characterization makes management inclined to distrust the numbers brought to them. They know there are some costs. However, the business cases presented are either factual, meaning that they grossly understate the value, or speculative, meaning that no one can be certain the costs are that large or that they will ever occur. This leads to skepticism about any numbers in a proposal.


Costs associated with achieving accurate data

The costs of pursuing a data quality program start with the costs of creating and maintaining a central organization for data quality assurance. Developing temporary teams of people to attack specific problems and then letting them drift off to other assignments is not a useful approach to data quality. A central team is needed. There is plenty for them to do.

This team needs the usual resources to accomplish tasks. They need PCs, software, availability of a database or two to store project information, training, and administration support. These costs can be considered general costs of performing data management, or they can be allocated to specific data quality projects.

The quality team will carve out activities such as a data quality assessment for a specific application. In pursuing this, additional costs are incurred by people outside the data quality team. They will be asked to spend time with the team, to provide access to data, and to participate in issue discussions and reviews. These costs are generally not counted toward data quality because these people are already charged to their normal functions. If a department assigns someone full-time as a liaison to the quality activities, that cost would probably be counted.

The real costs come when the quality team has conducted studies and has a list of issues with recommended remedies. The cost of implementing remedies can be very high: it can involve application renovation projects and the modification of existing applications to add checkers and monitors. Often these remedies can be combined with other objectives to create a project that does more than just improve quality, thus mitigating some of the costs.
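As a rough illustration of what such a checker might look like, the following Python sketch validates order records against a few accuracy rules. The rules and field names are hypothetical; the book does not prescribe this design, and a production checker would draw its rules from profiled metadata rather than hard-coded constants.

    from datetime import date

    def check_order(record: dict) -> list:
        """Return a list of rule violations found in one order record."""
        violations = []
        if not record.get("customer_id"):
            violations.append("missing customer_id")
        if record.get("quantity", 0) <= 0:
            violations.append("non-positive quantity")
        ship = record.get("ship_date")
        if ship is not None and ship > date.today():
            violations.append("ship_date in the future")
        return violations

    # A monitor would run checks like this on each transaction and alert
    # on violations; here we simply print them.
    orders = [
        {"customer_id": "C1001", "quantity": 3, "ship_date": date(2003, 5, 1)},
        {"customer_id": "", "quantity": -2, "ship_date": None},
    ]
    for i, order in enumerate(orders):
        for violation in check_order(order):
            print(f"order {i}: {violation}")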

Another cost to consider is the cost of disruption of operational systems. Fixing data quality problems often requires changing application programs or business processes. The testing and deployment of these changes can be very disruptive to critical transaction systems. Downtime of even one hour to an important transaction system can cost in the tens of thousands of dollars.

For more information about this book, please go to the Morgan Kaufmann Publishers Web site at www.mkp.com