Data warehouse builders advocate for different architectures

Since the beginning of the data warehousing movement in the early 1990s, there have been two dominant approaches to architecting data warehouses: the Inmon and Kimball models. More recently, two other models have gained popularity: central data warehouses and federated data warehouses. Most organizations use a combination of models to suit the unique needs of their users.

The Inmon model, named after Bill Inmon, a prolific author and respected figure in data warehousing, advocates a multi-tiered architecture in which data warehouses collect data from multiple sources, integrate the data, and then distribute subsets to downstream data marts. In this hub-and-spoke approach, users query the data marts instead of the data warehouse, which functions more as a staging area and distribution center.

This approach makes it easy for administrators to repurpose staged data to support new applications without having to start from scratch. It also allows administrators to optimize different servers for different purposes, improving process efficiency. On the downside, it replicates large volumes of data across multiple servers, which costs more money. It also takes longer to build the architecture and to reap the economies of scale this approach delivers.
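To make the hub-and-spoke flow concrete, here is a minimal Python sketch of the three steps described above: collect data from several sources, integrate it in the warehouse, then distribute subsets to data marts. The source names, fields, and cleaning rule are hypothetical and chosen only for illustration; a real implementation would use ETL tooling and a database rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical source extracts; names and fields are illustrative only.
crm_rows = [
    {"customer": "acme", "region": "east", "revenue": 120.0},
    {"customer": "Globex", "region": "west", "revenue": 80.0},
]
billing_rows = [
    {"customer": "ACME ", "region": "east", "revenue": 30.0},
    {"customer": "initech", "region": "west", "revenue": 55.0},
]


def integrate(*sources):
    """Hub: merge rows from every source under one cleaned customer key."""
    warehouse = []
    for source in sources:
        for row in source:
            cleaned = dict(row)
            # Assumed integration rule: trim whitespace and upper-case the name.
            cleaned["customer"] = row["customer"].strip().upper()
            warehouse.append(cleaned)
    return warehouse


def build_mart(warehouse, region):
    """Spoke: a regional subset, summarized by customer for end-user queries."""
    totals = defaultdict(float)
    for row in warehouse:
        if row["region"] == region:
            totals[row["customer"]] += row["revenue"]
    return dict(totals)


warehouse = integrate(crm_rows, billing_rows)
east_mart = build_mart(warehouse, "east")
west_mart = build_mart(warehouse, "west")
print(east_mart)  # {'ACME': 150.0}
```

Because both marts are built from the same integrated rows, a new mart for another purpose can reuse `warehouse` without going back to the sources, which is the repurposing benefit noted above. The cost is also visible: the same data now lives in the hub and in each spoke.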

Another camp follows the advice of Ralph Kimball, another prolific author and respected industry figure. The Kimball model dismisses the need for a data warehouse. Because most users want detailed data, Kimball argues it’s better to store data in individual data marts and logically connect them using “conformed” dimensions. To optimize query performance and improve the ease of use of the data marts, Kimball popularized a data model, known as a star schema, which is widely used today.

The Kimball approach is faster and cheaper than the Inmon approach because it doesn’t require companies to build staging and distribution layers before creating their first data mart. However, if organizations are not careful to conform the dimensions of their data marts, the environment can quickly become disintegrated because each mart is potentially designed and populated independently.
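As a rough illustration of conformed dimensions, the following Python sketch uses the standard sqlite3 module to build two small star-schema marts, one for sales and one for inventory, that share a single product dimension. The table and column names are hypothetical, and the schema is far simpler than a production star schema would be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conformed dimension shared by both marts (hypothetical schema).
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    -- Fact table of the sales mart.
    CREATE TABLE fact_sales (product_key INTEGER, units_sold INTEGER);
    -- Fact table of the inventory mart.
    CREATE TABLE fact_inventory (product_key INTEGER, units_on_hand INTEGER);

    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7);
    INSERT INTO fact_inventory VALUES (1, 40), (2, 3);
""")

# Summarize each fact table separately, then line the results up
# through the shared dimension keys.
query = """
    SELECT d.name, s.sold, i.on_hand
    FROM dim_product AS d
    JOIN (SELECT product_key, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_key) AS s
      ON s.product_key = d.product_key
    JOIN (SELECT product_key, SUM(units_on_hand) AS on_hand
          FROM fact_inventory GROUP BY product_key) AS i
      ON i.product_key = d.product_key
    ORDER BY d.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('gadget', 7, 3), ('widget', 15, 40)]
```

The cross-mart query works only because both fact tables use the same `product_key` values with the same meaning. If each mart defined its own product dimension independently, the join would have nothing reliable to match on, which is the disintegration risk described above.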

Central vs. federated
The central data warehouse architecture, which is advocated most strongly by Teradata, prescribes using a data warehouse without any data marts. This centralized approach gives users access to all data in the data warehouse instead of restricting them to data marts. It also minimizes the amount of data the technical team has to move or transform, simplifying data management and administration. However, the data volumes and number of users of central data warehouses often grow extremely large. To provide adequate query performance when that happens, organizations need a high-performance, parallel processing database (such as the one Teradata provides), which can be expensive to purchase and maintain.

The federated approach is supported by middleware vendors that offer distributed query and join capabilities. These largely XML-based tools provide users with a global view of distributed data sources, including data warehouses, data marts, Web sites, documents and operational systems.

When users select query objects from this view and hit the submit button, the tool automatically queries the distributed sources, joins the results, and presents them to the user. Because of performance and data quality issues, most experts agree that federated approaches work well to supplement data warehouses, not replace them.
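The query-then-join behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular middleware product works: the two source functions are hypothetical stand-ins for a data warehouse and an operational system, and the join is a plain in-memory inner join on a customer ID.

```python
def query_warehouse(customer_ids):
    """Stand-in for a query against a data warehouse (hypothetical data)."""
    table = {101: {"lifetime_revenue": 5400.0}, 102: {"lifetime_revenue": 910.0}}
    return {cid: table[cid] for cid in customer_ids if cid in table}


def query_operational_system(customer_ids):
    """Stand-in for a query against an operational system (hypothetical data)."""
    table = {101: {"open_orders": 2}, 103: {"open_orders": 1}}
    return {cid: table[cid] for cid in customer_ids if cid in table}


def federated_query(customer_ids, sources):
    """Send the same request to every source and join the results by key."""
    partials = [source(customer_ids) for source in sources]
    joined = {}
    for cid in customer_ids:
        # Inner join: keep a customer only if every source returned a row.
        if all(cid in partial for partial in partials):
            merged = {}
            for partial in partials:
                merged.update(partial[cid])
            joined[cid] = merged
    return joined


result = federated_query([101, 102, 103], [query_warehouse, query_operational_system])
print(result)  # {101: {'lifetime_revenue': 5400.0, 'open_orders': 2}}
```

Even in this toy version, customers 102 and 103 drop out because one source has no matching row, and every request waits on the slowest source. Those are small-scale versions of the data quality and performance issues mentioned above.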

TDWI research shows that a majority of organizations prefer Inmon’s multi-layered approach. In one of our surveys, when asked to describe the architecture of their data warehouse, 56 percent of respondents selected “hub-and-spoke” data warehouse, 22 percent selected “central data warehouse only,” and 12 percent selected “conformed data marts,” the Kimball approach. Only a fraction of respondents chose federated or other approaches.

Wayne Eckerson is director of research at TDWI and is working on a book about performance dashboards for John Wiley & Sons. He can be reached at [email protected].