Self-Service Data Warehouses
- By Alan Radding
DATA WAREHOUSES WITH A DIFFERENCE
- Data warehouses today go beyond forecasting and decision support with their high volumes, large numbers of users and complex workloads. This is where the SOA can make a difference.
- Not only does the SOA decouple the request for data from specific databases, applications and servers, but it gets you out of the business of writing each report from scratch.
- Through a service interface, organizations can do more with the query results. For instance, they can combine the results with other services to create a composite application.
The action in Web services and service-oriented architectures has focused so far on transaction systems and production applications, in effect using Web services to call or invoke all or parts of these systems. A few organizations, however, are exploring how to apply the concept of services and service-oriented architectures to the data warehouse and activities such as business intelligence and business analytics.
It is important that these efforts pan out, says Lou Agosta, an analyst at Forrester Research: “Incorporating data warehousing into your overall service-oriented architecture is critical for the success of business process management and for direct transactional access to business intelligence,” he asserts in his report, “Service Oriented Architecture Assimilates Data Warehousing.”
Some technology users agree, at least in theory: “Using a SOA with a data warehouse and business intelligence looks appealing, but we are just in the learning stage with this,” says Joyce Lush, senior application technology manager at Yale University. “We’re not even experimenting yet.” A large part of the appeal is the standard interface, which would allow the school to change parts of its infrastructure behind the interface without affecting everything else. It also promises to give Yale’s power users a way to build applications more aligned with the business and to more easily share what they build.
Transparent inner workings
Despite all the hype and vendor spin, SOA is not a radically new concept. Web services and the SOA are just the latest developments in the evolution of distributed computing, following in the path of remote method invocation (RMI), common object request broker architecture (CORBA), and component-based development. “Basically, it is just a layer between the data warehouse and the query,” says Claudia Imhoff, president, Intelligent Solutions, a consulting company.
The SOA defines a standard interface and standard protocols (HTTP, XML, SOAP, WSDL, UDDI) for accessing and invoking objects across the network. The objects happen to be services that provide a well-defined capability. The services are registered and managed in a directory. As with objects, the inner workings of a service are transparent and unimportant to the requestor of the service. Similarly, services are reusable when correctly designed.
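To make the idea of a standard interface concrete, here is a minimal sketch of a request wrapped in a SOAP 1.1 envelope and then unpacked again, using only Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:warehouse`, the operation name, and the parameters are assumptions made up for illustration, not part of any product mentioned in this article.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:warehouse"  # hypothetical service namespace


def build_request(operation, params):
    """Wrap an operation name and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the operation name and parameters from a SOAP envelope."""
    root = ET.fromstring(xml_text)
    op = root.find(f"{{{SOAP_NS}}}Body")[0]  # first child of Body is the operation
    operation = op.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in op}
    return operation, params
```

The requestor sees only the operation name and its parameters. Note that every value travels as text, so the receiving side has to convert types itself.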
In some ways, the data warehouse is inherently service oriented. It stores and manages data and provides access through a standard interface, SQL. Over the years, organizations have adopted a variety of reporting, ad hoc querying, business intelligence and business analytics tools that allow for easier access to the data in the data warehouse and effectively mask the need for the requestor to know SQL. “If you are talking about a SQL query, then yes, the data warehouse is a service architecture and an SOA makes no sense,” says Robin Bloor, partner, Hurwitz and Associates.
However, the data warehouse has evolved and is no longer a single data store for relational transaction data. “People are doing a lot more things with data warehouses,” Bloor continues. “They cut them up into data marts to serve specific interests or they combine structured and unstructured data in different ways. Companies do this because they want more and different capabilities out of the data warehouse than can be achieved simply using SQL.”
Today’s data warehouses, according to Agosta, “go beyond forecasting and decision support with their high volumes, large numbers of users and complex workloads.” This is where the SOA can make a difference. However, data warehouses have been lagging adopters of SOA, he points out.
Data disconnected from the warehouse
Conceptually, an SOA can improve a data warehouse in several ways. “I ought to be able to define a set of services and have it work against the data warehouse,” Agosta says. “If you relied just on the SQL interface, this would be difficult to do given the volume and diversity of data involved. To begin, you need to know where and what the data is before you can even start to look for the answer you want.”
Previously, organizations coded their applications and reports to connect with specific data for specific purposes, effectively hardwiring their data infrastructure. The SOA “eliminates hardwiring applications to the data,” explains Wayne Eckerson, director of research at The Data Warehousing Institute.
Not only does the SOA decouple the request for data from specific databases, applications and servers but it “gets you out of the business of writing each new report from scratch,” Eckerson says. Instead, the organization creates a reporting service that will pull the requested information from multiple data sources if necessary and massage the various pieces of data as needed to return the desired result. Using a services approach, the organization can generate what amounts to custom reports in days, if not hours, rather than the weeks, or even months, it usually takes to produce custom reports, according to Eckerson.
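A reporting service of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not Eckerson's design: the class `ReportingService`, the source names `orders` and `support`, and the sample rows are all hypothetical, with in-memory functions standing in for real databases and applications.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# A source is any callable that returns rows for a customer id. In practice it
# might wrap a database, an application API, or a file (all hypothetical here).
Source = Callable[[str], List[Row]]


class ReportingService:
    """Builds a customer report without the caller knowing where data lives."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register_source(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def customer_report(self, customer_id: str) -> Row:
        # Pull from each source, then massage the pieces into one result.
        orders = self._sources["orders"](customer_id)
        tickets = self._sources["support"](customer_id)
        return {
            "customer_id": customer_id,
            "order_total": sum(row["amount"] for row in orders),
            "open_tickets": sum(1 for row in tickets if row["status"] == "open"),
        }


def orders_from_warehouse(customer_id: str) -> List[Row]:
    """Stand-in for a warehouse query (sample data is made up)."""
    data = [
        {"customer_id": "C1", "amount": 120.0},
        {"customer_id": "C1", "amount": 80.0},
        {"customer_id": "C2", "amount": 45.0},
    ]
    return [row for row in data if row["customer_id"] == customer_id]


def tickets_from_crm(customer_id: str) -> List[Row]:
    """Stand-in for a call to a separate application (sample data is made up)."""
    data = [
        {"customer_id": "C1", "status": "open"},
        {"customer_id": "C1", "status": "closed"},
    ]
    return [row for row in data if row["customer_id"] == customer_id]


service = ReportingService()
service.register_source("orders", orders_from_warehouse)
service.register_source("support", tickets_from_crm)
```

Because callers ask only for `customer_report`, either source can be re-registered against a different database or server without touching the code that requests the report.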
Through a service interface, organizations can do more with the query results. For instance, they can combine the results with other services to create a composite application.
Shiny new tools to work with
Slowing the adoption of SOA for the data warehouse is the lack of tools designed for the job. “J2EE and .NET don’t have the classes to mediate between SQL and services,” Agosta says.
In truth, you don’t really need much in the way of tools, Eckerson suggests. “Any front end built in J2EE can call a Web service,” he says. Any tool that will spit out or take in XML will do the job.
The business intelligence product vendors have jumped in with tools. “We’re adding a level of abstraction so you don’t need to know about the physical table or the schema,” says John Kopcke, CTO, Hyperion Solutions. The result is a loose coupling between request and the data, he says.
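The kind of abstraction Kopcke describes can be pictured as a mapping from business terms to physical names. The sketch below is not Hyperion's implementation; it is a minimal illustration using Python's built-in `sqlite3` module, and the table name `t_cust_01`, its columns, and the sample rows are all invented for the example.

```python
import sqlite3

# Hypothetical mapping from business terms to physical table and column names.
PHYSICAL_TABLE = "t_cust_01"
COLUMN_MAP = {"customer": "cust_nm", "region": "rgn_cd", "revenue": "rev_amt"}


def fetch(conn, fields, region=None):
    """Return rows keyed by business terms; callers never see physical names."""
    # Identifiers come only from COLUMN_MAP (unknown terms raise KeyError);
    # user-supplied values are passed as bound parameters.
    select = ", ".join(f"{COLUMN_MAP[field]} AS {field}" for field in fields)
    sql = f"SELECT {select} FROM {PHYSICAL_TABLE}"
    args = ()
    if region is not None:
        sql += f" WHERE {COLUMN_MAP['region']} = ?"
        args = (region,)
    sql += " ORDER BY 1"  # sort by the first requested field
    cursor = conn.execute(sql, args)
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


def demo_connection():
    """Build an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_cust_01 (cust_nm TEXT, rgn_cd TEXT, rev_amt REAL)")
    conn.executemany(
        "INSERT INTO t_cust_01 VALUES (?, ?, ?)",
        [("Acme", "EAST", 1200.0), ("Birch", "WEST", 800.0), ("Cobalt", "EAST", 450.0)],
    )
    return conn
```

If the physical table is renamed or split, only the mapping changes; requests phrased as `fetch(conn, ["customer", "revenue"], region="EAST")` stay the same, which is the loose coupling the vendors are describing.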
Beyond that, organizations need the full set of data warehouse tools—often, tools they already have. These include middleware, messaging, and ETL tools to pull data and aggregate it from a variety of sources. The business intelligence and business analytics algorithms can be built into the services themselves.
Early adopters of SOA for the data warehouse have generally focused on two areas: conventional business intelligence reporting and near real-time information.
Action instead of reaction
Lockheed Martin has been experimenting with Web services for almost two years in an effort to streamline its business intelligence initiatives. The goal is to provide business intelligence for near real-time business activity monitoring rather than traditional reactive business intelligence, which tells you what happened only after the fact, when it is too late to do much about it.
Rather than start with the data warehouse and go through a full-blown enterprise schema and object mapping exercise, Lockheed started with reports. “We take a report and expose it as a Web service,” explains Sameer Sharma, principal application architect for Lockheed’s CTO. “Then we can make the report into a software component that will contain information of value to another application.”
Under the old way of doing things, behind each report was complex programming and processing. “It could take months to create one of these reports,” Sharma explains. “Underneath, programmers have written a lot of SQL.”
The new plan was to create a report Web service with the idea that a user could initiate the report service, enter the parameters, and the service would deliver the results. By turning it into a Web service, the report is created once and can be used and reused multiple times by different apps in combination with other services without additional programming.
One early approach was to take each report individually and turn it into a separate Web service. The company quickly realized that this was unwieldy and unnecessary. Instead, “we created a generic Web service to access any of the reports,” Sharma explains. The generic report Web service, he concludes, “has much more applicability.”
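The difference between one service per report and a single generic service can be sketched as follows. This is not Lockheed's code or the Business Objects API; the registry, the function names, and the sample vendor data are assumptions chosen only to show the shape of the idea.

```python
from typing import Callable, Dict, List

# Registry of report name -> function that produces the report rows.
_REPORTS: Dict[str, Callable[..., List[dict]]] = {}


def report(name: str):
    """Decorator that registers a report under a public name."""
    def decorator(fn):
        _REPORTS[name] = fn
        return fn
    return decorator


@report("vendor_performance")
def vendor_performance(min_on_time: float = 0.0) -> List[dict]:
    # Made-up sample data standing in for an existing report's output.
    vendors = [
        {"vendor": "Acme", "on_time": 0.97},
        {"vendor": "Birch", "on_time": 0.82},
        {"vendor": "Cobalt", "on_time": 0.91},
    ]
    return [v for v in vendors if v["on_time"] >= min_on_time]


@report("open_orders")
def open_orders(region: str) -> List[dict]:
    orders = [{"order": 1, "region": "EAST"}, {"order": 2, "region": "WEST"}]
    return [o for o in orders if o["region"] == region]


def available_reports() -> List[str]:
    return sorted(_REPORTS)


def run_report(name: str, **params) -> List[dict]:
    """Single generic entry point: pick a report by name, pass its parameters."""
    if name not in _REPORTS:
        raise KeyError(f"unknown report: {name}")
    return _REPORTS[name](**params)
```

A caller would invoke `run_report("vendor_performance", min_on_time=0.9)`. Adding a new report means registering one more function, not publishing and maintaining another service endpoint.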
For instance, the reports encapsulated complex business logic and rules. By exposing them as Web services, Lockheed could not only easily leverage its legacy systems, namely the libraries of custom reports it had created, but could also create and incorporate intelligent agents that monitor events in real time and even dynamically influence and reroute transactions.
The only tools Sharma’s Web services team used were those provided by Business Objects, the business intelligence tools vendor. “The tools make it seamless to expose reports as Web services and then add more functionality if you want,” Sharma says. “We didn’t do anything to the back-end databases and systems. The reports themselves access the data the way they always did, using ODBC or whatever else was required.”
The report Web service does incur some performance overhead. “There is always a price to pay when you are handling XML,” Sharma says. “If someone has a lot of concurrent users and performance is a problem, they could run the report service on a server cluster and do some load balancing.”
Lockheed identifies several benefits from the report service. First, “the users can do it themselves to get results and they do not have to involve IT,” Sharma says. Second, the organization can be far more proactive. “You can use the output of the Web service in real-time situations and actually influence events,” he adds. For example, for procurement, access to the latest vendor performance report data through the report Web service could help a procurement manager to pressure a vendor for different terms.
Fresh data for stale
In the classic data warehouse model, you request data by firing off a SQL query (or triggering a SQL query through a business intelligence application) and you receive a response with the requested data. Reports traditionally follow the same model. The problem with this approach, however, is “the data is stale,” says Sunith Roy, director of frameworks and shared services at RCI, a Cendant company. As a result, too much business intelligence is outdated even before it gets to the user.
Services, however, can solve this problem. “What services bring to the data warehouse is event publishing,” Roy says. “If done right, you can use event publishing to refresh the data in the data warehouse.”
Services also can raise the level of the data by sorting, organizing, aggregating and massaging it in various ways to increase its business value, Roy says. “You can use services to populate the data warehouse with more business-level data. The data will then be better aligned with business needs.” For example, a revenue manager service would know how revenue managers work and what data they need and deliver the right data in the right form, so it is immediately useful. By comparison, today’s data warehouses are populated with low-level transaction data, which requires considerable translation and processing before it can be effectively used.
To this end, Roy envisions an SOA that incorporates a robust event-driven model that would publish notifications of meaningful events. The data warehouse would subscribe to events and be updated in near real time. “We are working on something like this right now,” Roy says. The project entails developing an event model that will monitor the activity of the company’s legacy systems for the purpose of updating the data warehouse as pertinent events occur.
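The publish-and-subscribe pattern Roy describes can be illustrated with a small in-process sketch. It is not RCI's event model: the `EventBus` and `Warehouse` classes, the `order_placed` event type, and the payload fields are hypothetical, and a real deployment would put messaging middleware where the in-memory bus sits here.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe hub."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Events with no subscribers are simply dropped.
        for handler in list(self._handlers.get(event_type, [])):
            handler(payload)


class Warehouse:
    """Stand-in for a data warehouse that keeps a running aggregate."""

    def __init__(self):
        self.revenue_by_region = defaultdict(float)

    def on_order_placed(self, event):
        # Update the aggregate as each event arrives, not in a nightly batch.
        self.revenue_by_region[event["region"]] += event["amount"]


bus = EventBus()
warehouse = Warehouse()
bus.subscribe("order_placed", warehouse.on_order_placed)

# A source system publishes events as they happen (sample values are made up).
bus.publish("order_placed", {"region": "EAST", "amount": 100.0})
bus.publish("order_placed", {"region": "EAST", "amount": 50.0})
bus.publish("order_cancelled", {"region": "EAST", "amount": 50.0})  # no subscriber
```

The source system knows nothing about the warehouse; it only announces that something happened. That is what lets the warehouse stay current without the source being rewritten.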
RCI is relying on its existing EAI middleware, WebMethods and TIBCO, and messaging tools. Technology, however, is not the issue. “The challenges are business issues,” Roy says.
For example, RCI needs to capture data from its legacy systems, “but they cannot report in business terms,” Roy explains. Similarly, there is no way to request specific business information from its old client/server applications. Instead, the developers need to initiate a long, complex sequence of technical calls and then map the results to business issues. How difficult this is depends on how well the legacy systems have evolved along with the business model. The trick is to map the business functions to Web services and make them a part of the runtime platform.
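Mapping a business function onto a sequence of technical calls might look like the sketch below. Everything in it is hypothetical: the `LegacyMemberSystem` class, its session calls, and the cryptic field names `PTS_BAL` and `STAT_CD` are invented to mimic the kind of legacy interface Roy describes, not taken from RCI's systems.

```python
class LegacyMemberSystem:
    """Hypothetical stand-in for a legacy system with low-level calls only."""

    def __init__(self):
        # Legacy-style record: cryptic field names, numbers stored as text.
        self._records = {"M100": {"PTS_BAL": "0450", "STAT_CD": "A"}}
        self.call_log = []

    def open_session(self):
        self.call_log.append("open_session")
        return "session-1"

    def read_record(self, session, member_id):
        self.call_log.append("read_record")
        return self._records[member_id]

    def close_session(self, session):
        self.call_log.append("close_session")


class MemberService:
    """Business-level service that hides the technical call sequence."""

    def __init__(self, legacy):
        self._legacy = legacy

    def member_summary(self, member_id):
        session = self._legacy.open_session()
        try:
            raw = self._legacy.read_record(session, member_id)
        finally:
            self._legacy.close_session(session)  # always release the session
        # Translate technical fields into business terms.
        return {
            "member_id": member_id,
            "points_balance": int(raw["PTS_BAL"]),
            "active": raw["STAT_CD"] == "A",
        }
```

The sequence of calls and the field translation are written once, inside `member_summary`. Everything built on top of the service works in business terms only.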
Developers have to go through the process only once to set up business-level Web services. Over time, Roy expects to develop a library of reusable Web services that will automatically update the data warehouse as events happen. From there, business managers will be able to query the data warehouse by setting the frequency of the new-event notifications.
Data mart anarchy
Services have the potential to reduce the chaos surrounding data warehouses and data marts today. “We have a client with a data warehouse that is experiencing data mart anarchy,” says Jane Griffen, principal, Deloitte. Another large company she works with uses 1,500 employees just to handle financial reporting. “Both clients are thinking that an SOA could help solve the problem by providing a level of data warehouse transparency,” she says.
On the upside, a client in the cable TV business used an SOA in conjunction with its customer service application to deliver customer profiles in real time, Griffen continues. This allowed the customer service agents to customize special offers based on the profile, which resulted in a 10 percent sales gain in one region alone.
The lesson is clear: combining an SOA with a data warehouse can reduce costs, speed access to information, and ultimately increase revenue, something that is just not likely to happen with conventional data warehouse access and SQL alone.
ILLUSTRATION BY WILLIAM RIESER
Sidebar: Looping data warehouses and transaction systems