In-Depth

Special Report: Right-time enterprise on the rise

The term batch processing conjures pictures from “That ’70s Show” -- pocket protectors, handheld calculators, leisure suits, lava lamps and maybe Karen Carpenter singing in the background.

Now it is time to knock this writer off his high horse and say: “Hey, all that stuff isn’t bad.” And that is right! Batch processing, in particular, served a purpose, worked, and still meets the needs of plenty of people whose job it is to analyze corporate data and plan business tactics and strategies.

In the realm of data warehousing, batch loads have long been the most reasonable way to run an ordered shop. But even in such shops, the reign of batch may be laboring to a halt. Still, we should note that -- hyperbole surrounding the real-time enterprise and real-time analytics to the contrary -- this won’t happen overnight.

Observers suggest a move is underway to take the “warehousing” out of data warehousing. Operations and data analytics should be more closely aligned, they note. And data analysis should trigger events based on changeable business rules ready to run day and night atop the latest application servers. A more likely scenario going forward will feature a mix of batch, trickle-feeds or streams, and true operational stores working in concert.

As expert Tony Baer suggested in the October issue of ADT [“BI: Real time or right time?”], it is likely IT managers will look to hold a middle-ground position, augmenting carefully scheduled BI batches with event-driven capabilities. “These applications could be called ‘right-time’ systems,” Baer wrote, “because different portions of the analyses are performed when practical and as needed.”

All of which means that more application development managers will be looking across more conference room tables at more data warehouse managers. Each will learn a bit more about the other’s domain.

“The data warehouse people are in the midst of a big paradigm shift from files to streams,” said Stephen Brobst, managing partner at Strategic Technologies & Systems. “Some people make the shift better than others. If you’re used to a batch-type world, it can turn your head upside down.

“On the application side, for operations source systems, it’s often painful as well, as they often have systems that are batch-oriented, too,” he added. Brobst said companies today are making the shift, noting Delta Air Lines as one that has done it successfully. He added, however, that “it is not without pain” as teams come up the learning curve “from batch to stream.”


Firms get in queue

So how real is the right-time enterprise? Vendor moves give us some clues. Data warehousing and business intelligence (BI) software vendors that built their businesses on extract, transform and load (ETL) offerings are expanding their support for varied middleware offerings. Increasingly, messaging middleware is central to these efforts, though other messaging mechanisms are sometimes involved.

This year, Redwood City, Calif.-based Informatica Corp. released its PowerAnalyzer 4 product with what it describes as intelligent caching and integrated performance management metrics, Java Message Service (JMS) support and federated-querying capabilities. It is not alone in this. The company has also continued to boost its support for IBM MQ tools, an ongoing trend in data integration.

Looking to span the chasm between XML-oriented middleware and business analytics, Informatica partnered with webMethods on the Business Activity Platform for enterprise integration and real-time operations analysis. The firm also purchased mainframe middleware expert Striva for pushing and pulling data to and from mainframes without recourse to bulk file moves.

For its part, Ascential Software has increased its support of IBM WebSphere services (MQ and others) and connections to JMS. The Westborough, Mass., company has touted the idea of real-time integration coupled with data quality, and has been one of several companies to promote XML-based Web services as a valued model for data warehouse integration.

In September, Ascential completed its acquisition of Mercator Software, which was once one of the more competitive players in the EAI middleware segment. This move is an important endorsement of the idea of middleware merging with ETL capabilities. In keeping with the quality theme, Ascential’s Bob Zurek, vice president of advanced technologies and product management, noted that Mercator was “messaging-agnostic” and that “their expertise was in transformation executed in real-time.” (These capabilities derive in no small part from Mercator’s early days as an EDI force.)

And with its August release of DT/Studio 2.0 ETL tools, San Francisco-based Embarcadero Technologies added message queue support as a standard feature to allow users to integrate real-time data sources with more traditional static information.

Similar developments led no less an industry thinker than Ken Gardner -- formerly of Sagent and ReportSmith, and now leading Iteration -- to suggest that message-based architectures are how things will be done in data reporting. The spread of message-based environments is the key to truly up-to-date reports on business operations, he said. [See “In-memory scheme, messaging mark Iteration’s reporter rollout,” ADT, May 2003, p. 11.]


Brobst views field

How did we get here and where are we going? Expert Brobst offers some perspective. Traditional data warehouse developers and ETL specialists have been using ETL tools like those from Informatica and Ascential, he notes. “Those kinds of tools have largely been batch-oriented tools. They go out in the middle of the night and extract data from bookkeeping systems, for example. Then they transform it,” he said.

In a traditional data warehousing system, data acquisition has been done using just such a file-based paradigm, Brobst indicated. “That’s been a fairly non-intrusive thing.”

But, he asserts, “with real-time data warehousing you can’t be doing [file-oriented work] that happens once per night. You have to move toward a message-based paradigm or a stream-based data acquisition paradigm. You now need the application to write a message in the most primitive form to a messaging system, and this requires work from the application.” (And, one might add, the application developer.)

“Now, when an event occurs, you have to write it to a reliable queuing system,” said Brobst. “People have to write events to queues, which means the application changes.”
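To make the shift Brobst describes concrete, here is a minimal sketch of that “most primitive form” of event-writing, assuming a JMS 1.1 provider; the JNDI name, queue name and order fields are invented for the example. The operational application writes a small event message to a reliable queue as the business transaction completes, rather than waiting for a nightly extract:

import javax.jms.*;
import javax.naming.InitialContext;

public class OrderEventPublisher {
    public static void main(String[] args) throws Exception {
        // Look up the provider's connection factory via JNDI; the name is illustrative
        // and depends entirely on how the messaging provider is configured.
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");

        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("WAREHOUSE.EVENTS");   // assumed queue name
        MessageProducer producer = session.createProducer(queue);

        // The "most primitive form" of the event: a small message written to a
        // reliable queue as the business transaction completes.
        MapMessage event = session.createMapMessage();
        event.setString("eventType", "ORDER_PLACED");
        event.setString("orderId", "A-1001");
        event.setDouble("amount", 249.95);
        producer.send(event);

        connection.close();
    }
}

The point is the change of responsibility: the source application, and its developer, now owns the act of publishing the event.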

At the same time, as shown by the aforementioned series of vendor moves, there is a convergence between ETL tools (which Brobst chooses to call data-level EAI tools) and traditional or process-level EAI tools.

The EAI tools tend to lack the sophistication of data transformation found with ETL tools. But EAI tools are more aligned with business processes. “What you are seeing is that the ETL tools are building adaptors so they can take the EAI data sources. What I believe will happen in the next two years is that data-level EAI [ETL] and process-level EAI will consolidate,” said Brobst.
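One way to picture the convergence Brobst predicts: on the warehouse side, a data-level adaptor subscribes to the same queue and applies its transformation inline rather than in a nightly batch window. The following is a sketch only; the staging table and field names are assumptions, and error handling is reduced to a comment.

import javax.jms.*;
import java.sql.*;

// Illustrative "data-level EAI" adaptor: consumes events from the queue
// and applies a simple transform before inserting into a staging table.
public class WarehouseFeedListener implements MessageListener {
    private final java.sql.Connection warehouse;

    public WarehouseFeedListener(java.sql.Connection warehouse) {
        this.warehouse = warehouse;
    }

    public void onMessage(Message message) {
        try {
            MapMessage event = (MapMessage) message;
            // Transform step: normalize the order amount to cents before loading.
            long amountInCents = Math.round(event.getDouble("amount") * 100);
            PreparedStatement insert = warehouse.prepareStatement(
                "INSERT INTO stg_order_events (order_id, event_type, amount_cents) VALUES (?, ?, ?)");
            insert.setString(1, event.getString("orderId"));
            insert.setString(2, event.getString("eventType"));
            insert.setLong(3, amountInCents);
            insert.executeUpdate();
            insert.close();
        } catch (Exception e) {
            // In practice the message would be retried or dead-lettered, not just logged.
            e.printStackTrace();
        }
    }
}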

Here is Brobst’s take on the evolution of data warehousing environments:

* At Stage 1, reporting is key. Here one finds the Key Performance Indicator (KPI) and Balanced Scorecard systems. Stage 1 is primarily batch.

* At Stage 2, analysis is key. Here you have interactive access to data and you have business people, not just application developers, who can “program” the system to get specific analytical results.

* Stage 3 represents the advent of prediction. Use of analytical modeling and data mining grows. In Stage 3 you look at historical data to essentially predict the future. This requires very advanced algorithms for doing such predictions, noted Brobst. That data does not have to be delivered in “real time.”

* At Stage 4 you “operationalize” your approach to data warehousing. Continuous updates become important.

* And at Stage 5, the activate stage, what Brobst describes as the “Active Data Warehouse” is achieved. Event-based triggering takes hold.

“In the activate stage, we move to less human involvement,” noted Brobst. “But you need checks and balances. Systems can spin out of control without governors of their behavior.”
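What might such a governor look like? One simple form is a threshold rule wrapped in a rate limit, so a runaway feed cannot fire the triggered action without bound. The fragment below is purely illustrative; the rule, the limit and the escalation path are invented for the example.

// Illustrative event trigger with a simple "governor": the rule fires on
// low inventory, but never more than MAX_ACTIONS_PER_HOUR times.
public class ReorderTrigger {
    private static final int LOW_STOCK_THRESHOLD = 20;   // assumed business rule
    private static final int MAX_ACTIONS_PER_HOUR = 50;  // assumed safety limit

    private int actionsThisHour = 0;
    private long hourStart = System.currentTimeMillis();

    public void onInventoryEvent(String sku, int quantityOnHand) {
        if (quantityOnHand >= LOW_STOCK_THRESHOLD) {
            return; // rule does not fire
        }
        long now = System.currentTimeMillis();
        if (now - hourStart > 60 * 60 * 1000L) {
            hourStart = now;          // new hour, reset the counter
            actionsThisHour = 0;
        }
        if (actionsThisHour >= MAX_ACTIONS_PER_HOUR) {
            // Governor kicks in: escalate to a human instead of acting automatically.
            System.out.println("Rate limit hit; flagging " + sku + " for review");
            return;
        }
        actionsThisHour++;
        System.out.println("Triggering replenishment order for " + sku);
    }
}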

Some might even position emerging Business Activity Monitoring (BAM) techniques as an application-specific subset of the “operationalized” or activated data warehouse. For more on this topic, see “BAM” by Johanna Ambrosio.

However, any evolutionary technology continuum can be a quagmire for the ill-prepared. We asked Brobst what mistakes should be avoided as one traverses this landscape.

“A common mistake people make is that they don’t align the service levels for data acquisition and delivery with business processes. As well, people sometimes have a tendency to over-engineer. You don’t need to deliver data in one second when the application doesn’t care if it has to wait for five minutes,” replied Brobst. “Another mistake is underestimating the amount of work needed to change a legacy store to feed the application in near-real time or on a continuous basis. If source systems can’t deliver, it is all for naught.”


I want my data

People want to keep their well-defined data stores in place, but have the ability to access some real-time information, said Ashish Mohindroo, principal product marketing manager for Oracle application servers. He describes a paradigm much like Brobst’s Active Data Warehouse. Mohindroo and Oracle call this an “event-driven architecture.”

“If it’s heavy duty, people would rather do it in batch,” said Mohindroo. The amount of data involved becomes a natural demarcation point. “If you’re talking about gigabytes and terabytes of data across the enterprise, it’s more process- and network-intensive,” he noted. “Then, updates are best done nightly, weekly, whatever.”

The alternative happens when there is data that “needs to be accessed in real time; for example, a customer order that cannot wait until overnight,” said Mohindroo. That would need to be propagated the moment an order is placed.

The push to mix batch- and stream-integration methods is not limited to business intelligence and data warehousing. Data integration generally is taking on more of the flavor of transaction-, messaging- and publish-and-subscribe-oriented middleware. Sybase, Oracle, InterSystems and Pervasive are among the vendors that have improved their messaging and XML prowess through deals or in-house efforts.

Once enough messaging moves are made, the industry can expect vendors to embrace somewhat more complex publish-and-subscribe middleware alternatives as well.
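In JMS terms, that step is the move from point-to-point queues to topics with durable subscribers, so that several downstream consumers (a warehouse feed, a BAM dashboard, an alerting service) can each receive the same published event. A minimal sketch, again with assumed names:

import javax.jms.*;
import javax.naming.InitialContext;

public class WarehouseTopicSubscriber {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");

        Connection connection = factory.createConnection();
        connection.setClientID("warehouse-feed");   // required for a durable subscription
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic orders = session.createTopic("ORDER.EVENTS");   // assumed topic name

        // A durable subscriber still receives events published while it was offline,
        // which matters when the consumer is a warehouse load process.
        TopicSubscriber subscriber = session.createDurableSubscriber(orders, "warehouse-feed");
        subscriber.setMessageListener(new MessageListener() {
            public void onMessage(Message message) {
                // Hand each event to the transform-and-load step as it arrives.
                System.out.println("Received event: " + message);
            }
        });
        connection.start();
        // A real consumer would keep running here rather than exiting.
    }
}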

Whether it is called the real-time enterprise, the right-time enterprise, the zero-latency enterprise or what have you, vendors will continue to try to position their offerings to play there.

Meanwhile, it is worth noting that a promising dark horse candidate can be found in integration efforts centered on the nascent XQuery standard. Using XML-based XQuery, users and vendors hope to quickly access stores of highly diverse data sources. Actuate Corp., South San Francisco, saw enough value in this camp to purchase XML enterprise information integrator Nimble Technology last summer. And application server leader BEA Systems has also invested to better play in this arena.

The view of real-time and right-time can be quite subjective, as was shown in the sidebar “What’s real at Fleet?” in last month’s issue. Too, the continuing role of automating manual processes, whether old batch loads or new load scripts, should not be overlooked.


Infinity: Real-time in context

As in other areas, real-time results are really what users seek.

Ask Andy Palmer, CIO at Infinity Pharmaceuticals in Cambridge, Mass. Infinity is a start-up that creates data libraries for early-stage drug discovery. Synthetic chemistry, chemical genetics, informatics and biological screening all come into play. Palmer’s company must make the correct decisions about which chemical libraries to build. It costs too much to go down many blind alleys.

Sometimes called BioIT, the field Infinity plows has been a classic technical market, where power users have had to build their own tools. As data pools for biology proliferated, these power users often became power scripters, writing lots of Perl and the like to glue together files and databases for analysis. Part of Palmer’s task is to find where automation can be applied.

“We had a unique situation. We were a start-up, and we were basically able to build out our infrastructure from scratch,” he said. The company designed Spotfire into its “clean-slate architecture.” Spotfire makes the DecisionSite analytic app package. After two-and-a-half years, the Infinity store for analysis is up to 3 terabytes.

What does the so-called real-time enterprise mean to Palmer? “Our user community comprises scientists. These folks spend all their time running experiments, and our primary role is satisfying them. That means taking the information that comes out of their experiments, and turning it around to them in a form they can analyze in as short a time as possible,” he said.

“Latency here is measured as time from finished experiment to getting data back to the experts. It means reducing the mean time between experiments,” he explained.

To optimize performance, Palmer and crew automate the capture of data from experiments, process as much of it as quickly as they can while ensuring data quality, and then automate the delivery back to the scientists.

Please see the following related stories: “Scouts Canada blazes trail in membership data access” by Rich Seeley

“May the portal be with you” by Jack Vaughan

“A sampling of recent BI products” by Lana Gates