IBM Plans Eclipse-Based Data Integration Toolset

There’s been a lot of talk about what IBM’s billion-dollar acquisition of data integration specialist Ascential Software Corp. might mean for customers, but little speculation about what’s in store for programmers.

There’s a good chance the combined IBM and Ascential technology stack will be a codejockey-friendly environment.

Eric Sall, program director for information integration with IBM, says Big Blue plans to deliver a converged set of development tools, based on the open-source Eclipse framework, that will let developers work interchangeably with IBM’s Information Integrator federated data access technology and Ascential’s DataStage data integration technology.

“We have a joint group of architects and engineers planning what we can do. One idea is a converged set of tools, so we could provide a unified Information Integrator user experience, and a common application development tooling framework between [Information Integrator and DataStage],” he explains.

One upshot of this, IBM officials say, is that developers will be able to choose the right technology for the job: federated access to data sources, or extraction, transformation, and loading (ETL) of data from one source to another. For example, says Jeff Jones, director of strategy for IBM’s data management portfolio, a programmer might need to expose data that resides in a standalone repository, rather than (as is frequently the case with ETL) extracting it and loading it into a data warehouse or operational data store. “For some applications, it just makes more sense to leave the data where it’s at, to pull it in and create this consolidated view of these highly distributed, highly heterogeneous data sources.”

So what kinds of developer-friendly scenarios does IBM envision for its combined Information Integrator and Ascential stack? For starters, Sall argues, Ascential brings a lot more to the table than just ETL: it markets data cleansing, data profiling and metadata management products, too. Ideally, developers could use IBM’s converged, Eclipse-based toolset to invoke these services.

One textbook example might involve extending a data warehouse, says Sall. “Very often, the Ascential tools might be used to clean data and then load it into a data warehouse, and then you might use the federation technology to extend that to real-time, because usually data warehouses are kind of done on a nightly basis,” he explains. “Sometimes you need up-to-the-minute analysis, so the federation technology allows you to get to that data in real-time. You’re getting that high-volume data movement and the cleansing from the Ascential products and the federation for real-time access.”
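The pattern Sall describes can be sketched in a few lines of Python. This is only an illustration of the idea, not IBM's or Ascential's tooling: it uses the standard-library sqlite3 module, with one in-memory database standing in for the warehouse and an attached one standing in for the live operational source. The table names (orders, orders_wh) and function names are hypothetical, and the cleansing rule (trim and upper-case the customer name, reject incomplete rows) is an assumption made for the example.

```python
import sqlite3


def build_demo():
    """Create a toy 'warehouse' (main) and an attached 'live' source."""
    conn = sqlite3.connect(":memory:")  # main schema plays the warehouse
    conn.execute("ATTACH DATABASE ':memory:' AS live")  # separate operational store
    conn.execute(
        "CREATE TABLE live.orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE main.orders_wh (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    return conn


def nightly_etl(conn):
    """ETL step: extract from the source, cleanse, load into the warehouse."""
    rows = conn.execute("SELECT id, customer, amount FROM live.orders").fetchall()
    # Cleansing (assumed rule): drop incomplete rows, normalize customer names.
    cleaned = [
        (order_id, customer.strip().upper(), amount)
        for order_id, customer, amount in rows
        if customer and customer.strip() and amount is not None
    ]
    conn.executemany("INSERT OR REPLACE INTO main.orders_wh VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


def federated_total(conn):
    """Federation step: one query spanning the warehouse and the live source.

    Rows already loaded come from the warehouse; rows that arrived since the
    last load are read in place from the source, without being moved.
    """
    sql = """
        SELECT SUM(amount) FROM (
            SELECT amount FROM main.orders_wh
            UNION ALL
            SELECT amount FROM live.orders
            WHERE id NOT IN (SELECT id FROM main.orders_wh)
              AND TRIM(customer) <> ''
              AND amount IS NOT NULL
        )
    """
    return conn.execute(sql).fetchone()[0]
```

In this sketch, an order inserted into live.orders after nightly_etl has run is missing from the warehouse table but still shows up in federated_total, which is the "up-to-the-minute" extension Sall refers to.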

Another use case, says Sall, involves data that’s accessed by a number of different applications. Physically moving this data (by means of ETL) is too onerous and performance-intensive, he suggests, so federation is the way to go. “Maybe it’s too difficult to move the information, maybe that content is being used in several different contexts so it can’t be moved—we think of that as infrastructure rationalization.”

About the Author

Stephen Swoyer is a contributing editor.