Have Scripting Language, Will ETL

If you didn't know it already, IBM Corp.'s billion dollar-plus acquisition of ETL specialist Ascential Software Corp. confirmed it: data integration is a big business. Consultancy META Group, for example, says that ETL revenues will grow at a 10 to 20 percent clip over the next three to five years, and market watcher Forrester Research puts 2004 ETL revenues at close to $1 billion.

That's not chump change. But the truth is that many shops are still using scripting, programmatic SQL or other, typically low-tech approaches, to perform their nuts-and-bolts ETL processing. And, in cases where companies are using bona-fide ETL tools, they're typically tapping the native facilities that come bundled with their RDBMSes, such as Microsoft's Data Transformation Server (DTS) or Oracle's Warehouse Builder.

Although there's some reason to expect this will change over the next few years, analysts say, in many cases, low-tech ETL should still suffice.

For the uninformed, ETL describes a process in which data is extracted from one data source, transformed to a different format, and loaded into another, perhaps architecturally different, data source. Enterprise ETL tools such as Ascential's DataStage and Informatica Corp.'s PowerCenter boast native integration with every major relational database on the market and ship with adapters for many other data sources, including SAP and Siebel.

For many integration scenarios, however, ETL simply involves copying data from one standards-compliant data source to another. If you've a customer billing system that supports ODBC, for example, and if you want to load data from it into SQL Server or DB2, it's a simple matter of reading data from the source system and loading it into the relational database.

Even in cases in which source data must be reformatted, most such transformations are relatively simple, e.g., converting text to numeric, text to date, or merging two or more fields and concatenating name, and can be accomplished by means of scripting. For more complicated transformations (e.g., looking up data in another table and replacing it, splitting one field into many), many DBAs also opt to use scripting or programmatic SQL.

This is a point even ETL vendors concede. "People want to take advantage of what they've got, and most of them have some [form of ETL] already. It's more often not an ETL tool, it's hand coded," says Darren Cunningham, product marketing manager for Business Objects' Data Integrator ETL tool. "We hear from customers all the time who don't want to throw away those scripts they've already got. They're asking us to [support] these with our tool. They want to go along with what we're doing, but they don't want to rip out what they're doing."

This may change, particularly as organizations revamp their operational systems to enable real-time data feeds and pursue other initiatives that emphasize unprecedented integration and coordination among heterogeneous systems. "With data volumes burgeoning, the list of sources and targets lengthening, and the number of projects and developers increasing, user organizations will need to select tools and design implementations that will keep scaling over time," agrees Phil Russom, an analyst with Forrester.

About the Author

Stephen Swoyer is a contributing editor. He can be reached at [email protected].