Open Source Neo4j Adds Built-In ETL to Java-Based Graph Database

Neo Technology, the commercial sponsor of Neo4j, the open-source NoSQL graph database implemented in Java, this week followed up the database's second major version, released earlier this year, with a point release that adds built-in ETL (extract, transform and load), new functionality for easily mapping tabular data into Neo4j from CSV files, and a faster data loader.

The overwhelming majority of enterprise data is stuck in relational databases, says RedMonk analyst Donnie Berkholz, primarily because of what he calls "historical and technical inertia." In other words, it's still there because that's where it started and moving it is not an enterprise priority -- but it probably should be, given the value of graph databases in presenting unwieldy chunks of data in more intuitively understood forms. Relational tables are often a poor fit for the data's ideal structure, Berkholz says, which results in poor performance and "unintuitive access."

"By lowering the migration barrier from relational to graph stores, the latest Neo4j release targets this mismatch, enabling the market reality to grow closer to what's best-suited for the data," Berkholz said in a statement.

"The relational database is simply where data tends to get trapped these days," Philip Rathle, vice president of products at Neo, told ADTmag. "With Neo4j 2.1, we've built a way to get it out."

The release is a logical next step in the evolution of the graph database, which was first released in 2010, and part of a larger company plan to "bring graph databases to the masses," Neo CEO Emil Eifrem told ADTmag in an earlier interview. Eifrem has been working on Neo4j since the open-source project launched in 2000.

Version 2.0 of Neo4j, unveiled in January, included "radical improvements" to Cypher, the company's home-grown declarative query and modification language, along with support for Java API and RESTful HTTP API access methods and for a long list of programming languages, from .NET and Java to Python and Scala. It also introduced a new schema construct called "labels," which lets developers tell the database more about the data.

Neo4j 2.1 "eats relational data for lunch," Rathle said, with a built-in ETL tool that makes importing data from relational and other databases seamless, plus a fast import option. With this release, the company also expands its own Cypher graph query language to include extracting and mapping data from CSV files. And a new "superfast" data loader makes it possible for developers to map and move high volumes of data into a graph quickly.
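To make the idea of mapping tabular data into a graph concrete, here is a minimal sketch in plain Python. It is not Neo4j's loader and does not use Cypher or any Neo4j API; the sample data, the `csv_to_graph` function, the `Person` and `Department` labels and the `WORKS_IN` relationship type are all hypothetical names chosen for illustration. It only shows the general shape of the transformation: each CSV row becomes labeled nodes plus a relationship between them.

```python
import csv
import io

# Hypothetical sample data: one row per employee and their department.
SAMPLE_CSV = """name,department
Alice,Engineering
Bob,Engineering
Carol,Sales
"""


def csv_to_graph(text):
    """Map tabular rows to labeled nodes and relationships (illustrative only)."""
    nodes = {}     # (label, key) -> property dict
    edges = set()  # (start_node, relationship_type, end_node)
    for row in csv.DictReader(io.StringIO(text)):
        person = ("Person", row["name"])
        department = ("Department", row["department"])
        # setdefault keeps one node per distinct value, so repeated
        # department names in the table collapse into a single node.
        nodes.setdefault(person, {"name": row["name"]})
        nodes.setdefault(department, {"name": row["department"]})
        edges.add((person, "WORKS_IN", department))
    return nodes, edges


nodes, edges = csv_to_graph(SAMPLE_CSV)
```

In this sketch, the repeated "Engineering" value in the table ends up as one shared node with two incoming relationships, which is the kind of structure a flat table hides in duplicated column values.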

Analysts at Forrester Research estimate that more than 25 percent of enterprises will be using a graph database by 2017. That growth will span virtually all industries, the analysts predict, in areas ranging from digital content management to bioinformatics, ID management to the Internet of Things.

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.