Neo4j Releases Graph Database for Data Science
- By John K. Waters
Neo4j has released a new product that the graph database provider is billing as the first data science environment "built to harness the predictive power of relationships for enterprise deployments."
Neo4j for Graph Data Science was designed to let data scientists apply highly predictive, but largely underutilized, relationships and network structures to "unwieldy problems." Examples include disambiguating users across multiple platforms and contact points, identifying early interventions for complicated patient journeys, and predicting fraud from sequences of seemingly innocuous behavior.
The new offering combines a native graph analytics workspace and graph database with scalable graph algorithms and graph visualization. Together, these give data scientists a framework for operationalizing analytics and machine learning models that infer behavior from connected data and network structures, the company says.
"A common misconception in data science is that more data increases accuracy and reduces false positives," explained Alicia Frame, lead product manager and data scientist at Neo4j. "In reality, many data science models overlook the most predictive elements within data: the connections and structures that lie within. Neo4j for Graph Data Science was conceived for this purpose, to improve the predictive accuracy of machine learning, or answer previously unanswerable analytics questions, using the relationships inherent within existing data."
Graph algorithms are a subset of data science tools that capitalize on network structure to infer meaning and make predictions, such as: cluster and neighbor identification through community detection and similarity algorithms; influencer identification through centrality algorithms; and topological pattern matching through pathfinding and link prediction algorithms.
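To make those algorithm families concrete, here is a minimal, library-free Python sketch of one simple textbook representative from each: degree centrality (influencer identification), Jaccard neighborhood similarity (similarity), common-neighbor scoring (link prediction), and breadth-first search (pathfinding). The tiny graph and its node names are hypothetical, and this is not Neo4j's API or its implementation of these algorithms; production tools use more sophisticated, scalable variants.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy graph (undirected), stored as an adjacency map.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
         ("carol", "dave"), ("dave", "erin")]

graph: dict[str, set[str]] = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def degree_centrality(g):
    """Centrality: fraction of the other nodes each node touches directly."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}


def jaccard_similarity(g, a, b):
    """Similarity: shared neighbors divided by all neighbors of a and b."""
    union = g[a] | g[b]
    return len(g[a] & g[b]) / len(union) if union else 0.0


def common_neighbor_scores(g):
    """Link prediction: score each unconnected pair by its shared neighbors."""
    scores = {}
    for a, b in combinations(sorted(g), 2):
        if b not in g[a]:
            scores[(a, b)] = len(g[a] & g[b])
    return scores


def shortest_path(g, start, goal):
    """Pathfinding: breadth-first search for a fewest-hops route."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk predecessors back to start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in sorted(g[node]):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None  # goal unreachable


if __name__ == "__main__":
    centrality = degree_centrality(graph)
    print(max(centrality, key=centrality.get))          # carol
    print(jaccard_similarity(graph, "alice", "bob"))
    print(common_neighbor_scores(graph))
    print(shortest_path(graph, "alice", "erin"))        # ['alice', 'carol', 'dave', 'erin']
```

In this toy graph, "carol" has the most direct connections, so she scores highest on centrality. The unconnected pairs that share a neighbor, such as ("alice", "dave"), are the likeliest candidates for a future link.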
In its announcement, Neo4j cited COVIDgraph.org as an example of how researchers are using these tools to understand the current pandemic. The volunteer group comprises a diverse team of scientists, developers, and "data people" from academia and industry who are working together to build a knowledge graph on both COVID-19 and SARS-CoV-2. The graph integrates public datasets, such as relevant publications, case statistics, genes and functions, and molecular data, and is implemented in Neo4j.
"Nothing is more pressing today than understanding COVID-19," said Alexander Jarasch, head of the Data and Knowledge Management group at the German Center for Diabetes Research, and collaborator on COVIDgraph.org. "Graphs give us the ability to bring together the salient information around this confounding disease and provide a synthesized view across heterogeneous data. Today's understanding of this coronavirus is severely hampered by minimal peer-reviewed research and the absence of long-term clinical trials. Neo4j for Graph Data Science will help us to identify where we need to direct biomedical research, resources, and efforts."
Neo4j continues to be something of a harbinger of the growing need for advanced graph database technology. According to a report published last year by Verified Market Research ("Global Graph Database Market Size And Forecast"), the worldwide graph database market, valued at $780.71 million in 2018, is likely to reach $4.13 billion by 2026, growing at a CAGR of 23.04% from 2019 to 2026.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.