What's in Store for Big Data in 2016?
In a time-honored New Year's tradition, industry players weigh in with their predictions for the Big Data landscape in 2016.
Teradata Corp. (Dan Graham, general manager of Enterprise Systems)
MapR Technologies Inc. (John Schroeder, CEO and co-founder)
- Organizations hit reset on Hadoop. As Hadoop and related open source technologies move beyond knowledge gathering and the hype abates, enterprises will hit the reset button on (not abandon) their Hadoop deployments to address lessons learned -- particularly around governance, data integration, security and reliability.
- Algorithms enter the boardroom. Algorithms heat up in the data ingest and preparation processes for householding and profiling. As a result, CEOs and investors will start talking about deep analytics as core business goals.
- Data lakes will finally discover a few killer apps. Data lakes will be the most common repository for staging raw Internet of Things (IoT) data, driven by volume and costs. The size of IoT machine-to-machine (M2M) data will overrun in-memory capacity by orders of magnitude, driving implementers to data lake technologies for low-cost storage.
451 Research (Nick Patience, vice president, Software)
- Converged approaches become mainstream. For the last few decades, the accepted best practice has been to keep operational and analytic systems separate, in order to prevent analytic workloads from disrupting operational processing. Hybrid Transaction/Analytical Processing (HTAP) was coined in early 2014 by Gartner to describe a new generation of data platforms that can perform both online transaction processing (OLTP) and online analytical processing (OLAP) without requiring data duplication. In 2016, we will see converged approaches become mainstream as leading companies reap the benefits of combining production workloads with analytics to adjust quickly to changing customer preferences, competitive pressures, and business conditions. This convergence speeds the "data to action" cycle for organizations and removes the time lag between analytics and business impact.
- The pendulum swings from centralized to distributed. Tech cycles have swung back and forth from centralized to distributed workloads. Big Data solutions initially focused on centralized data lakes that reduced data duplication, simplified management and supported a variety of applications including customer 360 analysis. However, in 2016, large organizations will increasingly move to distributed processing for Big Data to address the challenges of managing multiple devices, multiple datacenters, multiple global use cases and changing overseas data security rules (safe harbor). The continued growth of IoT, cheap IoT sensors, fast networks, and edge processing will further dictate the deployment of distributed processing frameworks.
- Analytics will become more prevalent throughout the layers of technology businesses use, from development, IT management and databases to customer experience management -- everywhere. In particular, we expect to see a surge of interest in what we call contextual analytics -- the combination of text and advanced analytics with machine learning to uncover insight from a combination of structured and unstructured data.
- Multi-type databases will emerge -- companies will want to address multiple database types in one system versus having to juggle several different ones. We'll definitely see a rise in tools such as Cassandra for combining JSON, SQL, NoSQL and so on.
- Encryption at multiple levels -- with the constant threat of breaches, organizations will embrace multiple layers of security. In addition to an increased focus on access and authorization, companies will start to implement native encryption to protect data as it resides in the database and SSL encryption to protect data as it moves between applications.
Reltio (Manish Sood, CEO and founder)
- Big companies will begin to see the democratization of data preparation as a natural consequence of the democratization of analytics that has been driven by new products such as Tableau. (Andy Palmer, co-founder and CEO)
- We will see the emergence of DataOps as a way for enterprises to manage and embrace the full volume and variety of their data -- helping them rapidly deliver data that enables and accelerates analytics. (Andy Palmer, co-founder and CEO)
- Data science (and its technology, complex analytics) will break out in 2016. How to integrate this technology into DBMSs will emerge as a major issue in this space. (Mike Stonebraker, co-founder and CTO)
- The net effect of "one size does not fit all" is that most applications will use multiple DBMSs, each optimized for a portion of their requirements. Work on multi-DBMS "wrappers" (so-called polystores) will intensify. (Mike Stonebraker, co-founder and CTO)
MongoDB Inc. (Kelly Stirman, vice president of Strategy)
- Big Data and IoT will continue to be too big to ignore. While the term Big Data has been overused, the reality is that not many enterprises in B2B have taken the plunge. Talks at Big Data conferences still discuss fundamental concepts, and industries such as pharma, which previously never really considered their data "big," are beginning to realize that they need to plan for the future. And while size is not what matters, an increase in the variety and sources of data will provide more relevant insights and better outcomes.
- Hadoop will get thrown for a loop. It's hard to believe that Hadoop is over 10 years old. While interest remains strong and usage is maturing, there continue to be new options that either complement or provide an alternative to Hadoop to handle Big Data. The rapid ascent of Apache Spark and of Apache Drill are examples. We'll continue to see more options in the New Year.
Altiscale (Mike Maciag, COO)
- Data gets its seat on the board: CDO becomes a must-have title for Fortune 500.
CIOs are focused on the infrastructure to process data. CDOs are tasked with making the organization see data as an asset: making that data accessible, managed and governed, and finding the balance between extracting value from data and mitigating the risk of breaches.
- The ever-growing rise of Kafka: Data streams join databases to power modern business apps.
Kafka will become an essential integration point in enterprise data infrastructure, facilitating the creation of intelligent, distributed systems. With the growth of IoT, global deployments and microservices, the need to capture and control in-flight data before it's stored in a database is becoming more important. Kafka and other streaming systems like Spark and Storm will complement databases as critical pieces of the enterprise stack for managing data across applications and data centers.
- Salaries for both data scientists and Hadoop admins will skyrocket in 2016 as growth in Hadoop demand exceeds the growth of the talent pool. In order to bypass the need to hire more data scientists and Hadoop admins from a highly competitive field, organizations will choose fully managed cloud services with built-in operational support. This frees up existing data science teams to focus their talents on analysis instead of spending valuable time wrangling complex Hadoop clusters.
Datameer Inc. (Stefan Groschupf, CEO)
- Data integration gets exciting. These days many companies want agile analytics. They want to get the right data to the right people, and quickly. This is no small challenge, because that data lives in many different places. Working across data sources can be tedious, impossible, or both. In 2016, we'll see a lot of new players in the data integration space. With the rise of sophisticated tools and the addition of new data sources, companies will stop trying to gather every byte of data in the same place. Data explorers will connect to each data set where it lives and combine, blend, or join with more agile tools and methods.
- Advanced analytics is no longer just for analysts. Non-analysts across the organization are becoming more sophisticated. They've come to expect more than a chart on top of their data. They want a deeper, more meaningful analytics experience. Organizations will adopt platforms that let users apply statistics, ask a series of questions, and stay in the flow of their analysis.
Splice Machine Inc. (Monte Zweben, CEO and co-founder)
- IoT moves from hype to substance. The IoT has already emerged as the next mega-trend, but in 2016 it will move beyond just hype. We will see companies actively change their strategy and infrastructure to harness the power and insight of IoT technologies and data.
- Cloud/on-prem analytics distinction becomes a game changer. Right now, a few cloud-only Hadoop players exist, and other vendors offer rather distinct on-prem and cloud editions of their products. In 2016, as companies recognize the advantage of side-stepping Hadoop hardware requirements, which become outdated every 18 months, cloud adoption will surge. Vendors, particularly distributors, will pivot their offerings in order to keep up with demand.
Qubole (Ashish Thusoo, CEO and co-founder)
- Businesses will start making decisions in the moment. Companies want to personalize cross-channel experiences based on real-time information (that is, the last five clicks of a mouse), not on day-old data from their Extract, Transform, Load (ETL) process. Beyond ecommerce, this means reducing the time it takes to complete the "data to insight to action" cycle. Using real-time or near real-time data, which is only a few seconds behind, is one of the biggest ways to speed that cycle up. This will dramatically improve personalization for customers, leading to increased conversion rates and ultimately revenue. Businesses that respond to data in the moment will spot and react to trends faster, iterate quickly, and outpace the competition, subsequently "leaving them in the dust."
- Spark will kill MapReduce, but save Hadoop. MapReduce is quite esoteric. Its slow, batch nature and high level of complexity can make it unattractive for many enterprises. Spark, because of its speed, is much more natural, mathematical, and convenient for programmers. Spark will reinvigorate Hadoop, and in 2016, nine out of every 10 projects on Hadoop will be Spark-related projects.
Redis Labs Inc. (Leena Josh, vice president, Product Marketing)
- IoT and the cloud -- with the ever-increasing amount of data produced through connected environments and apps, the need for Big Data and analytics is becoming more and more pronounced. As this market transitions to the mainstream, there is a need to simplify the process of unlocking Big Data insights. Cloud-based Big Data services are driving a lot of this, and increasingly making Big Data accessible in a simplified manner to the mainstream market. This trend will keep accelerating. We will also see an increased focus on vertical analytical applications in industries, such as the healthcare sector, that will further simplify the usage of Big Data and its overall adoption.
- In 2016, more businesses will see that customer success is a data job. Companies that are not capitalizing on data analytics will start to go out of business, and the enterprise will realize that the key to staying in business and growing the business is data refinement and predictive analytics. The combination of social data, mobile apps, CRM records and purchase histories via advanced analytics platforms allows marketers a glimpse into the future by bringing to light hidden patterns and valuable insights on current and future buying behaviors. In 2016, businesses will increasingly leverage these insights to make sure that their current and future products and services meet customer needs and expectations.
- Organizations will opt for database technologies that can provide analytics at the same speed as their main business -- that can not only process large volumes of transactions at extremely low latencies, but also allow for in-memory analysis and instantaneous decision-making.
- Analysis of extremely large datasets will now move to memory, enabled by cost-efficient persistent memory technologies such as Non-Volatile Memory Express (NVMe), Storage Class Memory (SCM) and 3D Cross Point (3D XPoint).
- IoT data sources will need databases that can handle and persist millions of inserts per second, that process time series data with very low latencies and that can support hybrid on-premises to cloud processing seamlessly.