IBM, Intel and Hortonworks are among a host of Big Data players making news recently with streaming analytics, high-performance computing and open source options galore.
- IBM updated its Open Platform for Apache Hadoop, a free download based on 100 percent Apache Hadoop technologies, on Intel and Power systems. Besides core Hadoop itself, the open source projects updated in the platform include HBase, Hive, Oozie, Sqoop, Spark and many more. The increasingly popular Spark project, the latest darling of the Hadoop Big Data ecosystem, is now available in version 1.4.1 on the IBM platform.
"Spark 1.4 is the first release to package SparkR, an R binding for Spark based on Spark's new DataFrame API," IBM said in a blog post Tuesday. "SparkR gives R users access to Spark's scale-out parallel runtime along with all of Spark's input and output formats. It also supports calling directly into Spark SQL."
The company also updated its BigInsights product, featuring "value-added capabilities" for its open source analytics offerings. With the updated version, "you will see new algorithms, including Decision Trees, Random Forests and Stepwise Compression," IBM said. "These algorithms enable R users to use existing R functions on a Hadoop cluster. Big R has expanded its library of machine [learning] algorithms to provide a richer set of classification, regression, factorization, feature extraction and survival analysis capabilities." The BigInsights 4.1 update targets Intel-based platforms.
- Speaking of Intel, the giant chipmaker on Tuesday launched Parallel Studio XE 2016, its toolkit for high-performance computing (HPC) and Big Data analytics. Intel said the studio comprises a "suite of compilers, libraries, debugging facilities and analysis tools" on Intel platforms designed to help "software developers design, build, verify and tune code in Fortran, C++, C and Java."
The new tooling includes the Data Analytics Acceleration Library (DAAL), designed to speed up Big Data processing on Hadoop, Spark, R and MATLAB. It also includes a vectorization advisor, designed to help developers squeeze the best performance out of modern processors through multithreading and vectorization, the latter of which uses single instruction, multiple data (SIMD) instructions. Intel exec James Reinders detailed all the updates in a Tuesday blog post.
Intel made further Big Data news this week by investing in and partnering with BlueData, provider of the EPIC software platform, which leverages virtualization technologies. "As part of this new collaboration, our product team at BlueData will be working together closely with Intel in areas including Hadoop and Spark, virtualization and container technology, as well as caching and security/encryption," BlueData announced Tuesday. "We'll be optimizing our software on Intel architectures to provide flexible, elastic, high-performance Big Data deployments on-premises."
- Hortonworks Inc., commonly recognized as one of the top three commercial Hadoop distributors, is making a sort of Big Data investment itself through the acquisition of Onyara Inc. That company created and contributes heavily to the top-level Apache project NiFi, which "supports powerful and scalable directed graphs of data routing, transformation and system mediation logic."
"The acquisition will make it easy for customers to automate and secure data flows and to collect, conduct and curate real-time business insights and actions derived from data in motion," Hortonworks said on Tuesday. The acquisition resulted in a new Hortonworks offering called DataFlow.
DataFlow addresses analytics problems associated with "data in motion" stemming from what Hortonworks calls the Internet of Anything (IoAT). This data comes from sources such as sensors, machines, geo-location devices, social feeds, Web clicks, server logs and so on, Hortonworks said. "While the majority of today's solutions are custom-built, loosely secured, difficult to manage and not integrated, Hortonworks DataFlow powered by Apache NiFi will simplify and accelerate the flow of data in motion into HDP for full fidelity analytics," the company said.
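For readers unfamiliar with the "directed graphs of data routing, transformation and system mediation logic" that NiFi describes, the idea can be sketched in a few lines of Python. This is a toy illustration only, not NiFi's API: the `FlowGraph` class, the processor functions and the relationship names are all hypothetical, and the sketch assumes the graph has no cycles.

```python
from collections import defaultdict


class FlowGraph:
    """Toy directed graph: nodes are processors, edges are named relationships.

    Hypothetical illustration only; this is not how Apache NiFi is implemented.
    """

    def __init__(self):
        self.processors = {}
        self.edges = defaultdict(list)  # (source, relationship) -> [targets]

    def add_processor(self, name, func):
        # func takes a record and returns (relationship, new_record)
        self.processors[name] = func

    def connect(self, source, relationship, target):
        self.edges[(source, relationship)].append(target)

    def run(self, start, record):
        """Push one record through the graph (assumed acyclic).

        Returns (processor_name, record) pairs for every record that
        reached a processor with no matching outgoing edge.
        """
        delivered = []
        pending = [(start, record)]
        while pending:
            name, rec = pending.pop()
            relationship, out = self.processors[name](rec)
            targets = self.edges.get((name, relationship), [])
            if not targets:
                delivered.append((name, out))
            for target in targets:
                pending.append((target, out))
        return delivered


# Hypothetical processors: route on the record's source, transform, then store.
def route_by_source(rec):
    return rec["source"], rec


def to_celsius(rec):
    return "success", {**rec, "temp_c": round((rec["temp_f"] - 32) * 5 / 9, 1)}


def tag_log(rec):
    return "success", {**rec, "kind": "log"}


def store(rec):
    return "done", rec


graph = FlowGraph()
graph.add_processor("route", route_by_source)
graph.add_processor("convert", to_celsius)
graph.add_processor("tag", tag_log)
graph.add_processor("store", store)
graph.connect("route", "sensor", "convert")
graph.connect("route", "log", "tag")
graph.connect("convert", "success", "store")
graph.connect("tag", "success", "store")

result = graph.run("route", {"source": "sensor", "temp_f": 212})
```

Here a sensor reading is routed to the conversion step and then to storage, while a server-log record would take the other branch of the same graph.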
- Application infrastructure specialist Concurrent Inc. announced Driven 1.3, an updated offering designed to help monitor and manage Hadoop applications. "Driven offers enterprise users -- developers, operations and lines of business -- unprecedented visibility into applications written in Cascading, Scalding, Cascalog, Apache Hive and MapReduce," the company said in a statement Tuesday. "It provides deep operational insights, search, segmentation and visualizations for rapid troubleshooting and performance management."
Driven provides a scalable metadata repository that helps enterprises analyze relevant app metrics such as service-level agreements, key performance indicators and data lineage, the company said. Concurrent said the offering now includes a plug-in agent to work with Apache Hive and MapReduce jobs and tasks, along with improved collaboration and sharing capabilities.
- Impetus Technologies announced free versions of its StreamAnalytix platform, comprising open source components such as Apache Storm, Kafka and Hadoop. With out-of-the-box interfaces for Apache Cassandra, Apache Solr and Elasticsearch also available, StreamAnalytix embeds a complex event-processing engine for real-time analytics of streaming data, the company said.
StreamAnalytix, Impetus said, provides for rapid application development via a visual interface that lets coders leverage drag-and-drop operators, visually draw connections, configure messages and alerts, and view performance metrics, with the ability to save them for later analysis.
The free offering "is designed to continuously ingest massive volumes of data," Impetus says on its site. "The high-performance stream processing engine continuously queries, filters, correlates, integrates, enriches and analyzes data to discover exceptions, patterns and trends that are presented through live dashboards."
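The filter, enrich and analyze stages Impetus describes can be pictured as a chain of steps that each consume and emit a stream of events. The following is a minimal sketch using plain Python generators, not StreamAnalytix or Storm code: the event fields, the `locations` lookup table and the "more than twice the recent average" rule for spotting exceptions are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean


def filter_valid(events):
    """Drop events that lack a numeric reading."""
    for event in events:
        if isinstance(event.get("value"), (int, float)):
            yield event


def enrich(events, locations):
    """Attach a location looked up from a static reference table."""
    for event in events:
        yield {**event, "location": locations.get(event["sensor"], "unknown")}


def flag_exceptions(events, window=3, threshold=2.0):
    """Yield events whose value exceeds threshold times the mean of the
    previous `window` values (an assumed, deliberately simple rule)."""
    recent = deque(maxlen=window)
    for event in events:
        if len(recent) == window and event["value"] > threshold * mean(recent):
            yield event
        recent.append(event["value"])


def run_pipeline(source, locations):
    """Chain the stages; each event flows through as soon as it arrives."""
    return flag_exceptions(enrich(filter_valid(source), locations))


# Hypothetical sample stream; a real deployment would read from a message queue.
sample = [
    {"sensor": "s1", "value": 10},
    {"sensor": "s1", "value": 11},
    {"sensor": "s1", "value": None},  # dropped by filter_valid
    {"sensor": "s2", "value": 9},
    {"sensor": "s1", "value": 40},    # well above the recent average
    {"sensor": "s2", "value": 12},
]
alerts = list(run_pipeline(iter(sample), {"s1": "plant-a"}))
```

Because each stage is a generator, events are processed one at a time rather than in a batch, which is the essential difference between stream processing and a traditional batch job.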