Report: IoT Improving Code Quality in Open Source Java Projects

Mountain View, Calif.-based software testing company Coverity has just released a new Scan report, this one focused on open-source big data projects and the impact of the Internet of Things (IoT) on the quality of those projects. In a nutshell, the report concludes that IoT, and the tsunami of data it is expected to generate over the next decade, is actually having a positive effect on code quality. Among the largest big data projects in this Scan (Apache Hadoop, HBase, and Cassandra), quality has improved steadily, the report's authors found.

When Coverity published its 2013 report, the average defect density (the number of defects per 1,000 lines of code) across all the projects was 1.99. This year that average dropped to 1.83. That change might not be headline-grabbing, but it shows steady improvement, said Zach Samocha, who directs the Scan service. Big data is fast becoming an integral part of both the enterprise and IoT devices, and those connections have bumped up the focus on software quality and security.

"When you think about big data and IoT, those things go hand-in-hand," Samocha told me. "And these open source projects are at the heart of big data. We were very happy to see that the open source community is stepping up."

The report analyzed software defects in 16 Java-based projects using the company's open source scanning service. The title of the report is "Big Data Spotlight," but Hadoop shines the brightest here. Hadoop eliminated key defects, improving its defect density from 1.71 in 2013 to 1.67 this year. The Apache HBase project, which grew by almost 200,000 lines of code since the last report, lowered its defect density from 2.33 to 2.22. Apache Cassandra reduced its density from 1.95 to 1.61.
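Defect density, as the report uses it, is simply defects divided by thousands of lines of code. The Java sketch that follows shows that ratio and the relative change between two years, using the figures quoted above. The class and method names are hypothetical, and the article does not describe Coverity's exact counting rules (which defects and which lines are included), so treat this as an illustration of the arithmetic only.

```java
// Hypothetical sketch: defect density as defects per 1,000 lines of code,
// plus the year-over-year change. Not Coverity's implementation.
public final class DefectDensity {

    private DefectDensity() {}

    /** Defects per 1,000 lines of code (KLOC). */
    public static double density(long defects, long linesOfCode) {
        if (linesOfCode <= 0) {
            throw new IllegalArgumentException("linesOfCode must be positive");
        }
        return defects * 1000.0 / linesOfCode;
    }

    /** Relative change from before to after, in percent (negative = fewer defects per KLOC). */
    public static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Density values for 2013 and this year, as reported in the article.
        String[] projects = {"All projects (average)", "Apache Hadoop", "Apache HBase", "Apache Cassandra"};
        double[] before = {1.99, 1.71, 2.33, 1.95};
        double[] after = {1.83, 1.67, 2.22, 1.61};
        for (int i = 0; i < projects.length; i++) {
            System.out.printf("%-24s %.2f -> %.2f (%+.1f%%)%n",
                    projects[i], before[i], after[i], percentChange(before[i], after[i]));
        }
    }
}
```

For example, a 100,000-line project with 183 reported defects has a density of 1.83. By this arithmetic, the all-project average fell about 8 percent and Cassandra's density about 17 percent.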

"We've seen a large increase in the number of concurrent data access violations, null pointer dereferences, and resource leaks that were eliminated," Samocha said. "Apache Hadoop is the foundation for all big data, and it was nice to see that it is doing so well. We chose to emphasize it, because we hope that other projects are going to follow its example."

Those three issues (null pointer dereferences, resource leaks, and concurrent data access violations) were the most commonly fixed among the projects in the report, and defect density improved overall. In fact, the projects fixed almost 50 percent of the resource leaks, "which is consistent with the level we see in C/C++ projects," the authors wrote.
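The report names these three defect classes but the article shows none of the projects' code. As a generic illustration only, the hypothetical Java snippets that follow pair each class with a common fix: try-with-resources for a resource leak, an explicit null check after a `Map.get` lookup, and `AtomicInteger` for an unsynchronized counter. None of this is taken from Hadoop, HBase, or Cassandra, and a static analyzer may flag other patterns in each category.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical examples of the three defect classes named in the report.
public final class DefectExamples {

    private DefectExamples() {}

    // Resource leak: if readLine() throws, close() is never reached.
    public static String firstLineLeaky(Path path) throws IOException {
        BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        String line = reader.readLine();
        reader.close();
        return line;
    }

    // Fix: try-with-resources closes the reader on every exit path.
    public static String firstLine(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    // Null pointer dereference: Map.get returns null when "port" is missing.
    public static int portUnchecked(Map<String, String> config) {
        return Integer.parseInt(config.get("port").trim());
    }

    // Fix: check for null and fall back to a default value.
    public static int port(Map<String, String> config, int defaultPort) {
        String value = config.get("port");
        return value == null ? defaultPort : Integer.parseInt(value.trim());
    }

    // Concurrent data access violation: hits++ is an unsynchronized
    // read-modify-write, so concurrent callers can lose updates.
    public static final class RacyCounter {
        private int hits;
        public void record() { hits++; }
        public int hits() { return hits; }
    }

    // Fix: AtomicInteger performs the increment atomically.
    public static final class Counter {
        private final AtomicInteger hits = new AtomicInteger();
        public void record() { hits.incrementAndGet(); }
        public int hits() { return hits.get(); }
    }
}
```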

By contrast, only 13 percent of the Java resource leaks found in the 2013 Scan report had been addressed. "It could be that because open source big data projects serve as fundamental technology to so many organizations, the projects are willing to invest in addressing these types of issues versus the dodgy code and performance defects we saw in our 2013 report," the authors wrote.

This year's Scan report used, apparently for the first time, the Coverity Security Advisor, a component of the company's namesake development testing platform. It extended the scan to cover the Open Web Application Security Project (OWASP) Top 10 and Common Weakness Enumeration (CWE) security vulnerabilities in Java apps. OWASP, an open community project, periodically publishes its Top 10 list of the most critical web application security risks. The CWE is a community project sponsored by the Mitre Corporation that catalogs software security weaknesses. With this component, the Coverity Scan found 95 OWASP Top 10 issues in the big data projects.

"Given the sensitivity of these types of issues, we will not disclose the names of the projects in which we found the defects," Samocha said. "But I will say that it's imperative that open source projects closely examine all OWASP Top 10 issues to avoid possible exploitation. Projects should also closely analyze their projects to assess which components contain security sensitive components, including authentication and authorization functions as well as cryptographic code. These components should be assessed by someone with security expertise to assess potential attack vectors."

The Coverity Scan service has an interesting history: It began as a public/private sector research project focused on finding defects in open source code back in 2006. The U.S. Department of Homeland Security launched that project, but the company now manages it. Coverity continues to provide development testing tech as a free service to open sourcers. Currently, nearly 4,000 projects participate in the Coverity Scan service, the company says. Those participants scan or test code in several languages, including Java, C#, C, and C++. More than 100,000 defects identified by the Coverity Scan service have been fixed since the inception of the program, the company says.

The full Coverity Scan Report: Spotlight on Big Data can be downloaded from the company's Web site.

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.