Study Links Flawed Online Tutorials with Vulnerable Open Source Software
German researchers have published a paper finding that developers do indeed copy and paste code directly into their open source software, which can lead to the introduction of security vulnerabilities if that code comes from flawed online tutorials.
While the study was limited in scope -- it examined only PHP code for Web projects, as a proof of concept -- it could point to a widespread problem in the open source community and perhaps in commercial software as well.
The concept being proven in the paper -- "Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability Discovery" -- is that tools can be developed to find flawed code patterns in online tutorials and use those patterns to discover large-scale security vulnerabilities in real-world code.
The researchers have posted the code for two such tools on GitHub, and in the course of their research they found concrete evidence of a link between buggy tutorials and real-world projects.
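To make the idea concrete, here is a minimal Python sketch of matching a tutorial snippet against project code. It is not the researchers' implementation and does not reflect how their tools work internally. It assumes a deliberately naive approach: strip comments, replace string literals and variable names with placeholders, then measure how many of the tutorial's four-token sequences reappear in the project file. The names `normalize`, `shingles` and `containment`, along with the three PHP samples, are hypothetical and exist only for illustration.

```python
import re

# Superglobals are kept verbatim because they mark user-controlled input.
SUPERGLOBALS = {"$_GET", "$_POST", "$_REQUEST", "$_COOKIE"}

# Naive patterns: they do not handle comment markers inside strings, heredocs, etc.
_COMMENT = re.compile(r"/\*.*?\*/|(?://|#)[^\n]*", re.DOTALL)
_STRING = re.compile(r""" "(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*' """, re.VERBOSE)
_VARIABLE = re.compile(r"\$[A-Za-z_]\w*")
_TOKEN = re.compile(r"\$?\w+|[^\s\w]")


def normalize(php_source: str) -> list[str]:
    """Reduce PHP source to coarse tokens so renamed variables still match."""
    text = _COMMENT.sub(" ", php_source)
    text = _STRING.sub(" STR ", text)
    text = _VARIABLE.sub(
        lambda m: m.group(0) if m.group(0) in SUPERGLOBALS else " VAR ", text
    )
    return _TOKEN.findall(text)


def shingles(tokens: list[str], n: int = 4) -> set[tuple[str, ...]]:
    """Return the set of all runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(tutorial: str, project: str, n: int = 4) -> float:
    """Fraction of the tutorial's shingles that also occur in the project code."""
    wanted = shingles(normalize(tutorial), n)
    if not wanted:
        return 0.0
    found = shingles(normalize(project), n)
    return len(wanted & found) / len(wanted)


# Hypothetical tutorial snippet that concatenates user input into a query.
TUTORIAL = """
$id = $_GET['id'];
$result = mysql_query("SELECT * FROM users WHERE id = " . $id);
"""

# Hypothetical project file that copied the pattern and renamed the variables.
COPIED = """<?php
// load a user
$userId = $_GET['id'];
$rows = mysql_query("SELECT * FROM accounts WHERE id = " . $userId);
while ($r = mysql_fetch_assoc($rows)) { echo $r['name']; }
"""

# Hypothetical project file that uses a prepared statement instead.
PREPARED = """<?php
$stmt = $pdo->prepare("SELECT * FROM users WHERE id = ?");
$stmt->execute([$_GET['id']]);
"""

if __name__ == "__main__":
    print("copied:  ", containment(TUTORIAL, COPIED))
    print("prepared:", containment(TUTORIAL, PREPARED))
```

In this sketch, every token sequence from the tutorial reappears in the copied file even though the variable and table names differ, while the prepared-statement version shares almost none of them. A real detector would need a proper PHP parser and some notion of data flow. Token overlap alone only flags code that looks similar, and it cannot confirm that the code is exploitable.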
"Thanks to our framework, we have uncovered over 100 vulnerabilities in Web application code that bear a strong resemblance to vulnerable code patterns found in popular tutorials," the paper said. "More alarmingly, we have confirmed that eight instances of a SQLi vulnerability present in different Web applications are an outcome of code copied from a single vulnerable tutorial. Our results indicate that there is a substantial, if not causal, link between insecure tutorials and Web application vulnerabilities."
Along the way, they confirmed that developers of real-world software do copy code directly from other sources and paste it into projects that are then used by others. "Our results give credence to the widely known anecdote that programmers copy and paste code from vulnerable tutorials. Our case study, involving 64,415 PHP projects hosted on GitHub, indicates that such ad hoc code re-use may endanger the security of software throughout the open source landscape," they said.
The researchers acknowledged that because they investigated only PHP code, they can't show that coders using other languages copy and paste in the same way. Nor do their findings necessarily transfer to commercial development.
"We restrict our analysis to open source code, and thus, the possibility exists that the practice of copying from tutorials is particularly prevalent in the open source world and less common in closed source environments," they said. "Exploring these questions is left for future work."
The tools developed by the researchers -- GithubSpider and CADetector -- are on GitHub, though the repositories contain no descriptions, related Web sites or topics.
David Ramel is an editor and writer for Converge360.