Google Drills Into Open Source Code

Google’s data acquisitiveness may be keeping privacy advocates up at night but it’s also yielding an unexpected bounty: telling stats about open source code.

The company acquired these statistics during the natural course of its regular crawling activities, said Chris DiBona, Google’s open source programs manager. Among other things, Google’s Code Search Crawl revealed that nearly half of the projects licensed under the GNU General Public License (GPL) have moved to version 3 of that license, including new projects.

"We all thought the GPL v3 was going to matter but this is amazing,” DiBona told attendees at the recent O’Reilly Open Source Convention (OSCON). "People tend to forget just how much open source there is.”

The Google Code Search crawl found that there are at least 2.5 billion lines of open source code extant in 30 million unique files, DiBona said. "This isn’t to say that this is all the open source code,” he said. "It’s a minimum.”

The crawl also revealed that there are two lines of C code in OSS programs for every line of C++, that there’s more PHP than Perl by 37 million lines of code, and that there are three times as many Smalltalk programs as Objective-C programs. IDC analyst Al Hilwa said the statistics are interesting but not necessarily surprising. "It suggests there’s a lot of legacy code in open source software,” Hilwa said.

DiBona believes that enterprise developers are not taking advantage of all that code. "If you get good at incorporating open source in your projects -- you’re equal to all of you plus all the developers at Google,” he told OSCON attendees. "Because you can have all this code and operate from it."

In an interview following his keynote, DiBona characterized the Google code search results as a side effect of something the company does as an "inherent” part of maintaining the quality of its regular index. DiBona said that about half of the 225,000 projects hosted on Google Code itself are GPL-licensed. About half of those have moved to GPLv3, he said.

"That suggests the age of the licenses of the new projects -- there are a lot more new projects and clearly they are coming with the latest version of the [GPL] license,” Hilwa said.

"This is all about empowering developers,” DiBona said. "You can take baby steps by looking at the permissive licenses like Apache and BSD. Then when you’re more comfortable, you’ll realize the power of being able to pull that much code into your projects, and then look into the LGPL, then the GPL. That’s a measured way of approaching it, and it’s worth the effort, because in the end you get access to hundreds of millions of lines of code that’s there for you to use.”


GWT Outreach Effort

Google has been reaching out to developers in recent years with a growing number of initiatives and tools, many of which were also side effects of Google’s internal activities, DiBona said. The Google Web Toolkit (GWT), for example, has "really resonated with behind-the-firewall, client-server type environments,” he said.

The GWT is an open-source development framework aimed at Web application builders who want to use AJAX without having to learn JavaScript. Developers use the toolkit to write their apps using Java, and then the GWT compiles that code into optimized JavaScript.

Google also expanded its App Engine Web-hosting platform earlier this year from Python-only support to include Java, which Gartner analyst Ray Valdes sees as a definite nod to the enterprise. "It taps into a larger development population and moves closer to enterprise needs,” he said. He added that Google has been using Java internally for a long time: both the Android OS and the Google Web Toolkit are Java-based.

"At the end of the day, we’re also trying to understand open source,” Valdes said.

DiBona was sanguine about Google’s recent announcement that the search giant is developing an operating system, called Chrome OS. "I take a traditionalist view of operating systems,” he said. "I think of an OS as something that runs a computer. From that perspective, if you look at our production network, we’ve been working with operating systems for ten years. [The Android mobile phone OS] was our first operating system to go to consumers. Chrome OS will be our second.” Google’s Code Search results are available from Google Labs.


Doing The Wave

Meanwhile, a few miles north at the Googleplex, the search engine giant hosted a "Federation Day” for 150 developers who showed up at the Mountain View facility to learn how to contribute to the Google Wave Federation Protocol, a new open source project that is developing the underlying network protocol for sharing Waves among Wave providers.

Unveiled in May, Google Wave is a real-time, shared communications platform. Waves are hosted "conversations” that could include text, photos, videos, and maps. All participants have access to the wave communication stream. The event focused on getting developers up to speed on how best to contribute to the Protocol project.

During the event, Google open sourced two hunks of Wave code: the Operational Transform (OT) code, which the company calls "the heart and soul of the collaborative…Wave experience,” and a basic client/server prototype that uses the Wave Protocol. The prototype is a simple "Hello World” designed to encourage experimentation with the Wave Protocol, but Google plans to shepherd that OT code into a production-quality reference implementation. Together, these components comprise nearly 40,000 lines of Java code. Both are available now under the Apache 2.0 license.
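
For readers unfamiliar with the term, operational transformation is a general technique for letting several people edit the same document at once: each edit is adjusted against the edits it raced with, so every copy ends up identical. The toy sketch below illustrates that idea in Python for concurrent text insertions only. It is not Google's Wave code, which is written in Java and is far more complete; the names `Insert`, `apply_op`, and `transform`, and the site-id tie-break, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """A hypothetical edit: insert `text` at index `pos`."""
    pos: int
    text: str
    site: int  # illustrative id of the editing site, used only to break ties


def apply_op(doc: str, op: Insert) -> str:
    """Return a new document with the insertion applied."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Adjust `op` so it can be applied after `against` has already been applied.

    If the other insertion landed earlier in the text (or at the same index
    but from a lower-numbered site), shift this one right by its length.
    """
    lands_first = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if lands_first:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


# Two sites edit the same starting text at the same time.
doc = "Hello World"
a = Insert(5, ",", site=1)    # site 1 adds a comma after "Hello"
b = Insert(11, "!", site=2)   # site 2 adds "!" at the end

# Each site applies its own edit first, then the transformed remote edit.
site1 = apply_op(apply_op(doc, a), transform(b, a))
site2 = apply_op(apply_op(doc, b), transform(a, b))
```

In this sketch both sites arrive at the same text even though they applied the edits in opposite orders. A production system would also have to handle deletions, richer document structure, and server-side ordering.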

Reporting contributed by Jeffrey Schwartz
