News

Google Applies Text-Search to Open-Source Project Hosting

Last week Google launched an open-source project hosting service for professional software developers as part of its Google Code offerings. 

"Google is rooted in open source and we would like to make it easy for others to develop innovative open-source applications," says Greg Stein, the Google engineering manager who is spearheading the hosting site. Stein joined Google after a stint as director of engineering at CollabNet, where he managed the Subversion project. He is also the chairman of the Apache Software Foundation.

The hosting service is free, but developers who participate in the open-source projects as project members or administrators must sign in using their Gmail accounts. Google acknowledges that the development tools offered at launch are limited compared with those of other hosting sites such as SourceForge.net and Tigris.org. The Google hosting service offers project workspaces, basic membership controls, version control via Subversion, issue tracking, and mailing lists via groups.google.com. It does not offer Web hosting and, according to information on the Google Code site, it will not support shell accounts, a build farm, or private or nested projects.

Google is taking a unique approach in several areas: "One difference is the benefit of Subversion implementation backed by Google's Bigtable, a massively scalable, highly available storage technology," says Stein. "Also, we were able to leverage Google's search technology to design a whole new approach at issue tracking." The issue tracking tool uses Google's free-text search technology to search issue metadata, which reduces complexity by requiring fewer fields and letting developers store information as issue labels. "We will also apply search to improve upon searching for relevant projects," he says. The issue tracking technology is proprietary, but other features of the hosting site are open source, such as the Subversion version control tool.
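The general idea of replacing rigid form fields with searchable labels can be illustrated with a minimal sketch. This is not Google's proprietary implementation; the `Issue` class, the `search` function, and the label names are hypothetical, and simple substring matching stands in for a real free-text search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """A hypothetical issue record: a summary plus free-form labels."""
    issue_id: int
    summary: str
    labels: set = field(default_factory=set)


def search(issues, query):
    """Return the issues whose summary or labels contain every query term."""
    terms = query.lower().split()
    results = []
    for issue in issues:
        # Fold the summary and the labels into one searchable string.
        haystack = " ".join([issue.summary, *sorted(issue.labels)]).lower()
        if all(term in haystack for term in terms):
            results.append(issue)
    return results


# Illustrative data only; the label names are assumptions.
issues = [
    Issue(1, "Crash when committing large files", {"Type-Defect", "Priority-High"}),
    Issue(2, "Add RSS feed for commits", {"Type-Enhancement", "Priority-Low"}),
    Issue(3, "Login page typo", {"Type-Defect", "Priority-Low"}),
]

matches = search(issues, "type-defect priority-low")
print([issue.issue_id for issue in matches])  # prints [3]
```

Because labels are ordinary text, a project could introduce a new category without any schema change, which is the kind of reduced complexity the article describes.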

Google's decision to offer a hosting service benefits the open-source movement and it is another signal that open source has become mainstream, says Jay Seirmarco, general manager of SourceForge.net. "Over the past couple of years we see commercial interest in open source climbing and Google has now been added to a huge list. IBM, Oracle, Red Hat, Novell, CA and even Microsoft are on that list.

"In the simplest terms, there is now an additional scalable Subversion repository and a novel issue tracker that's available for open source projects," says Seirmarco. "Changes that are already underway will permit our projects to use these tools. We currently let our project administrators link to an external Subversion repository, or link to source code that is hosted by Google if they so choose." Google has publicly indicated that they are supportive of a "mix-and-match approach," he says.

"We also embrace that approach," says Seirmarco. "Our long term plan is to route project navigation. You'll have the ability to link to external offerings, such as a blog or a wiki."

Seirmarco says his team knew about Google's project hosting plans a few weeks in advance of the July 27 announcement, and the organizations worked together to ensure the integrity of SourceForge.net's project name space. SourceForge.net hosts more than 120,000 projects. "If you are on code.google.com, and you want to put some source code into their Subversion repository, it asks you what you'd like to name that repository and if you give it a name of a project hosted on SourceForge.net, you're not permitted to do that and it actually puts you in touch with the project administrator," he explains. "We are looking to work on other collaboration with Google for the benefit of the open-source community."
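The name check Seirmarco describes can be sketched roughly as follows. How the two sites actually share project names is not public, so the `check_project_name` function, its parameters, and the sample data are all hypothetical.

```python
def check_project_name(requested, taken_locally, reserved_elsewhere):
    """Decide whether a new project may use the requested name.

    taken_locally: set of names already used on this host (assumed).
    reserved_elsewhere: dict mapping names hosted on another site to a
        contact for that project's administrator (assumed).
    Returns an (allowed, message) tuple.
    """
    name = requested.strip().lower()
    if name in taken_locally:
        return False, "name already in use on this host"
    if name in reserved_elsewhere:
        # Refuse the name and point the user to the existing project's admin.
        contact = reserved_elsewhere[name]
        return False, f"name belongs to an existing project; contact {contact}"
    return True, "name available"


# Illustrative data only.
taken = {"demo"}
reserved = {"exampleproj": "the exampleproj administrator"}

print(check_project_name("ExampleProj", taken, reserved))
print(check_project_name("brand-new-tool", taken, reserved))
```

The first call is refused and returns a message pointing to the existing project's administrator; the second is allowed.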

A notable difference in Google's approach is the decision to support only seven open-source licenses, which the company says cover a range of license styles. "We are taking a position against license proliferation," says Stein. "Licensing proliferation increases the complexity of bringing together multiple software components. By reducing the number of licenses, the hope is to make it easier for developers and users to choose the software that meets their needs without needing to overly worry about how those licenses work together."

Google supports English-only projects for now, although the site is open to international developers in countries where Google is allowed to do business. Localization is planned when Google Code is more mature.

The audience for open-source software outside North America is typically larger than the audience within it. About 78 percent of SourceForge.net's visitors are from outside North America, says Seirmarco, who notes the site receives about 24 million unique visitors a month. Transnational corporations often develop their projects on the site to expose their work products to audiences that are difficult to reach. "We are working with IBM right now on an initiative that is focused on their WebSphere application server community edition. IBM is targeting the BRIC countries (Brazil, Russia, India and China) and we are facilitating that," he says.

The site is also a good way to monitor industry trends via download activity, feature requests, and developer participation in projects.

Unlike VA Corp., the parent company of SourceForge.net, Google does not plan to offer an enterprise edition with controlled access. "If you want to deploy it behind a firewall or avail yourself of things we simply don't need in the open-source world, like security or word-based access, then you go after the enterprise product," says Seirmarco. VA Corp. has had success selling the SourceForge enterprise solution to large financial institutions and companies like Federal Express.

About the Author

Kathleen Richards is the editor of RedDevNews.com and executive editor of Visual Studio Magazine.