In-Depth

Data on Demand

The Big Idea


WORLD WIDE DATABASE

  • “If HTML and the Web made all the online documents look like one huge book, [the Semantic Web] will make all the data in the world look like one huge database,” says Web visionary Tim Berners-Lee.
  • The Semantic Web uses models to automate data mapping and reduce the manual labor associated with configuring data.
  • Although the number of commercial and open-source tools designed to ease implementation is growing, only experts have the skill to use them fully.

When Vodafone Group launched its Vodafone Live portal in 2002, mobile phone users tapped into a brave new world of products and services. Their mobile handsets displayed the latest news and sports headlines, downloaded custom ring tones and video games, and became a tool for purchasing MP3s, à la Apple's iTunes electronic store. Eventually, more than 35 million people across Europe, Asia, Australia and New Zealand gravitated to the site.

Unfortunately, the portal threatened to be an embarrassment of riches. The range of choices sometimes overwhelmed the navigational prowess of tiny handset screens. Vodafone feared tedious scrolling through menus to find the right music, games or ring tones would become a turnoff. "The more people have to click, especially on a mobile handset, the more drop-off [in business] you'll see," says Daniel Appelquist, senior technology strategist at Vodafone's European headquarters.

So, in 2004, the company beefed up search services by incorporating elements of still-arcane Semantic Web technology. Vodafone reasoned that if Semantic Web methods brought more precision to customer searches, consumers would use Vodafone Live more often and in turn spend more money on downloaded content.

Eight keys to Semantic Web success

Semantic Web pioneers and tools vendors list what they say are emerging best practices essential for deriving value from this young technology.

  1. Be brave. Daniel Appelquist, Vodafone’s senior technology strategist at its European headquarters, warns that the glut of information about the Semantic Web and its Resource Description Framework (RDF) and Web Ontology Language (OWL) standards can make them sound more complicated than they actually are.

    “The Semantic Web can be used to solve some simple problems without having to earn a PhD in formal logic theory,” he says. “At its core, the Semantic Web isn’t that difficult to understand.”

  2. Take advantage of pre-developed schemas and vocabularies. “Look for applications people are building on top of the Semantic Web and using in the real world,” Appelquist suggests. Examples include PRISM, from the publishing industry, and Dublin Core.

  3. Adjust your thinking. The biggest challenge in using the Semantic Web as a data integration tool in enterprises is getting people to think in more abstract ways, says Colin Britton, CTO for tool vendor Metatomix. “Many people are used to thinking in a concrete way, using Java object models and traditional XML integration methods,” he says.

    His advice: seek people with backgrounds in AI or data analytics. “They completely get it straight away,” Britton says. “There are a lot of practitioners out there doing other things who have strong backgrounds in [the Semantic Web way] of thinking.”

  4. Commit for the long haul. “The first thing you have to accept is that managing metadata will be a never-ending task,” says Patrick Stickler, senior architect for Forum Nokia. “This is not a one-time, one-off project.” Enterprises need to establish an infrastructure that supports the constant management of the metadata. “Metadata is your most important asset,” he says. “All of your value and benefits are facilitated by how up-to-date, how rich, and how precise your metadata is. So you have to be prepared to have the infrastructure—both tools and individuals—in place to keep that metadata fresh and to continually evolve your metadata to be in sync with your business needs.”

  5. Remember that precision is power. “When you are defining your ontologies and your metadata, be as precise as possible about data types,” Stickler warns. “You need to be explicit about what a particular value needs to be. If it’s a date, say that it is a date in your ontology. One of the benefits that RDF and OWL provide over and above relational models is that you can be much more precise in your data typing.”

  6. Don’t be constrained by relational data thinking. A common strategy in relational database management is to try to anticipate every issue you may encounter before the schema is built. “That’s because changing a relational schema is a data migration project involving major retooling,” Stickler says. “The Semantic Web doesn’t force you to do that. It is a very flexible, evolutionary methodology for defining metadata and managing your knowledge. That means you can start small by describing only those types of resources that you need to deal with today. As you need to describe new kinds of resources, you can extend your ontologies with zero impact to your current knowledge base and tools. That is a major savings over traditional relational-based solutions for dealing with knowledge and metadata management.”

  7. Pick your battles. “Not every problem has something to do with Semantic Web technology,” points out David Kershaw, manager of professional and educational services at tool vendor Altova. “Creating your own ontology to mark up Web pages? I say that’s costly. But marking up a Web service that’s very important to the enterprise, that’s a high-value activity that’s worth exploring.”

  8. Budget for training. “Companies need to be aware of the investment that’s going to have to take place for training employees,” says Jeff Pollock, VP of technology at tool vendor Cerebra, and author of Adaptive Information: Improving Business Through Semantic Interoperability, Grid Computing, and Enterprise Integration. The costs aren’t exorbitant, he adds, “but like any technology solution, there is no magic. People have to be trained.”
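Stickler's advice on data typing (point 5) can be illustrated in a few lines of Python. This is a minimal sketch, not any vendor's tool: the property names are hypothetical, and the RANGES table stands in for the rdfs:range declarations an RDF or OWL ontology would carry. A value that does not parse as its declared XML Schema datatype is rejected before it reaches the knowledge base.

```python
from datetime import date

# Hypothetical ontology fragment: the XML Schema datatype each property's
# values must have (what an rdfs:range declaration would state in RDF).
RANGES = {
    "ex:releaseDate": "xsd:date",
    "ex:downloadCount": "xsd:integer",
    "dc:title": "xsd:string",
}

# Parsers for the three datatypes used above.
PARSERS = {
    "xsd:date": date.fromisoformat,  # expects YYYY-MM-DD
    "xsd:integer": int,
    "xsd:string": str,
}

def check_literal(prop, lexical):
    """Return the typed value, or raise ValueError if `lexical` does not
    match the datatype the ontology declares for `prop`."""
    datatype = RANGES[prop]
    try:
        return PARSERS[datatype](lexical)
    except ValueError as err:
        raise ValueError(f"{prop}: {lexical!r} is not a valid {datatype}") from err
```

For example, check_literal("ex:releaseDate", "2004-06-01") returns date(2004, 6, 1), while check_literal("ex:releaseDate", "next Tuesday") raises ValueError instead of silently storing free text where a date belongs.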

The company quickly realized it was on the right track. Two months after introducing leading-edge search capabilities, Vodafone saw a 50-percent drop in the number of page views site visitors logged before downloading content. "They were able to find the song they wanted much more easily," Appelquist says. Indeed, fewer page views did translate into higher revenue, which rose 20 percent, he adds.

Partial deployments
Vodafone's silver bullet is a data modeling standard called Resource Description Framework (RDF) from the World Wide Web Consortium, the leading Web standards body. RDF is one tool in the growing Semantic Web arsenal that uses standards-based metadata models to ease data integration among various sources such as relational databases, unstructured text and Web content.
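RDF's data model is simple: every fact is a statement made of a subject, a predicate and an object, called a triple. The sketch below stores a few such statements about a hypothetical Vodafone Live track and matches them against patterns. It uses no RDF library; the ex: identifiers are invented for illustration, while dc:title and dc:creator borrow Dublin Core element names.

```python
# Each statement is a (subject, predicate, object) triple. The ex: names are
# hypothetical; dc:title and dc:creator are Dublin Core element names.
TRIPLES = {
    ("ex:track42", "rdf:type", "ex:MusicTrack"),
    ("ex:track42", "dc:title", "Example Song"),
    ("ex:track42", "dc:creator", "ex:artist7"),
    ("ex:artist7", "ex:name", "Example Artist"),
    ("ex:game9", "rdf:type", "ex:VideoGame"),
}

def match(s=None, p=None, o=None, triples=TRIPLES):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Follow a link: find the track's creator, then that creator's name.
creator = match(s="ex:track42", p="dc:creator")[0][2]
artist_name = match(s=creator, p="ex:name")[0][2]
```

Because data from any source reduces to the same three-part shape, merging two sources is a set union, which is a large part of what makes RDF attractive for integration.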

"If HTML and the Web made all the online documents look like one huge book, [the Semantic Web] will make all the data in the world look like one huge database," Web visionary Tim Berners-Lee has said. Berners-Lee is credited with conceptualizing the Semantic Web. So far, that dream has been slow to develop, even among pioneering enterprises, such as Vodafone, that have found success by weaving pieces of the Semantic Web into tightly focused apps. "We've taken bits of [Semantic Web technology] that we think are applicable and use them to solve some specific problems," Appelquist says.

Nokia, another European-based mobile phone giant, is similarly cautious. "We've taken an evolutionary approach to deploying Semantic Web technologies," says Patrick Stickler, senior architect for Forum Nokia, a portal for developers. "We've looked at what parts of our service offering can best benefit from the Semantic Web, rather than using it to completely re-tool our entire infrastructure."

Nevertheless, the small steps being taken by these two companies are helping to establish implementation best practices that may address the Semantic Web's bigger challenge: widespread enterprise adoption. Vodafone and Nokia agree that the Semantic Web isn't vaporware, but using it successfully requires some heavy lifting, they say.

Integration with a network twist
Semantic Web standards offer a long list of benefits, at least on paper. They promise an end to complicated, hard-wired mappings among data sources that can become inflexible integration nightmares. The Semantic Web uses models to automate data mapping and to reduce the manual labor associated with configuring data. To do this, the semantic standards use metadata and inference rules that establish relationships among various data elements. In addition to RDF, the Web Ontology Language (OWL), also a product of the W3C, is a key Semantic Web standard.

"From an enterprise perspective, a lot of what we are doing looks a lot like enterprise data integration, but with a network twist," says Eric Miller, Semantic Web activity lead at the W3C. "This is about putting in place a set of data integration standards woven into the very fabric of the Web and allowing them to scale at a variety of levels, ranging from the files and data I create on my desktop PC to departments, the enterprise and partners, all the way up to the Web."
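The idea of moving mappings out of hard-wired code and into a model can be sketched as follows. Two hypothetical sources, a CRM system and a billing system, name the same facts differently; a small mapping table, playing the role that owl:equivalentProperty declarations would play in OWL, rewrites both into a shared vocabulary. All names are invented, the sketch assumes both sources already use the same customer identifier, and a real OWL reasoner does far more than this one substitution.

```python
# Hypothetical mapping model: each source-specific property is declared
# equivalent to a term in a shared vocabulary, the role owl:equivalentProperty
# plays in OWL. Adding a source means adding rows here, not new code.
EQUIVALENT = {
    "crm:custName": "ex:name",
    "crm:phoneNo": "ex:phone",
    "billing:accountHolder": "ex:name",
    "billing:msisdn": "ex:phone",
}

def to_shared(triples):
    """Rewrite predicates into the shared vocabulary; unmapped ones pass through."""
    return {(s, EQUIVALENT.get(p, p), o) for s, p, o in triples}

# Two sources describing the same customer (same identifier assumed).
crm = {
    ("ex:cust1", "crm:custName", "Ann Smith"),
    ("ex:cust1", "crm:phoneNo", "+44 7700 900123"),
}
billing = {
    ("ex:cust1", "billing:msisdn", "+44 7700 900123"),
    ("ex:cust1", "billing:plan", "ex:prepaid"),
}

merged = to_shared(crm) | to_shared(billing)
```

After mapping, the two phone-number statements collapse into one, so merged holds three statements about ex:cust1: a name, a phone number and a billing plan.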

As promising as that seems, enterprises haven't jumped on the Semantic Web bandwagon yet. "There's no doubt that the semantic integration stuff is solving a real problem; the question is, at what point are companies compelled to solve that problem?" wonders Ronald Schmelzer, senior analyst with technology researcher ZapThink. "I'm not getting the sense it's being used much at all in enterprises."

As always, timing matters
The Semantic Web is not likely to be at the top of to-do lists at enterprises that are focused on launching service-oriented architectures, which many observers consider a foundation for the Semantic Web. Schmelzer expects it will take another 3 to 5 years for SOA to become mainstream, and about 5 years after that before semantic technology is ubiquitous.

Semantic Web elements such as modeling data and creating ontologies require special skills and a degree of technical sophistication to do correctly. Although a growing number of commercial and open-source tools designed to ease implementation are available, they remain "very expert centric," Stickler says. "We won't see a critical mass [for the Semantic Web] until the tools are usable by common people who don't need to understand what's under the hood."

Adds Vodafone's Appelquist: "A lot of the tools have come out of the academic community. Some are appropriate for the commercial sector, others are more research oriented. We faced a learning curve trying to understand which were right for us." There's an extensive list of tools at www.daml.org/tools.

Nevertheless, Semantic Web technology isn't entirely pie in the sky. Some organizations are launching pioneering efforts with tightly controlled rollouts that, while not transforming the enterprise, are delivering concrete benefits.

Automated Web content creation
Forum Nokia, a showcase for development tools, SDKs, technical information and support documentation, currently counts more than 2 million registered users. Devery Johnson, a Nokia solution manager, calls Semantic Web technology "the backbone of the forum's infrastructure."

Although Johnson declines to cite specifics, he says Semantic Web technology has made possible process realignments that resulted in significant cost savings. "Those savings came about from our ability to distribute the publication process and reduce bottlenecks that had occurred around editors," he adds.

Nokia began using Semantic Web technology 2 years ago to define metadata for the forum's developer resources. Because Nokia can quickly create or refashion views into the central metadata repository, it can automatically generate Web content to match demand and provide a search mechanism that is more precise than standard keyword queries.

Scheming to classify data
Semantic Web integration required more than just flipping a switch, however. First, developers immersed themselves in the world of ontology, a core building block of the Semantic Web. An ontology is a schema that organizes a hierarchy of rules that allow for inferences about how certain facts relate to each other.

The set of rules, or taxonomy, classifies data in the same way a scientific taxonomy classifies living organisms into a kingdom, phylum, class and order. In Nokia's ontology, for example, any document that refers to JSR 135, the Mobile Media API for mobile Java, is automatically pegged as something in the Java world. "So the person describing the document doesn't actually have to say, 'Oh, this is also about Java and it's also a document intended for a technical audience,'" Nokia's Stickler explains.

Developers only need to specify that the subject is JSR 135 for all of the subsequent inferences to be made. "We codify our ontology in a format that can be loaded in, read, processed and understood by Semantic Web tools," Stickler adds. "Our primary Semantic Web server is where we publish and store all of these schemas: the actual files that express or encode our ontologies."
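The JSR 135 example amounts to forward chaining: apply the ontology's rules to the stated facts until no new facts appear. The sketch below assumes a hypothetical topic hierarchy and a hypothetical audience rule; it illustrates the mechanism Stickler describes, not Nokia's actual ontology.

```python
# Hypothetical topic hierarchy: each topic points to its broader topic.
BROADER = {
    "topic:JSR135": "topic:JavaME",  # JSR 135 is the Mobile Media API
    "topic:JavaME": "topic:Java",
}
# Hypothetical rule: documents about these topics target a technical audience.
TECHNICAL_TOPICS = {"topic:Java"}

def infer(doc, triples):
    """Forward-chain over the rules until no new facts appear for `doc`."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if s != doc or p != "ex:subject":
                continue
            if o in BROADER:                 # about X implies about broader(X)
                new.add((s, "ex:subject", BROADER[o]))
            if o in TECHNICAL_TOPICS:        # Java documents are technical
                new.add((s, "ex:audience", "ex:Technical"))
        if new <= facts:                     # fixed point reached
            return facts
        facts |= new

# The author states only one fact; the rest is inferred.
stated = {("doc:mmapi-guide", "ex:subject", "topic:JSR135")}
inferred = infer("doc:mmapi-guide", stated)
```

Starting from the single stated subject, the loop adds the Java ME and Java topics and then the technical-audience label, so inferred ends up with four statements.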

So, rather than burdening content creators with writing tedious descriptions about each resource, developers provide minimal information, and the ontology fills in the details. It's a heavy responsibility for those who define the ontology in the first place, a job Stickler calls "the most crucial task" in Semantic Web success. "So much is driven from the metadata, you don't want to do it haphazardly, and once it's created, you don't want to go in and change your ontology unless there's an explicit business requirement to do so."

Stickler says he's also devoted long hours over the last 2 years to retagging metadata associated with each resource on the site. Retagging, the process of describing the document in a way meaningful to Semantic Web methods, involves firing up a metadata editor tool that opens a Web form that displays properties relevant to the document. "This is all driven by the ontology," Stickler says. "As we add new properties or support new kinds of resources, the metadata editor automatically and dynamically updates to reflect the current state of our ontology."
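An ontology-driven editor like the one Stickler describes can be approximated by deriving form fields from the ontology itself. In this sketch, which assumes a hypothetical class hierarchy and property list, a resource's form shows every property whose domain is the resource's class or one of its superclasses.

```python
# Hypothetical ontology fragment: a class hierarchy, plus each property's
# domain (the class it describes) and range (the kind of value it takes).
SUBCLASS_OF = {"ex:CodeExample": "ex:Document", "ex:Document": "ex:Resource"}
PROPERTIES = {
    "dc:title": ("ex:Resource", "xsd:string"),
    "ex:subject": ("ex:Document", "ex:Topic"),
    "ex:sdkVersion": ("ex:CodeExample", "xsd:string"),
}

def superclasses(cls):
    """Yield cls and every class above it in the hierarchy."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)

def form_fields(cls):
    """List the (property, range) pairs the editor should show for a class."""
    classes = set(superclasses(cls))
    return sorted((p, rng) for p, (dom, rng) in PROPERTIES.items() if dom in classes)
```

Adding a property to the ontology changes the form without any change to the editor code: after PROPERTIES["ex:reviewDate"] = ("ex:Document", "xsd:date"), form_fields("ex:Document") also returns a review-date field. This is the kind of incremental extension Stickler contrasts with relational schema migrations.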

To aid this process, Nokia created an open-source Semantic Web toolkit known as Wilbur, which it makes available at http://wilbur-rdf.sourceforge.net. All of this work is paying off for Forum Nokia users, Stickler believes. "Our users see a significant benefit from our metadata-based search," he says. "They are searching not just on textual content but on the rich metadata defined by subject matter experts, so the precision of the search results has seen a dramatic improvement. They find the resources they need quickly and with fewer clicks."
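The difference between keyword and metadata-based search shows up even on a toy collection. In this sketch the documents are hypothetical, and their subject metadata is assumed to be already expanded with broader topics. A keyword search for "java" matches only an off-topic coffee article, while a metadata search on the Java topic finds both Mobile Media API documents, neither of which contains the word.

```python
# Hypothetical resources: free text plus subject metadata assigned by
# subject matter experts (assumed already expanded with broader topics).
DOCS = {
    "doc:1": {"text": "Playing audio clips with the Mobile Media API",
              "subjects": {"topic:JSR135", "topic:JavaME", "topic:Java"}},
    "doc:2": {"text": "Brewing the perfect cup of java at your desk",
              "subjects": {"topic:Lifestyle"}},
    "doc:3": {"text": "Capturing camera snapshots on Series 60 phones",
              "subjects": {"topic:JSR135", "topic:JavaME", "topic:Java"}},
}

def keyword_search(word):
    """Naive full-text match: case-insensitive substring search."""
    return sorted(d for d, r in DOCS.items() if word.lower() in r["text"].lower())

def metadata_search(topic):
    """Match on the subject metadata rather than the text."""
    return sorted(d for d, r in DOCS.items() if topic in r["subjects"])
```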

Persuading content providers to invest
Vodafone uses RDF to integrate its Vodafone Live Web site with the third-party providers that create content for the portal. Providers create content using a custom XML format, developed by Vodafone, that uses RDF descriptions. "It's an RDF schema that we've developed with Semantic Web technology," Appelquist says.

In addition to helping consumers more quickly find the content they're looking for, the descriptions label products for age appropriateness, so, for example, young children do not download teen-rated video games.
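An age-appropriateness check over such labels might look like the sketch below. Vodafone's actual rating vocabulary isn't described here, so the labels, minimum ages and catalog entries are all hypothetical. One design choice worth noting: an item with no rating label is treated as adult-only, so missing metadata fails safe.

```python
# Hypothetical age-rating labels and the minimum age each one requires.
MIN_AGE = {"ex:AllAges": 0, "ex:Teen": 13, "ex:Adult": 18}
UNRATED_MIN_AGE = 18  # unlabeled content is treated as the most restrictive

# Hypothetical catalog entries; one has no rating label at all.
CATALOG = {
    "ex:game9": {"title": "Puzzle Garden", "rating": "ex:AllAges"},
    "ex:game12": {"title": "Street Racer", "rating": "ex:Teen"},
    "ex:ringtone3": {"title": "Untitled Tone"},
}

def may_download(item, user_age):
    """True if the item's rating label permits a user of this age."""
    rating = CATALOG[item].get("rating")
    return user_age >= MIN_AGE.get(rating, UNRATED_MIN_AGE)

def visible_items(user_age):
    """Items this user may see and download, sorted by identifier."""
    return sorted(i for i in CATALOG if may_download(i, user_age))
```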

RDF provides a framework for developing search and labeling vocabularies as well as a means to embed the labels into XML documents. Vodafone hasn't completely replaced its custom XML with RDF, however. "We're not using RDF to transmit formatted text, for instance," Appelquist says. "If you have an article that needs to be marked up in a certain way, it's not appropriate for that."
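Embedding RDF descriptions in an otherwise ordinary XML document can be sketched with Python's standard ElementTree parser. The rdf namespace and the rdf:Description, rdf:about and rdf:resource names are standard RDF/XML; the ex vocabulary and the surrounding content element are hypothetical stand-ins for Vodafone's custom format, which is not published here. The formatted body text stays plain XML, in line with Appelquist's point.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/content#"  # hypothetical provider vocabulary

# Hypothetical provider document: an RDF description plus ordinary XML.
DOC = """<content xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/content#">
  <rdf:RDF>
    <rdf:Description rdf:about="http://example.org/items/game12">
      <ex:title>Street Racer</ex:title>
      <ex:rating rdf:resource="http://example.org/content#Teen"/>
    </rdf:Description>
  </rdf:RDF>
  <body>Formatted article text stays as ordinary XML.</body>
</content>"""

def descriptions(xml_text):
    """Yield (subject URI, {property URI: value}) for each rdf:Description."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        props = {}
        for child in desc:
            # Resource-valued properties carry rdf:resource; literals use text.
            props[child.tag] = child.get(f"{{{RDF}}}resource", child.text)
        yield desc.get(f"{{{RDF}}}about"), props
```

This handles only the simple, flat form of RDF/XML; the syntax allows many abbreviations, so a production system would rely on a full RDF parser.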

A challenge for Vodafone was refashioning its thinking around a metadata-centric view of how content flows through its organization.

"In a traditional application development model, you often approach development from an object-oriented view, using a UML [Unified Modeling Language] model to develop an overall architecture," Appelquist explains. "In a content-centric approach, you look at which parts of your system are using metadata and which parts are generating metadata. This is not very well represented within UML constructs, whereas if you are looking at things from a metadata perspective, you need to take these factors into account from the beginning of your application design."
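Appelquist's producer and consumer view can be turned into a simple design-time check: record which components generate each metadata property and which rely on it, then flag anything consumed but never produced. The components and properties below are hypothetical.

```python
# Hypothetical components, each listing the metadata properties it
# generates and the properties it relies on.
COMPONENTS = {
    "provider_feed": {"produces": {"ex:title", "ex:rating", "ex:genre"}, "consumes": set()},
    "search": {"produces": set(), "consumes": {"ex:title", "ex:genre"}},
    "age_filter": {"produces": set(), "consumes": {"ex:rating"}},
    "recommender": {"produces": {"ex:popularity"}, "consumes": {"ex:genre", "ex:mood"}},
}

def unproduced(components):
    """Map each consumer to the properties that no component produces."""
    produced = set().union(*(c["produces"] for c in components.values()))
    return {
        name: sorted(c["consumes"] - produced)
        for name, c in components.items()
        if c["consumes"] - produced
    }
```

Here the check flags the recommender, which depends on an ex:mood property that nothing in the system generates.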

Vodafone also had to coax its content providers into investing in RDF training to provide the phone company with appropriately labeled content. "There was some initial selling that we had to do to convince them to incur these costs," Appelquist says. The quick financial benefits helped with the arm twisting. "It became much easier because we could point to real increases in usage and increases in revenue that were occurring because of this approach," he says.

ILLUSTRATION BY CANDACE COULAS