In-Depth
Data on Demand
- By Alan Joch
- January 1, 2006
The Big Idea
WORLD WIDE DATABASE
- “If HTML and the Web made all the online documents look like one huge book, [the Semantic Web] will make all the data in the world look like one huge database,” says Web visionary Tim Berners-Lee.
- The Semantic Web uses models to automate data mapping and reduce the manual labor associated with configuring data.
- Although the number of commercial and open-source tools designed to ease implementation is growing, only experts have the skill to use them fully.
When Vodafone Group launched
its Vodafone Live portal in 2002,
mobile phone users tapped into
a brave new world of products
and services. Their mobile handsets displayed the
latest news and sports headlines, downloaded custom
ring tones and video games, and became a tool
for purchasing MP3s, a la Apple's iTunes electronic
store. Eventually, more than 35 million people
across Europe, Asia, Australia and New Zealand
gravitated to the site.
Unfortunately, the portal threatened to be an embarrassment
of riches. The range of choices sometimes
overwhelmed the navigational prowess of tiny
handset screens. Vodafone feared tedious scrolling
through menus to find the right music, games or ring
tones would become a turnoff. "The more people
have to click, especially on a mobile handset, the
more drop-off [in business] you'll see," says Daniel
Appelquist, senior technology strategist at Vodafone's
European headquarters.
So, in 2004, the company beefed up search services
by incorporating elements of still-arcane Semantic Web technology. Vodafone reasoned that if Semantic
Web methods brought more precision to customer
searches, consumers would use Vodafone Live
more often and in turn spend more money for downloaded
content.
Eight keys to Semantic Web success
Semantic Web pioneers and tools vendors list what they say are emerging best practices essential for deriving value from this young technology.
Be brave. Daniel Appelquist, Vodafone’s senior technology strategist at its European headquarters, warns that the glut of information about the Semantic Web and its Resource Description Framework (RDF) and Web Ontology Language (OWL) standards can make them sound more complicated than they actually are.
“The Semantic Web can be used to solve some simple problems without having to earn a PhD in formal logic theory,” he says. “At its core, the Semantic Web isn’t that difficult to understand.”
Take advantage of pre-developed schemas and vocabularies. “Look for applications people are building on top of the Semantic Web and using in the real world,” Appelquist suggests. Examples include the publishing industry’s Prism and Dublin Core.
Adjust your thinking. The biggest challenge in using the Semantic Web as a data integration tool in enterprises is getting people to think in more abstract ways, says Colin Britton, CTO for tool vendor Metatomix. “Many people are used to thinking in a concrete way, using Java object models and traditional XML integration methods,” he says.
His advice: seek people with backgrounds in AI or data analytics. “They completely get it straight away,” Britton says. “There are a lot of practitioners out there doing other things who have strong backgrounds in [the Semantic Web way] of thinking.”
Commit for the long haul. “The first thing you have to accept is that managing metadata will be a never-ending task,” says Patrick Stickler, senior architect for Forum Nokia. “This is not a one-time, one-off project.” Enterprises need to establish an infrastructure that supports the constant management of the metadata. “Metadata is your most important asset,” he says. “All of your value and benefits are facilitated by how up-to-date, how rich, and how precise your metadata is. So you have to be prepared to have the infrastructure—both tools and individuals—in place to keep that metadata fresh and to continually evolve your metadata to be in sync with your business needs.”
Remember that precision is power. “When you are defining your ontologies and your metadata, be as precise as possible about data types,” Stickler warns. “You need to be explicit about what a particular value needs to be. If it’s a date, say that it is a date in your ontology. One of the benefits that RDF and OWL provide over and above relational models is that you can be much more precise in your data typing.”
Don’t be constrained by relational data thinking. A common strategy in relational database management is to try to address all the issues you may encounter. “That’s because changing a relational schema is a data migration project involving major retooling,” Stickler says. “The Semantic Web doesn’t force you to do that. It is a very flexible, evolutionary methodology for defining metadata and managing your knowledge. That means you can start small by describing only those types of resources that you need to deal with today. As you need to describe new kinds of resources, you can extend your ontologies with zero impact to your current knowledgebase and tools. That is a major savings over traditional relational-based solutions for dealing with knowledge and metadata management.”
Pick your battles. “Not every problem has something to do with Semantic Web technology,” points out David Kershaw, manager of professional and educational services at tool vendor Altova. “Creating your own ontology to mark up Web pages? I say that’s costly. But marking up a Web service that’s very important to the enterprise, that’s a high-value activity that’s worth exploring.”
Budget for training. “Companies need to be aware of the investment that’s going to have to take place for training employees,” says Jeff Pollock, VP of technology at tool vendor Cerebra, and author of Adaptive Information: Improving Business Through Semantic Interoperability, Grid Computing, and Enterprise Integration. The costs aren’t exorbitant, he adds, “but like any technology solution, there is no magic. People have to be trained.”
The company quickly realized it was on the right
track. Two months after introducing leading-edge
search capabilities, Vodafone saw a 50-percent drop
in the number of page views site visitors logged before
downloading content. "They were able to find the song
they wanted much more easily," Appelquist says. Indeed,
fewer page views did translate into higher revenue,
which rose 20 percent, he adds.
Partial deployments
Vodafone's silver bullet is a data modeling standard
called Resource Description Framework (RDF) from
the World Wide Web Consortium, the leading Web
standards body. RDF is one tool in the growing Semantic
Web arsenal that uses standards-based metadata
models to ease data integration among various
sources such as relational databases, unstructured text
and Web content.
"If HTML and the Web made all the online documents
look like one huge book, [the Semantic Web]
will make all the data in the world look like one huge
database," Web visionary Tim Berners-Lee has said.
Berners-Lee is credited with conceptualizing the Semantic
Web. So far, that dream has been slow to develop,
even among pioneering enterprises, such as Vodafone,
that have found success by weaving pieces of the
Semantic Web into tightly focused apps. "We've taken
bits of [Semantic Web technology] that we think are applicable and use them to solve some
specific problems," Appelquist says.
Nokia, another European-based mobile
phone giant, is similarly cautious. "We've taken an evolutionary approach to
deploying Semantic Web technologies,"
says Patrick Stickler, senior architect for
Forum Nokia, a portal for developers.
"We've looked at what parts of our service
offering can best benefit from the Semantic
Web, rather than using it to completely
re-tool our entire infrastructure."
Nevertheless, the small steps being taken
by these two companies are helping to
establish implementation best practices that may address the Semantic Web's bigger
challenge: widespread enterprise
adoption. Vodafone and Nokia agree the Semantic
Web isn't vaporware, but using it
successfully requires some heavy lifting,
they say.
Integration with a network twist
Semantic Web standards offer a long list
of benefits, at least on paper. They promise
an end to complicated, hard-wired
mappings among data sources that can become
inflexible integration nightmares.
The Semantic Web uses models to automate
data mapping and to reduce the manual
labor associated with configuring data.
To do this, the semantic standards use
metadata and inference rules that establish
relationships among various data elements.
In addition to RDF, the Web Ontology
Language, also a product of the
W3C, is a key Semantic Web standard.
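To make the idea of metadata relationships concrete, here is a minimal Python sketch. It is not drawn from any vendor's implementation. It represents statements as subject-predicate-object triples, the basic shape RDF uses, and merges descriptions from two hypothetical sources without a hand-written field mapping. Every identifier in it (`song:123`, `artist:42`, `ex:format` and the sample values) is invented for illustration.

```python
# Each statement is a (subject, predicate, object) triple, the basic shape of RDF data.
# All identifiers and values below are hypothetical examples.
catalog_db = {
    ("song:123", "dc:title", "Example Song"),
    ("song:123", "dc:creator", "artist:42"),
}
partner_feed = {
    ("artist:42", "foaf:name", "Example Artist"),
    ("song:123", "ex:format", "MP3"),
}

# Merging two sources is a set union: no per-source field mapping is needed
# because both sides describe resources with shared identifiers.
graph = catalog_db | partner_feed


def describe(subject, triples):
    """Return every predicate/object pair known about a subject."""
    return {p: o for s, p, o in triples if s == subject}


song = describe("song:123", graph)
artist = describe(song["dc:creator"], graph)
print(song["dc:title"], "by", artist["foaf:name"])
```

The sketch assumes both sources already agree on identifiers. In practice, reaching that agreement is much of the modeling work the article describes.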
"From an enterprise perspective, a lot
of what we are doing looks a lot like enterprise
data integration, but with a network
twist," says Eric Miller, Semantic
Web activity lead at the W3C. "This is
about putting in place a set of data integration
standards woven into the very
fabric of the Web and allowing them to
scale at a variety of levels, ranging from
the files and data I create on my desktop
PC to departments, the enterprise
and partners, all the way up to Web."
As promising as that seems, enterprises
haven't jumped on the Semantic Web
bandwagon yet. "There's no doubt that the
semantic integration stuff is solving a real
problem, the question is, at what point are
companies compelled to solve that problem?"
wonders Ronald Schmelzer, senior
analyst with technology researcher ZapThink. "I'm not getting the sense it's being
used much at all in enterprises."
As always, timing matters
The Semantic Web is not likely to be at
the top of to-do lists at enterprises that
are focused on launching service-oriented
architectures, which many observers
consider a foundation for the Semantic
Web. Schmelzer expects it will take another
3 to 5 years for SOA to become
mainstream, and about 5 years after that
before semantic technology is ubiquitous.
Semantic Web elements such as modeling
data and creating ontologies require
special skills and a degree of technical sophistication
to do correctly. Although a
growing number of commercial and open-source
tools designed to ease implementation
is available, they remain "very
expert centric," Stickler says. "We won't
see a critical mass [for the Semantic
Web] until the tools are usable by common
people who don't need to understand
what's under the hood."
Adds Vodafone's Appelquist: "A lot of
the tools have come out of the academic
community. Some are appropriate for
the commercial sector, others are more
research oriented. We faced a learning
curve trying to understand which were
right for us." There's an extensive list of
tools at www.daml.org/tools.
Nevertheless, Semantic Web technology
isn't entirely pie in the sky. Some organizations
are launching pioneering efforts
with tightly controlled rollouts
that, while not transforming the enterprise,
are delivering concrete benefits.
Automated Web content creation
Forum Nokia, a showcase for development
tools, SDKs, technical information
and support documentation,
currently counts more than 2 million
registered users. Devery Johnson, a
Nokia solution manager, calls Semantic
Web technology "the backbone of the
forum's infrastructure."
Although Johnson declines to cite
specifics, he says Semantic Web technology
has made possible process realignments
that resulted in significant cost
savings. "Those savings came about from
our ability to distribute the publication
process and reduce bottlenecks that had
occurred around editors," he adds.
Nokia began using Semantic Web
technology 2 years ago to define metadata
for the forum's developer resources.
Because Nokia can quickly create or refashion
views into the central metadata
repository, it can automatically generate
Web content to match demand and provide
a search mechanism that is more
precise than standard keyword queries.
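The difference between keyword queries and metadata queries can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Nokia's system: the resource records, the field names and the two search functions are all hypothetical.

```python
# Hypothetical resource records: free text plus structured metadata.
resources = [
    {"title": "Java audio playback guide",
     "text": "How to play audio with Java on a handset.",
     "metadata": {"subject": "JSR 135", "type": "guide"}},
    {"title": "Coffee break",
     "text": "Our team visited the island of Java.",
     "metadata": {"subject": "travel", "type": "blog"}},
    {"title": "Media API samples",
     "text": "Sample code for the mobile media API.",
     "metadata": {"subject": "JSR 135", "type": "sample"}},
]


def keyword_search(items, word):
    """Match any resource whose text mentions the word (case-insensitive)."""
    return [r["title"] for r in items if word.lower() in r["text"].lower()]


def metadata_search(items, **criteria):
    """Match only resources whose metadata equals every given criterion."""
    return [r["title"] for r in items
            if all(r["metadata"].get(k) == v for k, v in criteria.items())]


keyword_hits = keyword_search(resources, "java")
metadata_hits = metadata_search(resources, subject="JSR 135")
print(keyword_hits)
print(metadata_hits)
```

In this toy data set, the keyword query picks up an unrelated travel post and misses a relevant sample that never mentions the word, while the metadata query returns exactly the two resources tagged with the subject.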
Scheming to classify data
Semantic Web integration required more
than just flipping a switch, however.
First, developers immersed themselves
in the world of ontology, a core building
block of the Semantic Web. An ontology
is a schema that organizes a hierarchy of
rules that allow for inferences about how
certain facts relate to each other.
The set of rules, or taxonomy, classifies
data in the same way a scientific taxonomy
classifies living organisms into a
kingdom, phylum, class and order. In
Nokia's ontology, any document that
refers to the API JSR 135, for example,
pegs that document as something in the
Java world. "So the person describing the
document doesn't actually have to say,
'Oh, this is also about Java and it's also a
document intended for a technical audience,'"
Nokia's Stickler explains.
Developers only need to specify that
the subject is JSR 135 for all of the subsequent
inferences to be made. "We codify
our ontology in a format that can be
loaded in, read, processed and understood
by Semantic Web tools," Stickler
adds. "Our primary Semantic Web server
is where we publish and store all of
these schemas--the actual files that express
or encode our ontologies."
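The kind of inference Stickler describes can be approximated with a short Python sketch. It is a simplification under assumptions: the `BROADER` hierarchy, the `AUDIENCE_FOR` rule and the `infer` function are hypothetical stand-ins, not Nokia's ontology or any Semantic Web tool's API.

```python
# Hypothetical ontology fragment: each topic points to its broader topic.
BROADER = {
    "JSR 135": "Java API",
    "Java API": "Java",
}
# Hypothetical rule: documents about these topics are aimed at this audience.
AUDIENCE_FOR = {"Java": "technical"}


def infer(subject):
    """Expand one asserted subject into all facts the ontology implies."""
    topics = [subject]
    while topics[-1] in BROADER:  # walk up the hierarchy
        topics.append(BROADER[topics[-1]])
    audiences = {AUDIENCE_FOR[t] for t in topics if t in AUDIENCE_FOR}
    return {"topics": topics, "audiences": audiences}


facts = infer("JSR 135")
print(facts)
```

The person describing the document asserts a single subject. The broader topics and the intended audience follow from the rules, so they never have to be typed in by hand.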
So, rather than burdening content creators
with writing tedious descriptions
about each resource, developers provide
minimal information, and the ontology
fills in the details. It's a heavy responsibility
for those who define the ontology
in the first place, a job Stickler calls "the
most crucial task" in Semantic Web success.
"So much is driven from the metadata,
you don't want to do it haphazardly,
and once it's created, you don't want to go
in and change your ontology unless there's
an explicit business requirement to do so."
Stickler says he's also devoted long
hours over the last 2 years to retagging
metadata associated with each resource
on the site. Retagging, the process of describing
the document in a way meaningful
to Semantic Web methods, involves
firing up a metadata editor tool that opens
a Web form that displays properties relevant
to the document. "This is all driven
by the ontology," Stickler says. "As we add
new properties or support new kinds of
resources, the metadata editor automatically
and dynamically updates to reflect
the current state of our ontology."
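One way to picture an editor that is "driven by the ontology" is a form whose fields are computed from the ontology rather than hard-coded. The Python sketch below assumes a hypothetical ontology layout and a hypothetical `form_fields` helper. It does not describe Nokia's actual editor.

```python
# Hypothetical ontology: for each resource class, the properties it supports
# and the data type expected for each value.
ontology = {
    "Document": {"title": "string", "subject": "string", "published": "date"},
}


def form_fields(ontology, resource_class):
    """Build the list of form fields for a resource class from the ontology."""
    return [{"name": name, "type": datatype}
            for name, datatype in sorted(ontology[resource_class].items())]


before = form_fields(ontology, "Document")

# Extending the ontology adds a property; the form code itself is unchanged.
ontology["Document"]["language"] = "string"
after = form_fields(ontology, "Document")

print([f["name"] for f in before])
print([f["name"] for f in after])
```

Because the form is derived from the ontology on each call, adding a property changes what the editor displays without any change to the editor code.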
To aid this process, Nokia created an
open-source Semantic Web toolkit known
as Wilbur, which it makes available at
http://wilbur-rdf.sourceforge.net.
All of this work is paying off for Forum
Nokia users, Stickler believes. "Our users
see a significant benefit from our metadata-based
search," he says. "They are
searching not just on textual content but
on the rich metadata defined by subject
matter experts, so the precision of the
search results has seen a dramatic improvement.
They find the resources they
need quickly and with fewer clicks."
Persuading content providers to invest
Vodafone uses RDF to integrate its
Vodafone Live Web site with the third-party
providers that create content for
the portal. Providers create content using
a custom XML format, developed by
Vodafone, that uses RDF descriptions. "It's an RDF schema that we've developed
with Semantic Web technology,"
Appelquist says.
In addition to helping consumers
more quickly find the content they're
looking for, the descriptions label products
for age appropriateness, so, for example,
young children do not download
teen-rated video games.
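Age-appropriateness labeling amounts to attaching a rating to each item and filtering on it. The Python sketch below shows the idea with a hypothetical rating scale, catalog and `allowed` function. Vodafone's actual vocabulary is not described in this article.

```python
# Hypothetical rating labels, ordered from least to most restrictive.
RATING_ORDER = ["everyone", "teen", "adult"]

# Hypothetical catalog entries, each carrying a rating label as metadata.
catalog = [
    {"title": "Puzzle Blocks", "rating": "everyone"},
    {"title": "Street Racer", "rating": "teen"},
]


def allowed(items, max_rating):
    """Keep items whose rating label does not exceed the user's limit."""
    limit = RATING_ORDER.index(max_rating)
    return [i["title"] for i in items
            if RATING_ORDER.index(i["rating"]) <= limit]


print(allowed(catalog, "everyone"))
print(allowed(catalog, "teen"))
```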
RDF provides a framework for developing
search and labeling vocabularies as
well as a means to embed the labels into
XML documents. Vodafone hasn't completely
replaced its custom XML with
RDF, however. "We're not using RDF to
transmit formatted text, for instance,"
Appelquist says. "If you have an article
that needs to be marked up in a certain
way, it's not appropriate for that."
A challenge for Vodafone was refashioning
its thinking around a metadata-centric
view of how content flows through
its organization.
"In a traditional application development
model, you often approach development
from an object-oriented view,
using a UML [Unified Modeling Language]
model to develop an overall architecture,"
Appelquist explains. "In a
content-centric approach, you look at
which parts of your system are using
metadata and which parts are generating
metadata. This is not very well represented
within UML constructs, whereas
if you are looking at things from a metadata
perspective, you need to take these
factors into account from the beginning
of your application design."
Vodafone also had to coax its content
providers into investing in RDF training
to provide the phone company with appropriately
labeled content. "There was
some initial selling that we had to do to
convince them to incur these costs," Appelquist
says. The quick financial benefits
helped with the arm twisting. "It became
much easier because we could
point to real increases in usage and increases
in revenue that were occurring
because of this approach," he says.
ILLUSTRATION BY CANDACE COULAS