In-Depth
Speech specs
- By John K. Waters
- December 1, 2002
''Historically, speech has been complicated to implement largely because the
standards had not been developed to actually write speech applications,'' said
Sunil Soares, director of product management at IBM's Pervasive Computing
Division. ''Over the past three years, that has begun to change. You can think of
voice today as being where the Web was in 1994, when we had static Web pages and
PCs. We didn't know what to do with all of the technology and how to implement
it.''
The emergence of a new specification, Speech Application Language Tags (SALT),
and the maturation of an older one, VoiceXML, are beginning to provide a
sense of stability in the speech industry.
Voice Extensible Markup Language (VoiceXML) was written by the VoiceXML
Forum, which contributed it to the World Wide Web Consortium (W3C) standards
body. VoiceXML has been around for about two-and-a-half years, and more than
600 vendors and service providers currently adhere to the standard for
development.
The W3C defines VoiceXML as a markup language ''designed for creating audio
dialogs that feature synthesized speech, digitized audio, recognition of spoken
and DTMF key input, recording of spoken input, telephony, and mixed-initiative
conversations. Its major goal is to bring the advantages of Web-based
development and content delivery to interactive voice response
applications.''
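To make that definition concrete, the following is a minimal sketch (not taken from the W3C specification or from any vendor's product) that uses Python's standard library to generate a small VoiceXML 2.0 dialog. The form id, field name, prompt wording and the coffee/tea choices are illustrative assumptions; the element names (vxml, form, field, prompt, option, filled, value) are those defined by VoiceXML 2.0.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_vxml_dialog() -> str:
    """Return a minimal VoiceXML 2.0 document as a string.

    The dialog asks one question and accepts either a spoken answer
    or a DTMF key press. All names and wording are hypothetical.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "order"})

    # A field collects one piece of input from the caller.
    field = ET.SubElement(form, "field", {"name": "drink"})

    # The prompt is rendered as synthesized speech.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Would you like coffee or tea?"

    # Each option can be matched by speaking it or by pressing a key.
    for key, choice in (("1", "coffee"), ("2", "tea")):
        option = ET.SubElement(field, "option", {"dtmf": key, "value": choice})
        option.text = choice

    # The filled block runs once the field has a value.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You chose "
    ET.SubElement(confirm, "value", {"expr": "drink"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_vxml_dialog())
```

A voice browser that fetched this document from a Web server would read the prompt aloud, then wait for the caller to say "coffee" or "tea" or to press 1 or 2. That is the sense in which VoiceXML brings Web-style development and delivery to interactive voice response.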
SpeechWorks was one of the earliest companies to embrace VoiceXML. The
company's flagship product line, OpenSpeech, is a speech-recognition solution
optimized for VoiceXML. ''We were the first company to introduce a line of
products built from the ground up to support VoiceXML,'' said Steve Chambers,
chief marketing officer at SpeechWorks. ''For us, it has been good because
everyone wants a standard. It delivers investment protection.''
Chambers expects most speech applications appearing in the near term to be
VoiceXML-based, primarily because the standard has been around for a
while. But another speech standard, SALT, has received a lot of support from
some very big players.
SALT was created by the SALT Forum, a group of technology companies working
together to accelerate the development of speech technologies in telephony and
so-called multimodal systems. The founding members of the group are SpeechWorks,
Intel, Cisco Systems, Philips, Comverse and Microsoft. Formed in October 2001,
the SALT Forum now claims more than 50 member organizations; it released the 1.0
version of SALT earlier this year.
According to James Mastan, director of marketing for Microsoft's .NET Speech
Technologies, the SALT spec defines a set of lightweight tags as extensions to
commonly used Web-based programming languages. ''The idea,'' Mastan said, ''was not
to reinvent the wheel, but to take advantage of the existing Web infrastructure
and standards, and to simply add some lightweight standards that allow
developers to add speech to their Web applications in an integrated
fashion.''
The SALT tags allow developers to add speech interfaces to Web
content and applications using familiar tools and techniques. In ''multimodal''
applications, the tags can be added to support speech input and output, either
as standalone events or jointly with other interface options, such as speaking
while pointing to the screen with a stylus, Mastan said. In telephony
applications, the tags provide a programming interface to manage the
speech-recognition and text-to-speech resources needed to conduct interactive
dialogs with the caller through a speech-only interface.
The SALT specification is designed to work equally well on traditional
computers, handheld devices, home electronics, telematics devices (such as
in-car navigation systems) and mobile phones.
''What's really going to matter here from an app development perspective is
the types of tools available to application developers to enable them to build
these multimodal applications,'' said Peter Gavalakis, marketing manager at
Intel, ''not the SALT tags in and of themselves. But you need some standard or at
least an open specification that an industry ecosystem can develop around.''
SALT-based offerings are already coming down the product pipeline. In May,
Microsoft announced the beta release of its .NET Speech SDK, a Web developer
tool that the Redmond software maker billed as the first product based on the
SALT spec. Philips is reportedly building a SALT-based browser and a telephony
platform for SALT. HeyAnita, a speech hosting company, is developing a
SALT-based browser for its hosted speech platform. Carnegie Mellon University is
developing an open-source SALT browser, which the university expects to be
available by the end of the year. Kirusa, a company that is heavily involved in
the multimodal application area, is focusing on building multimodal wireless
apps around SALT.
Microsoft's Mastan believes that both SALT and VoiceXML will be around for a
while, adding that there is some discussion among standards bodies about
convergence of the two in the future.
Microsoft's entrance into this market has received mixed reviews, but is
generally considered a good thing.
''Microsoft threw a monkey wrench in the gears with SALT,'' said Meta Group
analyst Earl Perkins. ''But it's had both a positive and negative effect. It drew
attention to a growing market, because Microsoft never enters a market unless
they realize there's money to be made. But on the other hand, they introduced
another standard, so there may be a bit of a delay while vendors sort out how
they're going to support both of them.''
See the following related stories:
- Giving applications a voice, by John K. Waters
- Talking speech tech, by John K. Waters
- Multiple modes, by John K. Waters
About the Author
John K. Waters is a freelance writer based in Silicon Valley. He can be reached
at [email protected].