SALT telephony products emerge
By John K. Waters
The SALT Forum, a group of technology companies working together to
accelerate the development of speech technologies in telephony and so-called
multimodal systems, has released the 1.0 version of its Speech Application
Language Tags (SALT) specification.
According to James Mastan, director of marketing for founding Forum member
Microsoft's .NET Speech Technologies, the SALT spec defines a set of lightweight
tags as extensions to commonly used Web-based programming languages. ''The
idea,'' Mastan told eADT, ''was not to reinvent the wheel, but to take
advantage of the existing Web infrastructure and standards, and simply add some
lightweight standards that allow developers to add speech to their Web
applications in an integrated fashion.''
The SALT tags let developers add speech interfaces to
Web content and applications using familiar tools and techniques. In
''multimodal'' applications, the tags can be added to support speech input and
output, either as stand-alone events or jointly with other interface options,
such as speaking while pointing to the screen with a stylus, Mastan said. In
telephony applications, the tags provide a programming interface to manage the
speech recognition and text-to-speech resources needed to conduct interactive
dialogs with the caller through a speech-only interface.
Version 1.0 of the SALT specification covers three broad capabilities: speech
output, speech input and call control. The specification's ''prompt'' tag allows
SALT-based applications to play audio and synthetic speech directly, while
''listen'' and ''bind'' tags provide speech recognition capabilities by collecting
and processing spoken user input. In addition, the specification's call control
object can be used to provide SALT-based applications with the ability to place,
answer, transfer and disconnect calls, along with advanced capabilities such as
conferencing. The SALT specification draws on emerging W3C standards such as
Speech Synthesis Markup Language (SSML), Speech Recognition Grammar
Specification (SRGS) and semantic interpretation for speech recognition to
provide additional application control.
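
As a rough illustration of how the prompt, listen and bind tags described above might cooperate, here is a minimal Python sketch. It does not use a real SALT implementation: the page fragment, the element and attribute names (targetelement, value, src), the grammar file name and the fake_recognizer function are all assumptions made for this example, and the path syntax is limited to what Python's ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment. Tag and attribute names are assumptions
# modeled on the prompt/listen/bind tags the article describes.
PAGE = """
<page>
  <prompt id="askCity">Which city are you flying to?</prompt>
  <listen id="recoCity">
    <grammar src="cities.grxml"/>
    <bind targetelement="txtCity" value=".//city"/>
  </listen>
</page>
"""

def run_dialog(page_xml, recognize):
    """Collect prompt text, then fill fields from each listen's bind rules."""
    page = ET.fromstring(page_xml)
    spoken, fields = [], {}
    for prompt in page.iter("prompt"):
        # Stand-in for playing audio or synthesized speech.
        spoken.append(prompt.text.strip())
    for listen in page.iter("listen"):
        grammar_src = listen.find("grammar").get("src")
        # The (fake) recognizer returns its result as an XML string.
        result = ET.fromstring(recognize(grammar_src))
        for bind in listen.iter("bind"):
            # Copy the matched part of the result into the named field.
            node = result.find(bind.get("value"))
            if node is not None:
                fields[bind.get("targetelement")] = node.text
    return spoken, fields

def fake_recognizer(grammar_src):
    # Stand-in for a real speech recognizer; it always "hears" Boston.
    return "<result><city>Boston</city></result>"

spoken, fields = run_dialog(PAGE, fake_recognizer)
print(spoken)
print(fields)
```

The point of the sketch is the division of labor: the prompt supplies what the caller hears, the listen element names the grammar that constrains recognition, and the bind rule decides where the recognized value ends up in the page.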
The SALT specification is designed to work equally well on traditional
computers, handheld devices, home electronics, telematics devices (such as
in-car navigation systems) and mobile phones.
''What's really going to matter here from an app development perspective is
the types of tools available to application developers for building these
multimodal applications,'' Peter Gavalakis, marketing manager at Intel, told
eADT. ''Not the SALT tags in and of themselves. But you need some
standard or at least an open specification that an industry ecosystem can
Intel, another founding member, joined with Microsoft, Cisco Systems,
Philips, Comverse and SpeechWorks to form the SALT Forum last October. More than 50
organizations, including Verizon Wireless, have joined since. The addition of a
strong carrier like Verizon provides a ''significant boost to making speech
mainstream in cell phone applications,'' Mastan said.
SALT-based offerings are already coming down the product pipeline. In May,
Microsoft unveiled a beta release of its .NET Speech SDK, a Web development tool
the Redmond software maker billed as the first SALT-compliant product. Philips
is reportedly building a SALT-based browser and telephony platform. HeyAnita, a
speech hosting company, is developing a SALT-based browser for its hosted speech
platform. Carnegie Mellon University is developing an open-source SALT browser,
which the university expects will ship by year end. And Kirusa is building
SALT-based multimodal wireless apps.
Rob Kassel, product manager for emerging technologies at SpeechWorks, said
his company was focusing on the multimodal space before SALT, working with
companies that were building their applications using ad hoc techniques. The
company's flagship OpenSpeech offering is described as a speech-recognition
solution optimized for VoiceXML, another emerging speech standard.
VoiceXML, a markup language used to describe an interaction between a caller
on a telephone and a server, uses XML tags to describe the call flow. It was
written by the VoiceXML Forum and then submitted to the World Wide Web
Consortium (W3C). It was created, Kassel explained, to standardize the
development of ''speech-in-speech-out'' telephony applications, and was
designed specifically for Interactive Voice Response (IVR) applications, without
consideration of multimodal technologies or the Web.
''SALT takes a different approach,'' Kassel said, ''in that it presumes a
Web-based environment that will encompass both the speech-in-speech-out and the
multimodal aspects of speech-enabled applications.''
Microsoft's Mastan believes that both standards will probably be around for a
while, and he said that there is some discussion among standards bodies about
convergence of the two in the future.
The SALT specification is being submitted to an as-yet-undisclosed
standards body. The SALT 1.0 specification is royalty-free and available for
download from the Forum's Web site at www.saltforum.org.
John K. Waters is a freelance writer based in Silicon Valley. He can be reached