News

SALT telephony products emerge

The SALT Forum, a group of technology companies working together to accelerate the development of speech technologies in telephony and so-called multimodal systems, has released the 1.0 version of its Speech Application Language Tags (SALT) specification.

According to James Mastan, director of marketing for founding Forum member Microsoft's .NET Speech Technologies, the SALT spec defines a set of lightweight tags as extensions to commonly used Web-based programming languages. "The idea," Mastan told eADT, "was not to reinvent the wheel, but to take advantage of the existing Web infrastructure and standards, and simply add some lightweight standards that allow developers to add speech to their Web applications in an integrated fashion."

The SALT tags allow developers to add speech interfaces to Web content and applications using familiar tools and techniques. In "multimodal" applications, the tags can be added to support speech input and output, either as stand-alone events or jointly with other interface options, such as speaking while pointing to the screen with a stylus, Mastan said. In telephony applications, the tags provide a programming interface to manage the speech recognition and text-to-speech resources needed to conduct interactive dialogs with the caller through a speech-only interface.

Version 1.0 of the SALT specification covers three broad capabilities: speech output, speech input and call control. The specification's "prompt" tag allows SALT-based applications to play audio and synthetic speech directly, while the "listen" and "bind" tags provide speech recognition capabilities by collecting and processing spoken user input. In addition, the specification's call control object gives SALT-based applications the ability to place, answer, transfer and disconnect calls, along with advanced capabilities such as conferencing. For additional application control, the SALT specification draws on emerging W3C standards such as Speech Synthesis Markup Language (SSML), Speech Recognition Grammar Specification (SRGS) and Semantic Interpretation for Speech Recognition.
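To make the tag roles concrete, here is a minimal sketch in Python that generates a small page fragment pairing a "prompt" with a "listen" and a "bind". It is an illustration under stated assumptions, not an excerpt from the specification: the namespace URI, the "grammar" child element, the attribute names and every id and file name below are assumptions or hypothetical values that should be checked against the SALT 1.0 document itself.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for SALT tags; verify against the SALT 1.0 specification.
SALT_NS = "http://www.saltforum.org/2002/SALT"
ET.register_namespace("salt", SALT_NS)


def salt(tag):
    """Return the namespace-qualified name ElementTree expects for a SALT tag."""
    return f"{{{SALT_NS}}}{tag}"


def build_city_dialog():
    """Build a form with one text field plus the SALT tags that speech-enable it."""
    # Ordinary Web form content; the ids are hypothetical.
    form = ET.Element("form", {"id": "travelForm"})
    ET.SubElement(form, "input", {"type": "text", "id": "txtBoxCity"})

    # Speech output: text to be played to the user as synthetic speech.
    prompt = ET.SubElement(form, salt("prompt"), {"id": "askCity"})
    prompt.text = "Which city are you flying to?"

    # Speech input: a recognition grammar (assumed element, hypothetical file)
    # and a bind that copies the recognized value into the text field.
    listen = ET.SubElement(form, salt("listen"), {"id": "listenCity"})
    ET.SubElement(listen, salt("grammar"), {"src": "./city.grxml"})
    ET.SubElement(
        listen,
        salt("bind"),
        {"targetelement": "txtBoxCity", "value": "//city"},
    )
    return form


markup = ET.tostring(build_city_dialog(), encoding="unicode")
print(markup)
```

The point of the sketch is the division of labor: the form field is plain Web markup, and the speech tags sit beside it rather than replacing it, which is the "extension" approach Mastan describes.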

The SALT specification is designed to work equally well on traditional computers, handheld devices, home electronics, telematics devices (such as in-car navigation systems) and mobile phones.

"What's really going to matter here from an app development perspective is the types of tools available to application developers for building these multimodal applications," Peter Gavalakis, marketing manager at Intel, told eADT. "Not the SALT tags in and of themselves. But you need some standard or at least an open specification that an industry ecosystem can develop around."

Intel, another founding member, joined with Microsoft, Cisco Systems, Philips, Comverse and SpeechWorks to form the SALT Forum last October. More than 50 organizations, including Verizon Wireless, have joined since. The addition of a strong carrier like Verizon provides a "significant boost to making speech mainstream in cell phone applications," Mastan said.

SALT-based offerings are already coming down the product pipeline. In May, Microsoft unveiled the beta release of its .NET Speech SDK, a Web development tool the Redmond software maker billed as the first SALT-compliant product. Philips is reportedly building a SALT-based browser and telephony platform. HeyAnita, a speech hosting company, is developing a SALT-based browser for its hosted speech platform. Carnegie Mellon University is developing an open-source SALT browser, which the university expects to ship by year's end. And Kirusa is building SALT-based multimodal wireless apps.

Rob Kassel, product manager for emerging technologies at SpeechWorks, said his company had been focusing on the multimodal space before SALT, working with companies that were building their applications using ad hoc techniques. The company's flagship OpenSpeech offering is described as a speech-recognition solution optimized for VoiceXML, another emerging speech standard.

VoiceXML, a markup language used to describe an interaction between a caller on a telephone and a server, uses XML tags to describe the call flow. It was written by the VoiceXML Forum and then submitted to the World Wide Web Consortium (W3C). It was created, Kassel explained, to standardize the development of "speech-in-speech-out" telephony applications, and was specifically designed for Interactive Voice Response (IVR) applications, without consideration of multimodal technologies or the Web.

"SALT takes a different approach," Kassel said, "in that it presumes a Web-based environment that will encompass both the speech-in-speech-out and the multimodal aspects of speech-enabled applications."

Microsoft's Mastan believes that both standards will probably be around for a while, and he said that there is some discussion among standards bodies about convergence of the two in the future.

The SALT specification is currently being submitted to an as-yet-unrevealed standards body. The SALT 1.0 specification is royalty-free and available for download from the Forum's Web site at www.saltforum.org.

About the Author

John K. Waters is a freelance writer based in Silicon Valley. He can be reached at [email protected].