IBM out to lure Web developers via voice tech

[ADT's PROGRAMMERS REPORT, December 17, 2002] -- "Yes," "yeah," "okay," "all right," "no," "nah" -- this is a small sampling of the diverse input possible even in highly constrained voice systems. Though alluring, voice as an input method has been slow to mature, in part because no single tool provider has emerged as a standout in this area.

Yet IBM continues its quest to become the tool provider of choice for voice application developers with new software products for builders of voice portals. Big Blue's recently unveiled WebSphere Voice Application Access product is middleware designed to simplify building and managing voice portals and to extend Web-based portals to voice.

"What we're doing with WebSphere Voice Application Access is extending the portal paradigm to voice," said Sunil Soares, director of product management at IBM's Pervasive Computing Division. "We are helping developers to create the next generation of voice portals, which are part of a multichannel, multimodal approach to accessing back-end applications."

The new offering includes IBM's WebSphere Voice Server, as well as ready-to-use e-mail, personal information management (PIM) functions and sample portlets. It also supports VoiceXML and Java, including development tools based on Eclipse, the open-source, vendor-neutral platform for writing software. It uses open-standard programming languages to create voice-enabled applications that will interoperate with a range of Web servers and databases.

A key component of this release for developers is the WebSphere Voice Toolkit, which lets users write VoiceXML applications and test and debug their grammars and pronunciations. Soares sees a big opportunity for developers in this market, and IBM is courting them energetically.

"There is a large pool of Web developers out there not yet familiar with voice technologies," Soares told Programmers Report. "We'd like to unleash that vast pool of Web developers to start writing voice applications."

Many enterprises prefer to use internal resources to develop their own speech applications, but speech application development is a highly specialized discipline. In the words of Soares, it's "tricky stuff."

"Believe it or not," Soares explained, "there are many ways that you can say even something as simple as 'yes' and 'no.' There's 'yes,' 'yeah,' 'okay,' 'all right,' 'no,' 'nah' -- and even more. And it gets worse. If I'm writing a stock quote application, for example, I know it will have multiple stocks, maybe 15,000 that are regularly traded in the U.S. And let's say [some] of them will be IBM. Now the grammar for IBM could be 'IBM,' 'International Business Machines' or 'Big Blue.' I need the ability to write that grammar. If I'm adding Microsoft, 'Microsoft' doesn't exist in the dictionary, so the engine has to be able to recognize a specific pronunciation."
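The grammar problem Soares describes boils down to mapping many phrasings onto one meaning. The Python below is a minimal sketch of that idea, assuming the speech engine has already turned the caller's audio into text. In a real VoiceXML application this mapping lives in a grammar that the recognizer itself interprets; the dictionaries and functions here (`YES_NO_GRAMMAR`, `STOCK_GRAMMAR`, `build_lookup`, `interpret`) are hypothetical illustrations, not part of any IBM product or API.

```python
# Hypothetical sketch: NOT an IBM or VoiceXML API. It only mimics how a grammar
# maps many spoken phrasings to one canonical value, after speech-to-text.

# Each canonical value lists the phrasings a caller might use (from the examples above).
YES_NO_GRAMMAR = {
    "yes": ["yes", "yeah", "okay", "all right"],
    "no": ["no", "nah"],
}

# Stock names keyed by ticker symbol; several phrasings can mean the same company.
STOCK_GRAMMAR = {
    "IBM": ["ibm", "international business machines", "big blue"],
    "MSFT": ["microsoft"],
}


def normalize(text):
    """Lower-case and collapse runs of whitespace so 'All  Right' matches 'all right'."""
    return " ".join(text.lower().split())


def build_lookup(grammar):
    """Invert {value: [phrases]} into {normalized phrase: value} for fast matching."""
    lookup = {}
    for value, phrases in grammar.items():
        for phrase in phrases:
            lookup[normalize(phrase)] = value
    return lookup


def interpret(utterance, lookup):
    """Return the canonical value for a recognized utterance, or None if out of grammar."""
    return lookup.get(normalize(utterance))


if __name__ == "__main__":
    yes_no = build_lookup(YES_NO_GRAMMAR)
    stocks = build_lookup(STOCK_GRAMMAR)
    print(interpret("Yeah", yes_no))      # yes
    print(interpret("Big Blue", stocks))  # IBM
    print(interpret("maybe", yes_no))     # None
```

Anything outside the listed phrasings comes back as `None`, which is where a voice application would typically reprompt the caller. Scaling this to the 15,000 stocks Soares mentions is mostly a matter of data, but teaching the engine how an unfamiliar word actually sounds is a separate pronunciation problem that a text lookup like this cannot solve.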

IBM supports the International Phonetic Alphabet (IPA), a standard notation for transcribing speech sounds, which Soares said makes it easier for users to build pronunciations.

IBM pioneered voice recognition technology with its ViaVoice dictation software. The company continues to invest heavily in the technology, which it considers an essential component of its "pervasive computing" strategy, and it has thrown its considerable weight behind the VoiceXML standard.

"We want to make voice application development as easy as possible," Soares said. "We recognize that voice application development historically has been the domain of a very few people with very specific, highly proprietary skills. We want to change that by supporting VoiceXML."

IBM is also a fierce proponent of "multimodality," which these tools support. Multimodal applications essentially provide users with a choice of input sources, generally including voice, keypad, keyboard, mouse and stylus. Output takes the form of spoken prompts, audio and/or graphical displays.

IBM has just released a beta version of its WebSphere Multimodal Toolkit, and it recently submitted a specification to the W3C called XHTML+Voice Profile -- X+V for short -- which Soares believes will make multimodality accessible to more developers. X+V comprises XHTML 1.0, a set of VoiceXML modules and XML Events.

"The beauty of X+V," Soares said, "is that you write an application once, and it's rendered in three different modes: visual-only, voice-only and multimodal. From an application development perspective, this is how we get the Web and voice developers together."

WebSphere Voice Application Access is expected to be available December 20, 2002; the SDK will be available as a free download for users of IBM's WebSphere Studio IDE.

About the Author

John K. Waters is a freelance writer based in Silicon Valley.