IBM out to lure Web developers via voice tech
- By John K. Waters
- December 16, 2002
[ADT's PROGRAMMERS REPORT, December 17, 2002] -- ''Yes,'' ''yeah,'' ''okay,'' ''all
right,'' ''no,'' ''nah'' -- this is a small sampling of the diverse input possible
even in highly constrained voice systems. Though alluring, voice as an input
system has been slow to evolve, in part because no single tool provider has
proved a standout in this area.
Yet IBM continues its quest to become the tool provider of choice for voice
application developers with new software products for builders of voice portals.
Big Blue's recently unveiled WebSphere Voice Application Access product is
middleware designed to simplify the process of building and managing voice
portals, and expand Web-based portals to voice.
''What we're doing with WebSphere Voice Application Access is extending the
portal paradigm to voice,'' said Sunil Soares, director of product management at
IBM's Pervasive Computing Division. ''We are helping developers to create the
next generation of voice portals, which are part of a multichannel, multimodal
approach to accessing back-end applications.''
The new offering includes IBM's WebSphere Voice Server, as well as
ready-to-use e-mail, personal information management (PIM) functions and sample
portlets. It also supports VoiceXML and Java, including development tools based
on Eclipse, the open-source, vendor-neutral platform for writing software. It
uses open-standard programming languages to create voice-enabled applications
that will interoperate with a range of Web servers and databases.
A key component of this new release for developers is the WebSphere Voice
Toolkit, which allows users to write VoiceXML applications, as well as test and
debug the grammars and pronunciations. IBM's Soares sees a big opportunity in
this market for developers, whom his company is courting energetically.
''There is a large pool of Web developers out there not yet familiar with voice
technologies,'' Soares told Programmers Report. ''We'd like to unleash that vast
pool of Web developers to start writing voice applications.''
Many enterprises prefer to utilize internal resources to develop their own
speech applications, but speech app development is a highly specialized
discipline. In the words of Soares, it's ''tricky stuff.''
''Believe it or not,'' Soares explained, ''there are many ways that you can say
even something as simple as 'yes' and 'no.' There's 'yes,' 'yeah,' 'okay,' 'all
right,' 'no,' 'nah' -- and even more. And it gets worse. If I'm writing a stock
quote application, for example, I know it will have multiple stocks, maybe
15,000 that are regularly traded in the U.S. And let's say [some] of them will
be IBM. Now the grammar for IBM could be 'IBM,' 'International Business
Machines' or 'Big Blue.' I need the ability to write that grammar. If I'm adding
Microsoft, 'Microsoft' doesn't exist in the dictionary, so the engine has to be
able to recognize a specific pronunciation.''
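The grammar problem Soares describes can be pictured as a table that maps every accepted phrase to one canonical value. The Python sketch below is only an illustration under stated assumptions: the names GRAMMAR, PHRASE_TO_VALUE and interpret are hypothetical, and the code works on text that a recognizer has already transcribed. It is not VoiceXML and not the grammar format used by IBM's toolkit.

```python
from typing import Optional

# Hypothetical phrase table: each canonical value lists the spoken variants
# quoted in the article. A real voice grammar would be written in the
# platform's own grammar format, not as a Python dict.
GRAMMAR = {
    "yes": ["yes", "yeah", "okay", "all right"],
    "no": ["no", "nah"],
    "IBM": ["ibm", "international business machines", "big blue"],
}

# Invert the table once so that each lookup is a single dictionary access.
PHRASE_TO_VALUE = {
    phrase: value for value, phrases in GRAMMAR.items() for phrase in phrases
}


def interpret(utterance: str) -> Optional[str]:
    """Return the canonical value for a transcribed utterance, or None."""
    # Lowercase and collapse whitespace so "  Big   Blue " matches "big blue".
    normalized = " ".join(utterance.lower().split())
    return PHRASE_TO_VALUE.get(normalized)
```

Under these assumptions, interpret("Yeah") yields "yes" and interpret("Big Blue") yields "IBM", while a phrase outside the table yields None, which is where an application would typically reprompt the caller.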
IBM supports the International Phonetic Alphabet (IPA), a standard notation for
speech sounds, Soares said, which makes it easier for users to build
pronunciations.
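To make the idea of a custom pronunciation concrete, here is a minimal Python sketch, assuming the engine's built-in dictionary can be modeled as a plain lookup table. The names BASE_DICTIONARY, CUSTOM_PRONUNCIATIONS and pronunciation are hypothetical, and the IPA strings are approximate illustrations rather than entries from any IBM product.

```python
from typing import Optional

# Assumption: the engine's built-in dictionary, reduced to two sample words.
BASE_DICTIONARY = {
    "yes": "jɛs",
    "no": "noʊ",
}

# Hypothetical developer-supplied entry for a word the base dictionary lacks.
# The transcription is an approximate illustration, not an official one.
CUSTOM_PRONUNCIATIONS = {
    "microsoft": "ˈmaɪkroʊsɔft",
}


def pronunciation(word: str) -> Optional[str]:
    """Look up a word, letting custom entries override the base dictionary."""
    key = word.lower()
    if key in CUSTOM_PRONUNCIATIONS:
        return CUSTOM_PRONUNCIATIONS[key]
    return BASE_DICTIONARY.get(key)
```

The custom table is checked first so that a developer's entry wins over a built-in one; a word found in neither table returns None.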
IBM pioneered voice recognition technology with its ViaVoice dictation
software. The company continues to invest heavily in the technology, which it
considers an essential component of its ''pervasive computing'' strategy, and it
has thrown its considerable weight behind the VoiceXML standard.
''We want to make voice application development as easy as possible,'' Soares
said. ''We recognize that voice application development historically has been the
domain of a very few people with very specific, highly proprietary skills. We
want to change that by supporting VoiceXML.''
IBM is also a fierce proponent of ''multimodality,'' which these tools support.
Multimodal applications essentially provide users with a choice of input
sources, generally including voice, keypad, keyboard, mouse and stylus. Output
takes the form of spoken prompts, audio and/or graphical displays.
IBM has just released a beta version of its WebSphere Multimodal toolkit, and
the company recently submitted a specification to the W3C called XHTML+Voice Profile --
X+V for short -- which Soares believes will make multimodality more accessible
to a greater number of developers. X+V comprises XHTML v 1.0, some VoiceXML
modules and a set of XML events.
''The beauty of X+V,'' Soares said, ''is that you write an application once, and
it's rendered in three different modes: visual-only, voice-only and multimodal.
From an application development perspective, this is how we get the Web and
voice developers together.''
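The write-once idea Soares describes can be sketched in a few lines of Python. This is not X+V markup and not IBM's implementation; Mode, Field and render are hypothetical names used only to show a single definition being rendered in the three modes he lists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Mode(Enum):
    VISUAL = "visual-only"
    VOICE = "voice-only"
    MULTIMODAL = "multimodal"


@dataclass(frozen=True)
class Field:
    """One input field, defined once for every mode (hypothetical model)."""
    name: str
    label: str   # text shown on screen
    prompt: str  # text spoken to the user


def render(field: Field, mode: Mode) -> Dict[str, str]:
    """Return only the parts of the field that the given mode presents."""
    output: Dict[str, str] = {}
    if mode in (Mode.VISUAL, Mode.MULTIMODAL):
        output["display"] = f"{field.label}: [{field.name}]"
    if mode in (Mode.VOICE, Mode.MULTIMODAL):
        output["speak"] = field.prompt
    return output
```

Given a field such as Field("symbol", "Stock symbol", "Which stock would you like?"), the visual-only mode returns just the display text, the voice-only mode returns just the spoken prompt, and the multimodal mode returns both.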
WebSphere Voice Application Access is expected to be available December 20,
2002; the SDK will be available as a free download for users of IBM's WebSphere
Studio IDE.
Links:
To read more stories of related interest, please see:
''Giving applications a voice'' by John K. Waters at http://www.adtmag.com/article.asp?id=6957
''IBM to buy Rational for more than $2B'' by Jack Vaughan at http://www.adtmag.com/article.asp?id=7038
''WebSphere Studio update sports deeper Eclipse support'' by Jack Vaughan at http://www.adtmag.com/article.asp?id=6817
For other Programmers Report articles, please go to http://www.adtmag.com/article.asp?id=6265
About the Author
John K. Waters is a freelance writer based in Silicon Valley.