Multiple modes -- ADTmag

Multiple modes

By John K. Waters
December 1, 2002

One of the most exciting areas of innovation in speech tech today centers on the concept of ''multimodality.'' Multimodal applications essentially provide users with a choice of input sources, generally including voice, keypad, keyboard, mouse and stylus. Output takes the form of spoken prompts, audio and/or graphical displays.

Multimodality is a way of enhancing the usability of applications, said Sandeep Sibal, CTO at Kirusa, Berkeley Heights, N.J. Founded in 2001, Kirusa was established specifically to focus on developing multimodal solutions for mobile applications.

''Just like the graphical user interface revolutionized the way people used their PCs,'' Sibal said, ''I think multimodality brings in a fundamental shift in how we understand user interfaces and, for the first time, begins to combine two very dissimilar interfaces: the GUI and the voice user interface, or VUI.''

A VUI (pronounced ''vooey'') is the speech equivalent of a GUI, typically residing on a PDA or smart phone. It is more sophisticated than an interactive voice response (IVR) system, and offers a wider range of commands than simply ''yes'' or ''no.''

Combining these two interfaces is an extremely complex exercise, said Sibal, because they differ fundamentally. GUIs utilize two-dimensional space to express themselves, while VUIs express themselves over time.

Multimodal architectures typically keep speech recognition and synthesis on the server side, because the memory demands of speech processing usually exceed what small client devices can handle.

There are two types of multimodality: sequential multimodality and simultaneous multimodality. In sequential multimodality, users can switch between interfaces. ''The notion here is that you are using only one interface at a given instant,'' Sibal said, ''but in a single session you might go back and forth.''

In simultaneous multimodality, both interfaces are active at the same time. Users can click on a map, say ''How do I get to here?'' and then tap the destination on the screen. ''It's very natural to do it this way,'' said Sibal. ''It's just not something that apps do today.

''With today's devices, many of which are not able to keep both the speech and GUI active simultaneously, you can start off with sequential and then move on to simultaneous as the devices that allow you to do that become available,'' he added.
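The difference between the two modes can be sketched in a few lines of Python. This is an illustrative model only, not Kirusa's implementation: the class names, method names and event strings are all hypothetical, and the sketch assumes a session that tracks nothing but which interface is active (sequential) or which spoken request is waiting for a tap (simultaneous).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):
    GUI = "gui"
    VOICE = "voice"


@dataclass
class SequentialSession:
    """Hypothetical model: one interface is active at any instant,
    but the user may switch back and forth within a session."""
    active: Mode = Mode.GUI
    log: List[Tuple[Mode, str]] = field(default_factory=list)

    def switch(self, mode: Mode) -> None:
        self.active = mode

    def handle(self, mode: Mode, event: str) -> bool:
        # Input arriving on the inactive interface is ignored.
        if mode is not self.active:
            return False
        self.log.append((mode, event))
        return True


@dataclass
class SimultaneousSession:
    """Hypothetical model: both interfaces are live, so a spoken
    request can be completed by a tap on the screen."""
    pending_utterance: Optional[str] = None

    def speak(self, utterance: str) -> None:
        # Remember the spoken request until a tap supplies the target.
        self.pending_utterance = utterance

    def tap(self, x: int, y: int) -> Optional[dict]:
        # A tap with no pending speech carries no combined request.
        if self.pending_utterance is None:
            return None
        request = {"utterance": self.pending_utterance, "target": (x, y)}
        self.pending_utterance = None
        return request


if __name__ == "__main__":
    seq = SequentialSession()
    seq.handle(Mode.VOICE, "find restaurants")  # ignored: GUI is active
    seq.switch(Mode.VOICE)
    seq.handle(Mode.VOICE, "find restaurants")  # accepted after the switch

    sim = SimultaneousSession()
    sim.speak("How do I get to here?")
    print(sim.tap(120, 80))
```

In the sequential model the application has to be told which interface is in use before input counts; in the simultaneous model the spoken question and the screen tap are merged into a single request, which is the map scenario Sibal describes.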

Kirusa's flagship product, the Kirusa Multimodal Platform (KMMP), currently supports sequential multimodality, but Sibal said upcoming versions will also support the simultaneous mode.

Although the company's early products were built on its own languages, which were based on VoiceXML, Kirusa was also an early supporter of SALT (Speech Application Language Tags).

''I think the SALT Forum's initiative helped us not just in terms of coming up with some kind of standard for representing multimodal applications, but also in terms of getting the awareness of multimodality raised in the community,'' Sibal said. ''Cool technology alone is not enough. You have to evangelize the stuff. When you have a bunch of industry heavyweights like Microsoft doing it for you... well, it's just what this industry needs.''

See the following related stories:
Giving applications a voice, by John K. Waters
Talking speech tech, by John K. Waters
Speech specs, by John K. Waters

About the Author

John K. Waters is a freelance writer based in Silicon Valley.
