Multiple modes
- By John K. Waters
- December 1, 2002
One of the most exciting areas of innovation in speech tech today centers on
the concept of ''multimodality.'' Multimodal applications give users a choice
of input modes, generally including voice, keypad, keyboard, mouse and stylus.
Output takes the form of spoken prompts, audio and/or graphical displays.
Multimodality is a way of enhancing the usability of applications, said
Sandeep Sibal, CTO at Kirusa, Berkeley Heights, N.J. Founded in 2001, Kirusa
focuses specifically on developing multimodal solutions for mobile
applications.
''Just like the graphical user interface revolutionized the way people used
their PCs,'' Sibal said, ''I think multimodality brings in a fundamental shift in
how we understand user interfaces and, for the first time, begins to combine two
very dissimilar interfaces: the GUI and the voice user interface, or VUI.''
A VUI (pronounced ''vooey'') is the speech equivalent of a GUI, typically
residing on a PDA or smart phone. It is more sophisticated than an interactive
voice response (IVR) system, and offers a wider range of commands than simply
''yes'' or ''no.''
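For illustration only (this is not code from Kirusa's platform), a simple VUI
dialog is often authored in a markup language such as VoiceXML, which comes up
again later in this story. In the sketch below the field name and grammar file
are hypothetical; the network-side voice browser speaks the prompt and then
listens for any phrase the grammar allows, rather than a bare yes or no:

    <?xml version="1.0" encoding="UTF-8"?>
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
      <form id="directions">
        <field name="destination">
          <!-- Speak the question, then listen for a match against the grammar. -->
          <prompt>Where would you like to go?</prompt>
          <grammar src="destinations.grxml" type="application/srgs+xml"/>
          <filled>
            <!-- Echo the recognized value back to the caller. -->
            <prompt>Getting directions to <value expr="destination"/>.</prompt>
          </filled>
        </field>
      </form>
    </vxml>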
Combining these two interfaces is an extremely complex exercise, said Sibal,
because they differ fundamentally: GUIs present information in two-dimensional
space, while VUIs present it over time.
Multimodal architectures typically keep speech recognition and synthesis on
the server side, because the memory demands of speech are usually more than
small client devices can handle.
There are two types of multimodality: sequential and simultaneous. In
sequential multimodality, users can switch between interfaces. ''The notion
here is that you are using only one interface at a given instant,'' Sibal
said, ''but in a single session you might go back and forth.''
In simultaneous multimodality, both interfaces are active at the same time.
Users can click on a map, say ''How do I get to here?'' and then tap the
destination on the screen. ''It's very natural to do it this way,'' said Sibal.
''It's just not something that apps do today.
''With today's devices, many of which are not able to keep both the speech and
GUI active simultaneously, you can start off with sequential and then move on to
simultaneous as the devices that allow you to do that become available,'' he
added.
Kirusa's flagship product, the Kirusa Multimodal Platform (KMMP), currently
supports sequential multimodality, but Sibal said upcoming versions will also
support the simultaneous mode.
Although the company's early products were built on its own languages, which
were based on VoiceXML, Kirusa was also an early supporter of SALT (Speech
Application Language Tags).
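A rough sketch, loosely based on published SALT examples rather than on
Kirusa's own products, shows the idea: speech tags embedded in an ordinary
HTML page let a spoken answer fill the same field a keypad or stylus would,
much like the ''tap and talk'' map scenario Sibal describes above. The element
IDs and grammar file here are hypothetical:

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
      <body>
        <!-- Ordinary GUI input: the user can type or tap a destination. -->
        <input type="text" id="txtDestination"/>

        <!-- Tapping the button starts recognition ("tap and talk"). -->
        <input type="button" value="Talk" onclick="recoDestination.Start()"/>

        <!-- Spoken prompt played to the user. -->
        <salt:prompt id="askDestination">Where would you like to go?</salt:prompt>

        <!-- Listen for speech and bind the recognized result
             into the same field the keypad or stylus would fill. -->
        <salt:listen id="recoDestination">
          <salt:grammar src="destinations.grxml"/>
          <salt:bind targetelement="txtDestination" value="//destination"/>
        </salt:listen>
      </body>
    </html>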
''I think the SALT forum's initiative helped us not just in terms of coming up
with some kind of standard for representing multimodal applications, but also in
terms of getting the awareness of multimodality raised in the community,'' Sibal
said. ''Cool technology alone is not enough. You have to evangelize the stuff.
When you have a bunch of industry heavyweights like Microsoft doing it for
you... well, it's just what this industry needs.''
See the following related stories:
- Giving applications a voice, by John K. Waters
- Talking speech tech, by John K. Waters
- Speech specs, by John K. Waters
About the Author
John K. Waters is a freelance writer based in Silicon Valley. He can be reached
at [email protected].