Talking speech tech -- ADTmag

Talking speech tech

By John K. Waters
December 1, 2002

Specialized technology areas tend to have their own jargon, and speech tech is quickly generating an alphabet soup of acronyms. Here are definitions of some of the key terms.

Automatic Speech Recognition (ASR) systems -- use speech recognition to replace keypad entry in telephone voice menus. These are the systems that tell callers to speak the digits 0 through 9.

Computer Telephony Integration (CTI) -- combines data with voice systems for enhanced telephone services.

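Dual-Tone Multi-Frequency (DTMF) -- the type of audio signals produced by a touch-tone telephone. Each key press sends a pair of simultaneous tones, one from a low-frequency group and one from a high-frequency group.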
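To make the "dual-tone" idea concrete, here is a minimal Python sketch that looks up the tone pair for a key. The frequencies are the standard DTMF keypad values; the function name dtmf_pair is hypothetical and chosen only for illustration.

```python
# Standard DTMF keypad layout: each key is signaled by one low-group (row)
# tone plus one high-group (column) tone, both in Hz.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> tuple[int, int]:
    """Return the (low, high) tone pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError(f"expected one keypad character, got {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")
```

A telephony system works in the opposite direction: it detects which two tones are present in the audio and maps them back to the key that was pressed.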

Grammars -- in speech tech circles, ''grammars'' are the phrases a user might say that a speech engine can recognize.
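As a rough illustration, a grammar can be pictured as a fixed set of phrases, each tied to an action. The Python sketch below is a toy stand-in, not a real grammar format: the phrases, the action names and the match_utterance helper are all assumptions made for the example.

```python
# Hypothetical grammar for a banking menu: each allowed phrase maps to an
# action name. Real speech engines use richer grammar formats than a dict.
GRAMMAR = {
    "check balance": "BALANCE",
    "check my balance": "BALANCE",
    "transfer funds": "TRANSFER",
    "speak to an agent": "AGENT",
}


def match_utterance(transcript: str):
    """Return the action for an in-grammar phrase, or None if out of grammar."""
    # Normalize case and collapse extra whitespace before the lookup.
    normalized = " ".join(transcript.lower().split())
    return GRAMMAR.get(normalized)
```

Anything the caller says that falls outside the listed phrases returns None, which is roughly what "out of grammar" means in practice.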

Interactive Voice Response (IVR) -- an automated telephone information system to which callers respond by using the keypad or by speaking words. The system communicates with callers using a combination of fixed voice menus and real-time data from databases.
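The mix of fixed menus and live data can be sketched in a few lines of Python. Everything named here is hypothetical: ACCOUNT_BALANCES stands in for a real database, and the menu options and handle_input function are invented for the example.

```python
# Stand-in for a real-time database lookup (hypothetical data).
ACCOUNT_BALANCES = {"1001": 250.75}

# Fixed menu: the same choice can arrive as a keypad digit or a spoken word.
MENU = {"1": "balance", "balance": "balance", "2": "agent", "agent": "agent"}


def handle_input(account_id: str, caller_input: str) -> str:
    """Return the response the system would play back to the caller."""
    choice = MENU.get(caller_input.strip().lower())
    if choice == "balance":
        # Fixed wording combined with data fetched for this caller.
        return f"Your balance is {ACCOUNT_BALANCES[account_id]:.2f} dollars."
    if choice == "agent":
        return "Transferring you to an agent."
    return "Sorry, I did not understand. Press 1 or say balance."
```

Calling handle_input("1001", "1") and handle_input("1001", "balance") take the same path, which is the point of accepting both keypad and spoken input.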

Prompts -- phrases that a voice system plays back to callers, indicating which information the system needs next. For example: ''Please enter your credit card number.''

Speech Application Language Tags (SALT) -- extensions to HTML, XHTML and XML for voice recognition and synthesized speech output. SALT is the newest specification to emerge from the speech market. It is designed to support ''multimodality,'' including audio, video, text and graphics, depending on the hardware.

Speaker recognition (sometimes called voice authentication) -- refers to systems that can distinguish and confirm the identity of the individual speaking to them. Speaker recognition can be further subdivided into speaker identification, which determines which of a set of registered speakers produced a given utterance, and speaker verification, which accepts or rejects a speaker's identity claim.
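The difference between identification and verification is easy to see in code. The Python sketch below assumes each registered speaker is represented by a short numeric "voiceprint" vector and compares vectors by cosine similarity; the vectors, the 0.9 threshold and the function names are illustrative assumptions, not how any particular product works.

```python
import math

# Hypothetical enrolled voiceprints (toy three-number vectors).
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def identify(sample):
    """Identification: pick the closest speaker from the known set."""
    return max(ENROLLED, key=lambda name: cosine(ENROLLED[name], sample))


def verify(claimed, sample, threshold=0.9):
    """Verification: accept or reject one claimed identity."""
    return cosine(ENROLLED[claimed], sample) >= threshold
```

Identification always returns someone from the registered set, while verification answers a yes-or-no question about a single claim.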

Speech engine -- software that either processes speech input or produces speech output.

Speech recognition -- refers to applications and systems that ''understand'' language, regardless of the speaker. It takes the form of a range of applications, from shrink-wrapped dictation programs that live on a desktop to sophisticated business apps that allow customers to interact with a computer over the telephone.

Text-to-Speech (TTS) -- TTS systems convert text into synthesized speech output. These systems were first designed to allow blind users to listen to written material. Today, TTS is used extensively to convey financial data, e-mail messages and other information via telephone.

Voice User Interface (VUI) -- the speech tech equivalent of a GUI, typically residing on a PDA or smart phone. A VUI is more sophisticated than an IVR system, and offers a wider range of commands than simply ''yes'' or ''no.''

Voice browser -- allows users to access the Web using speech synthesis, pre-recorded audio and speech recognition.

Voice portal -- offers a variety of Web-based services on a speech-enabled platform accessible from a telephone. A consumer voice portal is an interface for consumer information, such as newsletters, sports and stocks, typically offered by service providers. An enterprise voice portal provides an integrated telephony interface to a wide range of enterprise applications and information.

VoiceXML (VXML) -- a markup language designed to create audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony and mixed-initiative conversations.

See the following related stories:
Giving applications a voice, by John K. Waters
Multiple modes, by John K. Waters
Speech specs, by John K. Waters

About the Author

John K. Waters is a freelance writer based in Silicon Valley.
