OpenAI Announces Availability of Realtime API in Public Beta

OpenAI has introduced the public beta of its Realtime API, offering developers a tool to integrate natural, low-latency, multimodal interactions into their applications. Now available to all paid developers, this API facilitates real-time speech-to-speech conversations with minimal delay, providing a smoother and more engaging user experience, according to the company.

The Realtime API allows for natural speech-based interactions, similar to OpenAI’s ChatGPT Advanced Voice Mode, and offers six preset voices. The capability is aimed at applications such as language-learning platforms and customer service chatbots, where fast and fluid communication is crucial.

Alongside the Realtime API, OpenAI is enhancing its Chat Completions API to support both audio input and output, catering to use cases that don’t require the low-latency performance of real-time streaming. This update enables developers to pass text or audio inputs to the GPT-4o model and receive responses in text, audio, or both.

"Developers no longer have to stitch together multiple models to create conversational AI experiences," OpenAI said in its announcement. "Now, they can build with just a single API call."

Before the release of the Realtime API, developers looking to build sophisticated voice assistants had to rely on a series of separate models. Audio had to be transcribed by an automatic speech recognition system like Whisper, then processed by a text model, and finally rendered into speech using a text-to-speech engine. This complex approach often resulted in delayed interactions and loss of nuance, such as emotional tone or accent.
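The cascaded approach described above can be sketched in a few lines. This is a toy illustration only: `Utterance`, `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for a speech recognizer, a text model, and a text-to-speech engine, not calls from any real library. The sketch shows why nuance is lost: only plain text crosses the boundary between stages.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Toy stand-in for an audio clip: the words plus how they were spoken."""
    words: str
    tone: str  # stands in for emotional tone, accent, and similar cues


def transcribe(audio: Utterance) -> str:
    # Hypothetical speech-recognition stage: only the words survive.
    return audio.words


def generate_reply(text: str) -> str:
    # Hypothetical text-model stage: it sees text and nothing else.
    return f"You said: {text}"


def synthesize(text: str) -> Utterance:
    # Hypothetical text-to-speech stage: it cannot know the original tone.
    return Utterance(words=text, tone="neutral")


def cascaded_turn(audio: Utterance) -> Utterance:
    # Three sequential hops; each one adds its own delay in a real system.
    return synthesize(generate_reply(transcribe(audio)))


reply = cascaded_turn(Utterance(words="where is my order", tone="frustrated"))
print(reply)
```

Because the speaker's tone never reaches the text stage, the reply cannot reflect it, which is the loss a single speech-to-speech model is meant to avoid.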

The Realtime API was developed to simplify this process by streaming audio inputs and outputs directly. Developers can now create conversational agents that not only speak but also handle interruptions naturally, much as ChatGPT’s Advanced Voice Mode does. Additionally, the API supports function calling, which allows voice assistants to respond to user requests by performing actions, such as placing orders or retrieving personalized information.
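On the application side, function calling generally comes down to mapping a function name and a set of JSON arguments chosen by the model onto code the developer controls. The following is a minimal sketch of that dispatch step, assuming the request arrives as a name plus a JSON string. It does not reproduce the Realtime API's actual event format, and `place_order`, `TOOLS`, and `handle_function_call` are hypothetical names used for illustration.

```python
import json
from typing import Any, Callable, Dict


def place_order(item: str, quantity: int = 1) -> Dict[str, Any]:
    # Hypothetical action; a real app would call its ordering backend here.
    return {"status": "placed", "item": item, "quantity": quantity}


# Registry of functions the assistant is allowed to invoke.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "place_order": place_order,
}


def handle_function_call(name: str, arguments_json: str) -> str:
    """Run the requested function and return a JSON string for the model."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(arguments_json)
        result = handler(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the signature.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


print(handle_function_call("place_order", '{"item": "latte", "quantity": 2}'))
```

Keeping an explicit registry means the model can only trigger actions the developer has chosen to expose, and errors are returned as data rather than raised mid-conversation.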

OpenAI has already tested the Realtime API with a select group of partners. One of the early adopters, Healthify, a fitness and nutrition app, uses the API to enable natural, conversational interactions between users and its AI coach, Ria. Another partner, Speak, provider of a language learning app, uses the API to facilitate role-playing conversations, encouraging users to practice new languages in real time.

The Realtime API is powered by OpenAI’s new GPT-4o model. Pricing starts at $5 per 1 million text input tokens and $100 per 1 million audio input tokens. Audio output tokens are priced at $200 per 1 million, which equates to approximately $0.24 per minute of audio output, the company said.
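The quoted rates can be turned into a rough cost estimate. The sketch below uses only the three prices reported above; the function and constant names are illustrative, and actual billing may include price tiers or details not covered in this article. It also backs out the audio token rate implied by the $0.24-per-minute figure.

```python
# Prices in US dollars per 1 million tokens, as reported above.
TEXT_INPUT_PER_M = 5.00
AUDIO_INPUT_PER_M = 100.00
AUDIO_OUTPUT_PER_M = 200.00


def estimate_cost(text_in: int = 0, audio_in: int = 0, audio_out: int = 0) -> float:
    """Estimate the cost in dollars for the given token counts."""
    total = (
        text_in * TEXT_INPUT_PER_M
        + audio_in * AUDIO_INPUT_PER_M
        + audio_out * AUDIO_OUTPUT_PER_M
    )
    return total / 1_000_000


# Tokens per minute implied by $0.24 per minute at $200 per 1 million tokens.
implied_output_tokens_per_minute = 0.24 * 1_000_000 / AUDIO_OUTPUT_PER_M

print(round(implied_output_tokens_per_minute))
print(estimate_cost(audio_out=12_000))
```

At these rates, $0.24 per minute corresponds to roughly 1,200 audio output tokens per minute, so ten minutes of generated speech would cost about $2.40 before any input charges.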

The expanded audio capabilities in the Chat Completions API are scheduled for release in the coming weeks. Both the Realtime API and the Chat Completions API will use the GPT-4o model, with pricing aligned across both services.

The Realtime API ships with safety protections such as automated monitoring and human review of flagged interactions. OpenAI emphasized that these features are built on the same safety infrastructure used in ChatGPT’s voice capabilities and that developers are prohibited from using the API for malicious purposes.

OpenAI says it plans to introduce additional functionalities, including support for more input modalities like vision and video, increased rate limits, and integration with official SDKs for Python and Node.js. Developers can also expect expanded model support, with the inclusion of GPT-4o mini in future releases, the company said.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.