News

OpenAI Expands AI Fine-Tuning Capabilities, Adding Multimodal Features

OpenAI, fresh from securing a funding boost that catapulted its valuation to $157 billion, has introduced new tools for developers, enhancing its AI capabilities with multimodal fine-tuning options for its GPT models. The announcement was made at the company's exclusive DevDay event, which highlighted innovations aimed at simplifying the AI development process.

One of the standout features unveiled is a new suite for API-based "model distillation," a process that lets developers fine-tune smaller, more resource-efficient models on the outputs of their larger counterparts. OpenAI's small language models (SLMs), such as GPT-4o-mini and o1-mini, are well suited to edge devices and highly specialized tasks, where the larger GPT-4o and o1-preview models can be more powerful than necessary and costly to deploy.

Traditionally, model distillation has been a complex and error-prone task. Developers had to manually generate datasets, fine-tune models, and evaluate performance across multiple disconnected tools. OpenAI's new suite simplifies this by consolidating the entire process into one platform, allowing for a more seamless and iterative workflow. "By integrating these steps, we’re significantly reducing the effort and complexity involved in model distillation," OpenAI said.
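In practice, the workflow maps onto a handful of API calls. Below is a minimal Python sketch using the official openai SDK; the `store` and `metadata` parameters come from the announcement, while the prompt, file name, and dataset tag are placeholders, and exporting the stored completions into a training file is handled from the platform dashboard at launch.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Capture the teacher model's outputs as they happen. `store=True`
#    saves the request/response pair on the platform so it can later be
#    reviewed and exported as training data; `metadata` tags the batch.
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
    store=True,
    metadata={"purpose": "distillation", "dataset": "ticket-summaries-v1"},
)

# 2. After exporting the stored completions to a JSONL file from the
#    dashboard, upload it and fine-tune the smaller student model on it.
training_file = client.files.create(
    file=open("ticket-summaries-v1.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",  # the smaller "student" model
    training_file=training_file.id,
)
print(job.id, job.status)
```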

In another development, OpenAI is expanding the fine-tuning capabilities of its flagship GPT-4o model to include image-based datasets. While the platform has long supported fine-tuning with text data, the ability to train GPT-4o on images unlocks new vision capabilities.

Developers can now customize GPT-4o’s image recognition abilities, enhancing applications like visual search, object detection for autonomous vehicles, and medical image analysis. According to OpenAI, demonstrable improvements in vision capabilities can be achieved with as few as 100 images. The feature is available to developers on all paid usage tiers, and through October 31, OpenAI is offering 1 million free training tokens per day for image-based fine-tuning.
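Under the hood, image fine-tuning uses the same JSONL training format as text fine-tuning, with images embedded in the message content as hosted URLs or base64 data URLs. Here is a minimal sketch of building one training example in Python; the task, image URL, and label are hypothetical:

```python
import json

# One training example: a user turn containing text plus an image, and
# the assistant turn with the desired answer the model should learn.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What traffic sign is shown?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/signs/0001.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "Yield sign"},
    ]
}

# Each line of the JSONL training file is one such example.
with open("sign-dataset.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```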

OpenAI also introduced audio-based upgrades, rolling out a beta of its Realtime API for paying developers. This new tool is designed to power voice assistant applications with minimal latency, allowing for real-time, conversational interactions.
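Unlike the standard request/response endpoints, the Realtime API holds open a persistent WebSocket session over which JSON events flow in both directions. The following Python sketch uses the third-party websockets package (assuming a version that accepts `extra_headers`); the endpoint, beta header, and event names follow the beta documentation, and the text-only response keeps the example short:

```python
import asyncio
import json
import os

import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",  # opt-in header for the beta
}

async def main():
    # One persistent session; JSON events stream in both directions.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model for a response; the server streams back deltas
        # followed by a final "response.done" event.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```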

The company also announced upcoming audio support in its chat completions API, enabling the processing of both audio and text inputs. Developers will soon be able to input audio and receive a response in text, audio, or both, streamlining a process that previously required multiple steps. However, OpenAI notes that this method is still slower than human conversation.
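Because the feature had not shipped at press time, the exact interface could still change, but based on the capabilities described above, a call might look roughly like the sketch below: the request names the output modalities it wants, configures the audio voice and format, and passes the recording as base64-encoded audio. The model name, parameters, and response fields here are assumptions, and the file names are placeholders.

```python
import base64

from openai import OpenAI

client = OpenAI()

# Read a short recorded question and base64-encode it for the request.
with open("question.wav", "rb") as f:  # placeholder input file
    audio_b64 = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # assumed audio-capable model name
    modalities=["text", "audio"],  # request both output types
    audio={"voice": "alloy", "format": "wav"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please answer the recorded question."},
            {"type": "input_audio",
             "input_audio": {"data": audio_b64, "format": "wav"}},
        ],
    }],
)

# The reply carries a text transcript plus base64 audio to save locally.
print(completion.choices[0].message.audio.transcript)
with open("answer.wav", "wb") as f:
    f.write(base64.b64decode(completion.choices[0].message.audio.data))
```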

Rounding out its DevDay announcements, OpenAI introduced a new "prompt caching" feature aimed at cutting costs for developers. The feature reuses input tokens the API has recently processed, providing a 50% discount on those tokens and faster response times for API calls that repeat the same prompt prefix.

Caching is enabled automatically for the latest versions of GPT-4o and GPT-4o-mini, as well as o1-preview and o1-mini, including fine-tuned iterations of those models.
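Because the cache matches on exact prompt prefixes (for prompts of 1,024 tokens or more), the practical pattern is to put stable content such as system prompts and few-shot examples first and variable content last. A short Python sketch of that pattern, including how a developer can confirm cache hits from the response's usage details; the prompt text is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = "You are a support assistant for ..."  # long, stable prefix

def ask(question: str) -> str:
    # Keeping the long system prompt first means repeated calls share a
    # common prefix, which is what the cache matches on.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    # usage.prompt_tokens_details.cached_tokens reports how many input
    # tokens were served from the cache at the discounted rate.
    print("cached prompt tokens:", resp.usage.prompt_tokens_details.cached_tokens)
    return resp.choices[0].message.content

ask("How do I reset my password?")
ask("How do I change my email address?")  # shared prefix may now be cached
```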

OpenAI's latest suite of enhancements marks another step in its strategy to make AI more accessible and flexible for developers across industries, while cutting costs and expanding the multimodal abilities of its flagship models.

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.