-
サマリー
あらすじ・解説
OpenAI has recently launched a number of new features to its API. The Realtime API enables developers to build speech-to-speech experiences within their applications. The Vision Fine-tuning API enables developers to fine-tune GPT-4o with images and text to improve its visual understanding capabilities. Model Distillation lets developers create cost-effective models by using the outputs of more powerful models like GPT-4o to train smaller models. Prompt Caching helps developers reduce costs and latency by automatically caching input tokens, thereby reducing the amount of computation needed for frequently repeated inputs.
OpenAI's new Realtime API:
- Low-latency, multimodal experiences: The Realtime API enables developers to build applications with fast speech-to-speech conversations, similar to ChatGPT’s Advanced Voice Mode.
- Natural conversational experiences with a single API call: Developers no longer need to use multiple models for speech recognition, text processing, and text-to-speech. The Realtime API handles the entire process with one call.
- Streaming audio inputs and outputs: This allows for more natural conversations compared to previous approaches that resulted in noticeable latency and loss of emotion and emphasis.
- Automatic interruption handling: The Realtime API, much like Advanced Voice Mode in ChatGPT, can manage interruptions smoothly.
- Persistent WebSocket connection to exchange messages with GPT-4o: This underlies the Realtime API's functionality.
- Function calling: Voice assistants built with the Realtime API can respond to user requests by triggering actions or accessing new information.
- Six preset voices: The Realtime API utilizes the same six preset voices already available in the API.
The sources also discuss new features and capabilities in the Chat Completions API:
- Audio input and output in the Chat Completions API: This will allow developers to build applications that use audio without needing the low-latency of the Realtime API.
- Input and receive text or audio: Developers can choose to have GPT-4o respond with text, audio, or both.
Join our community: getcoai.com
Follow us on Twitter or watch us on Youtube
Get our newsletter!