OpenAI Dev Day Podcast
2024/10/03
再生時間： 14 分
ポッドキャスト

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

OpenAI Dev Day Podcast

無料で聴く

ポッドキャストの詳細を見る

サマリー
OpenAI has recently launched a number of new features to its API. The Realtime API enables developers to build speech-to-speech experiences within their applications. The Vision Fine-tuning API enables developers to fine-tune GPT-4o with images and text to improve its visual understanding capabilities. Model Distillation lets developers create cost-effective models by using the outputs of more powerful models like GPT-4o to train smaller models. Prompt Caching helps developers reduce costs and latency by automatically caching input tokens, thereby reducing the amount of computation needed for frequently repeated inputs.

OpenAI's new Realtime API:
Low-latency, multimodal experiences: The Realtime API enables developers to build applications with fast speech-to-speech conversations, similar to ChatGPT’s Advanced Voice Mode.
Natural conversational experiences with a single API call: Developers no longer need to use multiple models for speech recognition, text processing, and text-to-speech. The Realtime API handles the entire process with one call.
Streaming audio inputs and outputs: This allows for more natural conversations compared to previous approaches that resulted in noticeable latency and loss of emotion and emphasis.
Automatic interruption handling: The Realtime API, much like Advanced Voice Mode in ChatGPT, can manage interruptions smoothly.
Persistent WebSocket connection to exchange messages with GPT-4o: This underlies the Realtime API's functionality.
Function calling: Voice assistants built with the Realtime API can respond to user requests by triggering actions or accessing new information.
Six preset voices: The Realtime API utilizes the same six preset voices already available in the API.
The sources also discuss new features and capabilities in the Chat Completions API:
Audio input and output in the Chat Completions API: This will allow developers to build applications that use audio without needing the low-latency of the Realtime API.
Input and receive text or audio: Developers can choose to have GPT-4o respond with text, audio, or both.
Join our community: getcoai.com
Follow us on Twitter or watch us on Youtube
Get our newsletter!
続きを読む一部表示

あらすじ・解説

OpenAI has recently launched a number of new features to its API. The Realtime API enables developers to build speech-to-speech experiences within their applications. The Vision Fine-tuning API enables developers to fine-tune GPT-4o with images and text to improve its visual understanding capabilities. Model Distillation lets developers create cost-effective models by using the outputs of more powerful models like GPT-4o to train smaller models. Prompt Caching helps developers reduce costs and latency by automatically caching input tokens, thereby reducing the amount of computation needed for frequently repeated inputs.

OpenAI's new Realtime API:

Low-latency, multimodal experiences: The Realtime API enables developers to build applications with fast speech-to-speech conversations, similar to ChatGPT’s Advanced Voice Mode.
Natural conversational experiences with a single API call: Developers no longer need to use multiple models for speech recognition, text processing, and text-to-speech. The Realtime API handles the entire process with one call.
Streaming audio inputs and outputs: This allows for more natural conversations compared to previous approaches that resulted in noticeable latency and loss of emotion and emphasis.
Automatic interruption handling: The Realtime API, much like Advanced Voice Mode in ChatGPT, can manage interruptions smoothly.
Persistent WebSocket connection to exchange messages with GPT-4o: This underlies the Realtime API's functionality.
Function calling: Voice assistants built with the Realtime API can respond to user requests by triggering actions or accessing new information.
Six preset voices: The Realtime API utilizes the same six preset voices already available in the API.

The sources also discuss new features and capabilities in the Chat Completions API:

Audio input and output in the Chat Completions API: This will allow developers to build applications that use audio without needing the low-latency of the Realtime API.
Input and receive text or audio: Developers can choose to have GPT-4o respond with text, audio, or both.

Join our community: getcoai.com
Follow us on Twitter or watch us on Youtube
Get our newsletter!

続きを読む一部表示