OpenAI unveils three new audio models for real-time voice tasks

OpenAI has introduced three new audio models for its developer platform: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. The release marks a shift beyond transcription and chat toward voice agents that can listen, translate, and act during live conversations. GPT-Realtime-2 is designed to handle complex requests and maintain context across long voice sessions. GPT-Realtime-Translate supports translation from more than 70 input languages into 13 output languages. GPT-Realtime-Whisper generates live speech-to-text captions and meeting notes in real time. Early customers testing the models include real estate platform Zillow, travel agency Priceline, and European telecoms firm Deutsche Telekom. Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens.

Mosaic News
