
OpenAI launches three new voice models for real-time conversations

May 7, 2026

OpenAI has launched three new voice models through its API that promise to make voice interactions with AI feel more natural and intelligent. The new models can reason through complex requests, translate conversations in real time, and transcribe speech as people speak.

The release marks a significant step forward in voice AI technology, moving beyond simple back-and-forth exchanges toward systems that can actually understand context and take meaningful action during conversations. This development comes as voice interfaces become increasingly important for everything from customer service to travel planning, where users expect AI to handle complex, multi-step tasks through natural speech.

The three new models each serve a different aspect of voice interaction. GPT-Realtime-2 is the flagship model that brings GPT-5-level reasoning capabilities to voice conversations, allowing it to handle difficult requests while maintaining natural conversation flow. GPT-Realtime-Translate enables live translation from more than 70 input languages into 13 output languages, keeping pace with speakers in real time. GPT-Realtime-Whisper provides streaming speech-to-text transcription that works as people talk, rather than after they finish speaking.

These capabilities address a growing need in the software industry. Voice has become one of the most natural ways people interact with technology, especially in situations where typing isn’t practical – like while driving, walking through airports, or needing help in a different language. However, building useful voice products has required more than just fast responses or natural-sounding speech.

Companies are already testing these models in real-world scenarios. Zillow is building an assistant that can listen to complex housing requests like “find me homes within my budget, avoid busy streets, and schedule a tour for Saturday,” then reason through the requirements and take action. Priceline is working toward voice-managed travel experiences where customers can handle entire trips through conversation, from booking to managing changes and getting real-time updates.

The improvements in GPT-Realtime-2 are substantial. The model can now:

  • Use short phrases like “let me check that” to keep users informed while processing requests
  • Call multiple tools simultaneously while narrating its actions
  • Recover gracefully from errors with natural explanations
  • Handle context windows of 128K tokens, up from 32K, for longer conversations
  • Better retain specialized terminology and proper nouns
  • Adjust its tone based on the situation – calm during problem-solving, empathetic when users are frustrated
  • Scale reasoning effort from minimal to extra-high depending on request complexity
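
The combination of a short spoken preamble, simultaneous tool calls, and graceful error recovery can be pictured with a small sketch. This is not OpenAI's Realtime API: the tools `search_homes` and `check_tour_slots`, the `say` callback, and the hard-coded arguments are all hypothetical stand-ins, and plain `asyncio` is used only to show the general pattern.

```python
import asyncio
from typing import Callable


async def search_homes(max_price: int) -> list[str]:
    # Hypothetical tool: stands in for a real listing search.
    await asyncio.sleep(0.01)
    return [f"home under {max_price}"]


async def check_tour_slots(day: str) -> list[str]:
    # Hypothetical tool: stands in for a real scheduling lookup.
    await asyncio.sleep(0.01)
    return [f"{day} 10:00"]


async def handle_request(say: Callable[[str], None]) -> dict[str, object]:
    # Keep the user informed before the slow work starts.
    say("Let me check that.")

    calls = {
        "homes": search_homes(500_000),
        "tours": check_tour_slots("Saturday"),
    }
    # Run both tools concurrently; collect exceptions instead of raising.
    results = await asyncio.gather(*calls.values(), return_exceptions=True)

    outcome: dict[str, object] = {}
    for name, result in zip(calls, results):
        if isinstance(result, Exception):
            # Recover with a plain-language explanation rather than failing.
            say(f"I couldn't complete the {name} lookup, so I'll skip it for now.")
        else:
            outcome[name] = result
    return outcome


if __name__ == "__main__":
    print(asyncio.run(handle_request(print)))
```

In a real voice agent the `say` callback would hand text to speech output, and the tool arguments would come from the model's interpretation of the request rather than from constants.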

The performance gains are measurable. GPT-Realtime-2 scores 15.2% higher than its predecessor on Big Bench Audio tests for audio intelligence and 13.8% higher on Audio MultiChallenge tests for following instructions in conversations.

Live translation capabilities fill a crucial gap in global communication. Deutsche Telekom is testing the translation model for multilingual customer support, where lower latency and better fluency can make cross-language conversations feel natural rather than stilted. The model needs to preserve meaning while keeping pace with speakers, even when people use regional pronunciation or industry-specific language.

The streaming transcription model addresses the lag that makes current voice interfaces feel clunky. Instead of waiting for someone to finish speaking before transcribing, GPT-Realtime-Whisper works continuously, enabling faster captions for meetings and broadcasts, real-time note-taking, and more responsive voice agents.
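
The difference between transcribing after the speaker finishes and transcribing as they talk can be illustrated with a toy sketch. It assumes each audio chunk has already been recognized as a word, so plain strings stand in for audio; the function names are hypothetical and no real speech model is involved.

```python
from typing import Iterable, Iterator


def batch_transcribe(chunks: Iterable[str]) -> str:
    """Wait for the whole utterance, then return a single transcript."""
    return " ".join(chunks)


def streaming_transcribe(chunks: Iterable[str]) -> Iterator[str]:
    """Yield a growing partial transcript as each chunk arrives."""
    words: list[str] = []
    for chunk in chunks:
        words.append(chunk)
        yield " ".join(words)


if __name__ == "__main__":
    utterance = ["book", "a", "flight"]
    for partial in streaming_transcribe(utterance):
        print(partial)  # a caption or agent could react to each partial
    print(batch_transcribe(utterance))
```

The streaming version gives downstream consumers, such as captions or a voice agent, something to work with after the first chunk instead of after the last one.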

Safety remains a priority, with multiple layers of protection. OpenAI uses active classifiers to monitor conversations in real time and stop sessions that violate content guidelines. The company also requires developers to clearly indicate when users are interacting with AI, unless it’s obvious from context.

The models are available now through OpenAI’s Realtime API. GPT-Realtime-2 costs $32 per million audio input tokens and $64 per million output tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute. All three models support EU data residency requirements for European applications.
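
As a rough illustration of how these prices add up, the sketch below turns the quoted rates into a small cost estimator. The rates come from the figures above; the function names and the example usage numbers are assumptions chosen for illustration. The article does not say how many tokens a minute of audio consumes, so token counts are taken as inputs.

```python
# Prices as quoted in the article, in US dollars.
REALTIME_2_INPUT_PER_MILLION = 32.0   # per million audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0  # per million output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated GPT-Realtime-2 cost for the given token counts."""
    return (
        input_tokens / 1_000_000 * REALTIME_2_INPUT_PER_MILLION
        + output_tokens / 1_000_000 * REALTIME_2_OUTPUT_PER_MILLION
    )


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimated cost for a model billed by the minute."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical usage: 500K input tokens and 250K output tokens.
    print(f"Realtime-2: ${realtime_2_cost(500_000, 250_000):.2f}")
    # Hypothetical usage: a 60-minute call, translated and transcribed.
    print(f"Translate:  ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper:    ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

Under those assumed usage figures, the estimate works out to $32.00 for GPT-Realtime-2, $2.04 for an hour of translation, and $1.02 for an hour of transcription.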

This release puts OpenAI in direct competition with other tech giants working on voice AI, including Google’s conversational AI and Amazon’s Alexa improvements. The ability to reason through complex requests while maintaining natural conversation flow could be a significant differentiator, especially for business applications where current voice assistants often fall short of user expectations.

Copyright © 2026 Best AI Tools 415 Mission Street, 37th Floor, San Francisco, CA 94105