OpenAI releases ChatGPT’s hyper-realistic voice to some paying users

OpenAI began rolling out ChatGPT’s Advanced Voice Mode on Tuesday, giving users their first access to GPT-4o’s hyper-realistic audio responses. The alpha version will be available to a small group of ChatGPT Plus users today, and OpenAI says the feature will gradually roll out to all Plus users in the fall of 2024.

When OpenAI first showcased GPT-4o’s voice in May, the feature shocked audiences with quick responses and an uncanny resemblance to a real human’s voice – one in particular. The voice, Sky, resembled that of Scarlett Johansson, who voiced the artificial assistant in the movie “Her.” Soon after OpenAI’s demo, Johansson said she had declined multiple requests from CEO Sam Altman to use her voice, and after seeing GPT-4o’s demo, she hired legal counsel to defend her likeness. OpenAI denied using Johansson’s voice, but later removed the voice shown in its demo. In June, OpenAI said it would delay the release of Advanced Voice Mode to improve its safety measures.

One month later, the wait is over (sort of). OpenAI says the video and screen-sharing capabilities showcased during its Spring Update will not be part of this alpha and will instead launch at a “later date.” For now, the GPT-4o demo that blew everyone away is still just a demo, but some premium users will now have access to ChatGPT’s voice feature shown there.

ChatGPT can now talk and listen

You may have already tried the Voice Mode currently available in ChatGPT, but OpenAI says Advanced Voice Mode is different. ChatGPT’s previous approach to audio used three separate models: one to convert your voice to text, GPT-4 to process your prompt, and a third to convert ChatGPT’s text reply back into voice. GPT-4o, by contrast, is multimodal and handles all of these tasks itself, without auxiliary models, which makes conversations significantly lower latency. OpenAI also claims GPT-4o can sense emotional intonations in your voice, including sadness and excitement, and can even tell when you’re singing.
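The cascaded pipeline described above can be sketched roughly as follows; the function names and stub outputs are illustrative stand-ins, not OpenAI’s actual API:

```python
# A minimal sketch of the three-model cascade ChatGPT's earlier Voice Mode
# used. Every function here is a hypothetical stub for illustration only.

def speech_to_text(audio: bytes) -> str:
    """Model 1: transcribe the user's speech to text."""
    return "what's the weather like today?"  # stubbed transcription

def generate_reply(prompt: str) -> str:
    """Model 2: a text-only LLM (GPT-4) processes the transcribed prompt."""
    return f"You asked: {prompt}"  # stubbed completion

def text_to_speech(text: str) -> bytes:
    """Model 3: synthesize audio from the reply text."""
    return text.encode("utf-8")  # stubbed waveform

def cascaded_voice_mode(audio: bytes) -> bytes:
    # Three sequential model calls: each hop adds latency, and
    # paralinguistic cues (tone, emotion, singing) are discarded at
    # the transcription step. A multimodal model like GPT-4o collapses
    # all three stages into a single audio-in, audio-out call.
    text = speech_to_text(audio)
    reply = generate_reply(text)
    return text_to_speech(reply)

print(cascaded_voice_mode(b"<user audio>"))
```

The sketch also shows why the cascade loses emotional nuance: once `speech_to_text` flattens audio into a transcript, downstream models never see how something was said, only what was said.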

In this pilot, ChatGPT Plus users will get to see firsthand how hyper-realistic OpenAI’s Advanced Voice Mode really is. TechCrunch was unable to test the feature before publishing this article, but we will review it when we get access.

OpenAI says it’s releasing ChatGPT’s new voice gradually to closely monitor its usage. People in the alpha group will get an alert in the ChatGPT app, followed by an email with instructions on how to use it.

In the months since OpenAI’s demo, the company says it tested GPT-4o’s voice capabilities with more than 100 external red teamers who speak 45 different languages. OpenAI says a report on these safety efforts is coming in early August.

The company says Advanced Voice Mode will be limited to ChatGPT’s four preset voices – Juniper, Breeze, Cove and Ember – made in collaboration with paid voice actors. The Sky voice shown in OpenAI’s May demo is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum says “ChatGPT cannot impersonate other people’s voices, both individuals and public figures, and will block outputs that differ from one of these preset voices.”

OpenAI is trying to avoid deepfake controversies. In January, AI startup ElevenLabs’s voice cloning technology was used to impersonate President Biden, deceiving primary voters in New Hampshire.

OpenAI also says it introduced new filters to block certain requests to generate music or other copyrighted audio. In the last year, AI companies have landed themselves in legal trouble for copyright infringement, and audio models like GPT-4o open up a whole new category of companies that can file a complaint. Record labels in particular have a history of being litigious and have already sued the AI song generators Suno and Udio.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
