OpenAI releases ChatGPT’s hyper-realistic voice to some paying users

OpenAI began rolling out ChatGPT’s Advanced Voice Mode on Tuesday, giving users their first access to GPT-4o’s hyper-realistic audio responses. The alpha version will be available to a small group of ChatGPT Plus users today, and OpenAI says the feature will gradually roll out to all Plus users in the fall of 2024.

When OpenAI first showcased GPT-4o’s voice in May, the feature shocked audiences with quick responses and an uncanny resemblance to a real human’s voice – one in particular. The voice, Sky, resembled that of Scarlett Johansson, the actress behind the artificial assistant in the movie “Her.” Soon after OpenAI’s demo, Johansson said she refused multiple inquiries from CEO Sam Altman to use her voice, and after seeing GPT-4o’s demo, hired legal counsel to defend her likeness. OpenAI denied using Johansson’s voice, but later removed the voice shown in its demo. In June, OpenAI said it would delay the release of Advanced Voice Mode to improve its safety measures.

One month later, the wait is over (sort of). OpenAI says the video and screensharing capabilities showcased during its Spring Update will not be part of this alpha and will launch at a “later date.” For now, the GPT-4o demo that blew everyone away is still just a demo, but some premium users will now have access to the ChatGPT voice feature shown there.

ChatGPT can now talk and listen

You may have already tried out the Voice Mode currently available in ChatGPT, but OpenAI says Advanced Voice Mode is different. ChatGPT’s old solution to audio used three separate models: one to convert your voice to text, GPT-4 to process your prompt, and a third to convert ChatGPT’s text into voice. But GPT-4o is multimodal, capable of processing these tasks without the help of auxiliary models, resulting in significantly lower-latency conversations. OpenAI also claims GPT-4o can sense emotional intonations in your voice, such as sadness or excitement, and can recognize singing.

In this pilot, ChatGPT Plus users will get to see firsthand how hyper-realistic OpenAI’s Advanced Voice Mode really is. TechCrunch was unable to test the feature before publishing this article, but we will review it when we get access.

OpenAI says it’s releasing ChatGPT’s new voice gradually to closely monitor its usage. People in the alpha group will get an alert in the ChatGPT app, followed by an email with instructions on how to use it.

In the months since OpenAI’s demo, the company says it tested GPT-4o’s voice capabilities with more than 100 external red teamers who speak 45 different languages. OpenAI says a report on these safety efforts is coming in early August.

The company says Advanced Voice Mode will be limited to ChatGPT’s four preset voices – Juniper, Breeze, Cove and Ember – made in collaboration with paid voice actors. The Sky voice shown in OpenAI’s May demo is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum says, “ChatGPT cannot impersonate other people’s voices, both individuals and public figures, and will block outputs that differ from one of these preset voices.”

OpenAI is trying to avoid deepfake controversies. In January, AI startup ElevenLabs’s voice cloning technology was used to impersonate President Biden, deceiving primary voters in New Hampshire.

OpenAI also says it introduced new filters to block certain requests to generate music or other copyrighted audio. In the last year, AI companies have landed themselves in legal trouble for copyright infringement, and audio models like GPT-4o open up a whole new category of companies that can file a complaint. In particular, record labels, which have a history of being litigious, have already sued AI song generators Suno and Udio.
