The creator of ChatGPT’s voice wants to build the tech from “Her,” minus the dystopia


Alexis Conneau thinks a lot about the movie “Her.” For the last several years, he’s obsessed over trying to turn the film’s fictional voice technology, Samantha, into a reality.

Conneau even uses a picture of Joaquin Phoenix’s character in the movie as his banner on Twitter.

Conneau’s X/Twitter banner (Image Credit: X)

With ChatGPT’s Advanced Voice Mode, a project Conneau started at OpenAI after doing similar work at Meta, he kind of did it. The AI system natively processes speech, and talks back much like a human.

Now, he has a new startup, WaveForms AI, that’s trying to build something better.

Conneau spends a good chunk of time thinking about how to avoid the dystopia shown in that movie, he told TechCrunch in an interview. “Her” was a science fiction film about a world where people develop intimate relationships with AI systems, instead of other humans.

“The movie is a dystopia, right? It’s not a future we want,” said Conneau. “We want to bring that technology – which now exists and will exist – and we want to bring it for good. We want to do precisely the opposite of what the company in that movie does.”

Building the tech, minus the dystopia that comes with it, seems like a contradiction. But Conneau intends to build it anyway, and he’s convinced his new AI startup will help people “feel the AGI” with their ears.

On Monday, Conneau launched WaveForms AI, a new audio LLM company training its own foundation models. It aims to release AI audio products in 2025 that compete with offerings from OpenAI and Google. The startup also announced that it raised $40 million in seed funding, led by Andreessen Horowitz.

Conneau says Marc Andreessen – who previously wrote that AI should be part of every aspect of human life – has taken a personal interest in his endeavor.

It’s worth noting that Conneau’s obsession with the movie “Her” may have landed OpenAI in trouble at one point. Scarlett Johansson sent a legal threat to Sam Altman’s startup earlier this year, ultimately forcing OpenAI to take down one of ChatGPT’s voices that strongly resembled her character in the film. OpenAI denied ever trying to replicate her voice.

But it’s undeniable how much the movie has influenced Conneau. “Her” was clearly science fiction when it was released in 2013 — at the time, Apple’s Siri was quite new and very limited. But today, the technology feels scarily within reach.

AI companionship platforms like Character.AI reach millions of users weekly who just want to talk with its chatbots. The sector is emerging as a popular use case for generative AI — despite occasionally tragic and unsettling outcomes. You can imagine how someone typing with a chatbot all day would love the chance to speak with it too, especially using tech as convincing as ChatGPT’s Advanced Voice Mode.

The CEO of WaveForms AI is wary of the AI companionship space, and it’s not the core of his new company. While he thinks people will use WaveForms’ products in new ways – such as talking to an AI for 20 minutes in the car to learn about something – Conneau says he wants the company to be more “horizontal.”

“[WaveForms AI] can be that teacher that inspires, you know, maybe that teacher that you wouldn’t have in your life, at least, your physical life,” said the CEO.

In the future, he believes talking to generative AI will be a more common way to interact with all kinds of technology. That may include talking to your car, talking to your computer, and WaveForms aims to supply the “emotionally intelligent” AI that facilitates it all.

“I don’t believe in the future where human-to-AI interaction replaces human-to-human interaction,” said Conneau. “If anything, it’s going to be complementary.”

He says AI can learn from the mistakes of social media. For instance, he thinks AI shouldn’t optimize for “time spent on platform,” a common metric of success for social apps that can promote unhealthy habits, like doomscrolling. More broadly, he wants to make sure WaveForms’ AI is aligned with the best interests of humans, calling this “the most important work you could do.”

Conneau says OpenAI’s name for his project, “Advanced Voice Mode,” doesn’t really do justice to how different the technology is from ChatGPT’s regular voice mode.

The old voice mode simply transcribed your voice into text, ran that text through GPT-4, and then converted the response back into speech. It was a somewhat hacked-together solution. With Advanced Voice Mode, however, Conneau says that GPT-4o actually breaks down the audio of your voice into tokens (apparently, every second of audio is equal to roughly three tokens) and runs those tokens directly through an audio-specific transformer model. That, he explained, is what enables Advanced Voice Mode to have such low latency.
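To make the difference concrete, here is a minimal Python sketch of the two approaches and of the token arithmetic. It is an illustration only, not OpenAI's code: every function name is hypothetical, the stage functions are supplied by the caller, and the three-tokens-per-second rate is just the rough figure quoted above.

```python
# Assumption: roughly 3 tokens per second of audio, the approximate
# figure quoted in the article. Real tokenizers may differ.
TOKENS_PER_SECOND = 3


def audio_tokens(seconds, tokens_per_second=TOKENS_PER_SECOND):
    """Estimate how many audio tokens a clip of the given length becomes."""
    return round(seconds * tokens_per_second)


def cascaded_voice_mode(audio, transcribe, text_model, synthesize):
    """Old-style pipeline: speech -> text -> text model -> speech.

    All three stages are hypothetical callables passed in by the caller.
    Tone and pauses are lost at the transcription step.
    """
    text_in = transcribe(audio)
    text_out = text_model(text_in)
    return synthesize(text_out)


def native_voice_mode(audio, tokenize_audio, audio_model, detokenize_audio):
    """Native pipeline: audio tokens go straight through a single model.

    Again, the callables are placeholders, not a real API.
    """
    tokens_in = tokenize_audio(audio)
    tokens_out = audio_model(tokens_in)
    return detokenize_audio(tokens_out)


if __name__ == "__main__":
    # A 20-minute spoken session at the quoted rate.
    print(audio_tokens(20 * 60))
```

At the quoted rate, a 20-minute conversation works out to about 3,600 audio tokens. The sketch also shows why the native route can be quicker: it passes through one model instead of three separate systems chained together.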

One claim that gets thrown around a lot when talking about AI audio models is that they can supposedly “understand emotions.” Much as text-based LLMs learn patterns from heaps of text documents, audio LLMs learn patterns from audio clips of humans talking. Humans label these clips as “sad” or “excited” so that AI models recognize similar voice patterns when they hear you speak, and even respond with emotional intonations of their own. So it’s less that they “understand emotions” and more that they systematically recognize audio qualities that humans associate with those emotions.
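The idea of recognizing labeled voice patterns can be illustrated with a toy Python example. This is not how a production audio LLM works: it is a simple nearest-centroid classifier, and the clips, the two features (average pitch and loudness), and all the numbers are made up for illustration.

```python
from math import dist
from statistics import mean

# Hypothetical human-labeled clips, invented for this sketch:
# ((average pitch in Hz, loudness from 0 to 1), label)
LABELED_CLIPS = [
    ((120.0, 0.20), "sad"),
    ((130.0, 0.25), "sad"),
    ((220.0, 0.80), "excited"),
    ((240.0, 0.90), "excited"),
]


def centroids(clips):
    """Average the feature vectors of all clips that share a label."""
    by_label = {}
    for features, label in clips:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(features, label_centroids):
    """Return the label whose average clip is closest to the new clip."""
    return min(
        label_centroids,
        key=lambda label: dist(features, label_centroids[label]),
    )


if __name__ == "__main__":
    cents = centroids(LABELED_CLIPS)
    print(classify((125.0, 0.30), cents))
    print(classify((230.0, 0.70), cents))
```

The classifier never models what sadness is. It only measures which group of labeled examples a new clip most resembles, which is the distinction the paragraph above draws.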

Making AI more personable, not smarter

Conneau is betting that generative AI today doesn’t need to get significantly smarter than GPT-4o to create better products. Instead of improving the underlying intelligence of these models, as OpenAI is doing with o1, WaveForms is simply trying to make AI better to talk to.

“There will be a market of people [using generative AI] who will just choose the interaction that is the most enjoyable for them,” said Conneau.

That’s why the startup is confident it can develop its own foundation models, ideally smaller ones that will be less expensive and faster to run. That’s not a bad bet given recent evidence that the old AI scaling laws are slowing down.

Conneau says his former co-worker at OpenAI, Ilya Sutskever, often talked to him about trying to “feel the AGI” – essentially, using a gut feeling to assess whether we’ve reached superintelligent AI. The CEO of WaveForms is convinced that achieving AGI will be more of a feeling, instead of reaching some sort of benchmark, and audio LLMs will be the key to that feeling.

“I think you’ll be able to feel the AGI a lot more when you can talk to it, when you can hear the AGI, when you can actually talk to the transformer itself,” said Conneau, repeating comments he made to Sutskever over dinner.

But as startups make AI better to talk to, they clearly also have a responsibility to figure out how to make sure people don’t get addicted. Still, Andreessen Horowitz general partner Martin Casado, who helped lead the investment in WaveForms, says it’s not necessarily a bad thing if people are talking to AI more often.

“I can go talk to a random person on the internet, and that person can bully me, that person can take advantage of me… I can talk to a video game which could be arbitrarily violent, or I could talk to an AI,” said Casado in an interview with TechCrunch. “I think it’s an important question [to] study. I will not be surprised if it turns out that [talking to AI] is actually preferable.”

Some companies may consider someone developing a loving relationship with their AI as a marker of success. But from a societal standpoint, it could also be seen as a marker of total failure, much like the movie “Her” tried to depict. That’s the tightrope that WaveForms now has to walk.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
