DeepMind’s new AI generates soundtracks and dialogue for videos

DeepMind, Google’s AI research lab, says it’s developing AI tech to generate soundtracks for videos.

In a post on its official blog, DeepMind says that it sees the tech, V2A (short for “video-to-audio”), as an essential piece of the AI-generated media puzzle. While plenty of orgs, including DeepMind, have developed video-generating AI models, these models can’t create sound effects to sync with the videos that they generate.

“Video generation models are advancing at an incredible pace, but many current systems can only generate silent output,” DeepMind writes. “V2A technology [could] become a promising approach for bringing generated movies to life.”

DeepMind’s V2A tech takes the description of a soundtrack (e.g. “jellyfish pulsating under water, marine life, ocean”) paired with a video and creates music, sound effects and even dialogue that matches the characters and tone of the video. The output is watermarked by DeepMind’s deepfakes-combating SynthID technology. The AI model powering V2A, a diffusion model, was trained on a combination of sounds and dialogue transcripts as well as video clips, DeepMind says.

“By training on video, audio and the additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts,” according to DeepMind.

Mum’s the word on whether any of the training data was copyrighted — and whether the data’s creators were informed of DeepMind’s work. We’ve reached out to DeepMind for clarification and will update this post if we hear back.

AI-powered sound-generating tools aren’t novel. Startup Stability AI released one just last week, and ElevenLabs launched one in May. Nor are models that create sound effects for video. A Microsoft project can generate talking and singing videos from a still image, and platforms like Pika and GenreX have trained models to take a video and make a best guess at what music or effects are appropriate in a given scene.

But DeepMind claims that its V2A tech is unique in that it can understand the raw pixels from a video and sync generated sounds with the video automatically, optionally sans description.

V2A isn’t perfect, and DeepMind acknowledges this. Because the underlying model wasn’t trained on a lot of videos with artifacts or distortions, it doesn’t create particularly high-quality audio for these. And in general, the generated audio isn’t super convincing; my colleague Natasha Lomas described it as “a smorgasbord of stereotypical sounds,” and I can’t say I disagree.

For those reasons, and to prevent misuse, DeepMind says it won’t release the tech to the public anytime soon, if ever.

“To make sure our V2A technology can have a positive impact on the creative community, we’re gathering diverse perspectives and insights from leading creators and filmmakers, and using this valuable feedback to inform our ongoing research and development,” DeepMind writes. “Before we consider opening access to it to the wider public, our V2A technology will undergo rigorous safety assessments and testing.”

DeepMind pitches its V2A technology as an especially useful tool for archivists and folks working with historical footage. But generative AI along these lines also threatens to upend the film and TV industry. It’ll take some seriously strong labor protections to ensure that generative media tools don’t eliminate jobs — or, as the case may be, entire professions.
