Meta’s Movie Gen model puts out realistic video with sound, so we can finally have infinite Moo Deng

No one really knows what generative video models are useful for just yet, but that hasn’t stopped companies like Runway, OpenAI, and Meta from pouring millions into developing them. Meta’s latest is called Movie Gen, and true to its name, it turns text prompts into relatively realistic video with sound… but thankfully no voice just yet. And wisely, they are not giving this one a public release.

Movie Gen is actually a collection (or “cast” as they put it) of foundation models, the largest of which is the text-to-video bit. Meta claims it outperforms the likes of Runway’s Gen3, LumaLabs’ latest, and Kling1.5, though as always this type of thing is more to show that they are playing the same game than that Movie Gen wins. The technical particulars can be found in the paper Meta put out describing all the components.

Audio is generated to match the contents of the video, adding for instance engine noises that correspond with car movements, or the rush of a waterfall in the background, or a crack of thunder halfway through the video when it’s called for. It’ll even add music if that seems relevant.

It was trained on “a combination of licensed and publicly available datasets” that they called “proprietary/commercially sensitive” and would provide no further details on. We can only guess this means a lot of Instagram and Facebook videos, plus some partner stuff and a lot of others that are inadequately protected from scrapers, AKA “publicly available.”

What Meta is clearly aiming for here, however, is not simply capturing the “state of the art” crown for a month or two, but a practical, soup-to-nuts approach where a solid final product can be produced from a very simple, natural-language prompt. Stuff like “imagine me as a baker making a shiny hippo cake in a thunderstorm.”

For instance, one sticking point for these video generators has been in how difficult they usually are to edit. If you ask for a video of someone walking across the street, then realize you want them walking right to left instead of left to right, there’s a good chance the whole shot will look different when you repeat the prompt with that additional instruction. Meta is adding a simple, text-based editing method where you can simply say “change the background to a busy intersection” or “change her clothes to a red dress” and it will attempt to make that change, but only that change.

Camera movements are also generally understood, with things like “tracking shot” and “pan left” taken into account when generating the video. This is still pretty clumsy compared with real camera control, but it’s a lot better than nothing.

The limitations of the model are a little weird. It generates video 768 pixels wide, a dimension familiar to most from the famous but outdated 1024×768, but which is also three times 256, making it play well with other HD formats. The Movie Gen system upscales this to 1080p, which is the source of the claim that it generates that resolution. Not really true, but we’ll give them a pass because upscaling is surprisingly effective.

Weirdly, it generates up to 16 seconds of video… at 16 frames per second, a frame rate no one in history has ever wanted or asked for. You can, however, also do 10 seconds of video at 24 FPS. Lead with that one!
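For a sense of scale, the two modes work out to nearly the same number of frames, which may be why the longer clip comes at the lower frame rate. A quick sketch of the arithmetic (the function name is purely illustrative and not taken from Meta’s paper):

```python
def frame_count(seconds: float, fps: float) -> int:
    """Total number of frames in a clip of the given length and frame rate."""
    return round(seconds * fps)

# The two modes described above
long_clip = frame_count(16, 16)   # 16 seconds at 16 FPS
short_clip = frame_count(10, 24)  # 10 seconds at 24 FPS

print(long_clip, short_clip)  # 256 240
```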

As for why it doesn’t do voice… well, there are likely two reasons. First, it’s super hard. Generating speech is easy now, but matching it to lip movements, and those lips to face movements, is a much more complicated proposition. I don’t blame them for leaving this one til later, since it would be a minute-one failure case. Someone could say “generate a clown delivering the Gettysburg Address while riding a tiny bike in circles” — nightmare fuel primed to go viral.

The second reason is likely political: putting out what amounts to a deepfake generator a month before a major election is… not the best for optics. Crimping its capabilities a bit so that, should malicious actors try to use it, it would require some real work on their part, is a practical preventive step. One certainly could combine this generative model with a speech generator and an open lip syncing one, but you can’t just have it generate a candidate making wild claims.

“Movie Gen is purely an AI research concept right now, and even at this early stage, safety is a top priority as it has been with all of our generative AI technologies,” said a Meta rep in response to TechCrunch’s questions.

Unlike, say, the Llama large language models, Movie Gen won’t be publicly available. You can replicate its techniques somewhat by following the research paper, but the code won’t be published, except for the “underlying evaluation prompt dataset,” which is to say the record of what prompts were used to generate the test videos.
