Meta’s Movie Gen model puts out realistic video with sound, so we can finally have infinite Moo Deng



No one really knows what generative video models are useful for just yet, but that hasn’t stopped companies like Runway, OpenAI, and Meta from pouring millions into developing them. Meta’s latest is called Movie Gen, and, true to its name, it turns text prompts into relatively realistic video with sound… but thankfully no voice just yet. And, wisely, they are not giving this one a public release.

Movie Gen is actually a collection (or “cast,” as they put it) of foundation models, the largest of which is the text-to-video bit. Meta claims it outperforms the likes of Runway’s Gen-3, LumaLabs’ latest, and Kling 1.5, though as always this type of comparison is more to show that they are playing the same game than that Movie Gen wins. The technical particulars can be found in the paper Meta put out describing all the components.

Audio is generated to match the contents of the video, adding, for instance, engine noises that correspond with car movements, or the rush of a waterfall in the background, or a crack of thunder halfway through the video when it’s called for. It’ll even add music if that seems relevant.

It was trained on “a combination of licensed and publicly available datasets” that Meta called “proprietary/commercially sensitive” and declined to detail further. We can only guess that means a lot of Instagram and Facebook videos, plus some partner material and a lot of other content that is inadequately protected from scrapers, AKA “publicly available.”

What Meta is clearly aiming for here, however, is not simply capturing the “state of the art” crown for a month or two, but a practical, soup-to-nuts approach where a solid final product can be produced from a very simple, natural-language prompt. Stuff like “imagine me as a baker making a shiny hippo cake in a thunderstorm.”

For instance, one sticking point for these video generators has been how difficult they usually are to edit. If you ask for a video of someone walking across the street, then realize you want them walking right to left instead of left to right, there’s a good chance the whole shot will look different when you repeat the prompt with that additional instruction. Meta is adding a straightforward, text-based editing method where you can simply say “change the background to a busy intersection” or “change her clothes to a red dress” and it will attempt to make that change, but only that change.
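To see why that matters, here’s a toy illustration of the re-prompting problem (a deliberate stand-in, not Meta’s method): if the prompt effectively seeds the generation, even a small wording change re-samples the entire shot, which is exactly what instruction-based editing is meant to avoid.

```python
import random

def generate(prompt: str) -> list[int]:
    # Toy stand-in for a text-to-video sampler: the prompt seeds the RNG,
    # so ANY change to the wording re-rolls every "frame" of the shot.
    rng = random.Random(prompt)
    return [rng.randrange(256) for _ in range(8)]

original = generate("a woman walking across the street")
tweaked = generate("a woman walking across the street, right to left")
print(original)  # one set of "frames"
print(tweaked)   # a completely different shot, not a small adjustment
```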

Image Credits: Meta

Camera movements are also generally understood, with things like “tracking shot” and “pan left” taken into account when generating the video. This is still pretty clumsy compared with real camera control, but it’s a lot better than nothing.

The limitations of the model are a little weird. It generates video 768 pixels wide, a dimension familiar to most from the famous but outdated 1024×768, but which is also three times 256, making it play well with other HD formats. The Movie Gen system upscales this to 1080p, which is the source of the claim that it generates at that resolution. That’s not really true, but we’ll give them a pass, because upscaling is surprisingly effective.
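As a minimal sketch of what that upscale amounts to, here’s plain bicubic resampling with Pillow. The 16:9 aspect ratio is our assumption (a 768-wide frame would then be 768×432, a 2.5× factor to 1920×1080), and Meta’s actual upsampler is a learned model, not bicubic:

```python
from PIL import Image  # pip install pillow

def upscale_to_1080p(frame: Image.Image) -> Image.Image:
    # Naive bicubic stand-in for Movie Gen's learned spatial upsampler.
    # 768 x 432 (16:9 at the model's native 768-pixel width) -> 1920 x 1080,
    # i.e. a 2.5x scale factor in each dimension.
    return frame.resize((1920, 1080), Image.Resampling.BICUBIC)

frame = Image.new("RGB", (768, 432))   # dummy frame at native resolution
print(upscale_to_1080p(frame).size)    # (1920, 1080)
```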

Weirdly, it generates up to 16 seconds of video… at 16 frames per second, a frame rate no one in history has ever wanted or asked for. You can, however, also do 10 seconds of video at 24 FPS. Lead with that one!
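One hedged guess at why those two settings coexist (our inference, not Meta’s claim): they land on roughly the same total frame count, consistent with a model that generates a fixed budget of frames that can then be played back at different rates. The arithmetic:

```python
# 16 s at 16 fps and 10 s at 24 fps are nearly the same amount of video
# in frame terms -- consistent with a fixed per-clip frame budget.
for label, seconds, fps in [("16 s @ 16 fps", 16, 16),
                            ("10 s @ 24 fps", 10, 24)]:
    print(f"{label}: {seconds * fps} frames")
# 16 s @ 16 fps: 256 frames
# 10 s @ 24 fps: 240 frames
```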

As for why it doesn’t do voice… well, there are likely two reasons. First, it’s super hard. Generating speech is easy now, but matching it to lip movements, and those lips to face movements, is a much more complicated proposition. I don’t blame them for leaving this one till later, since it would be a minute-one failure case. Someone could say “generate a clown delivering the Gettysburg Address while riding a tiny bike in circles” and get nightmare fuel primed to go viral.

The second reason is likely political: putting out what amounts to a deepfake generator a month before a major election is… not the best for optics. Crimping its capabilities a bit, so that malicious actors would have to do some real work to misuse it, is a practical preventive step. One certainly could combine this generative model with a speech generator and an open lip-syncing model, but you can’t just have it generate a candidate making wild claims.

“Movie Gen is purely an AI research concept right now, and even at this early stage, safety is a top priority as it has been with all of our generative AI technologies,” said a Meta rep in response to TechCrunch’s questions.

Unlike, say, the Llama large language models, Movie Gen won’t be publicly available. You can replicate its techniques somewhat by following the research paper, but the code won’t be published, except for the “underlying evaluation prompt dataset,” which is to say the record of what prompts were used to generate the test videos.


