What are AI ‘world models,’ and why do they matter?

World models, also known as world simulators, are being touted by some as the next big thing in AI.

AI pioneer Fei-Fei Li’s World Labs has raised $230 million to build “large world models,” and DeepMind hired one of the creators of OpenAI’s video generator, Sora, to work on “world simulators.”

But what the heck are these things?

World models take inspiration from the mental models of the world that humans develop naturally. Our brains take the abstract representations from our senses and form them into a more concrete understanding of the world around us, producing what we called “models” long before AI adopted the phrase. The predictions our brains make based on these models influence how we perceive the world.

A paper by AI researchers David Ha and Jürgen Schmidhuber gives the example of a baseball batter. Batters have milliseconds to decide how to swing their bat, less time than it takes for visual signals to reach the brain. They’re able to hit a 100-mile-per-hour fastball because they can instinctively predict where the ball will go, Ha and Schmidhuber say.

“For professional players, this all happens subconsciously,” the research duo writes. “Their muscles reflexively swing the bat at the right time and location in line with their internal models’ predictions. They can quickly act on their predictions of the future without the need to consciously roll out possible future scenarios to form a plan.”
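To make the batter example concrete, here is a minimal Python sketch of acting on a prediction rather than on the latest observation. The 60.5-foot pitching distance is the regulation figure. The constant ball speed, the 0.1-second sensing delay, and the function names are illustrative assumptions, not values from Ha and Schmidhuber’s paper.

```python
MPH_TO_FTPS = 5280 / 3600  # feet per second in one mile per hour


def flight_time(distance_ft: float, speed_mph: float) -> float:
    """Seconds for a ball at constant speed to cover distance_ft."""
    return distance_ft / (speed_mph * MPH_TO_FTPS)


def predicted_distance_to_plate(observed_distance_ft: float,
                                speed_mph: float,
                                delay_s: float) -> float:
    """Extrapolate a stale observation forward by the sensing delay.

    The observation says where the ball was delay_s seconds ago; the
    prediction says where it should be now (never past the plate).
    """
    travelled_ft = speed_mph * MPH_TO_FTPS * delay_s
    return max(observed_distance_ft - travelled_ft, 0.0)


if __name__ == "__main__":
    # Assumed numbers: 100 mph fastball, 0.1 s sensing delay.
    print(f"flight time: {flight_time(60.5, 100):.4f} s")
    print(f"ball seen at 30 ft is really at about "
          f"{predicted_distance_to_plate(30, 100, 0.1):.1f} ft")
```

Under these assumptions the whole pitch lasts about 0.41 seconds, and a ball that appears to be 30 feet away has already covered roughly half that distance, which is why reacting to the raw observation alone would be too late.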

It’s these subconscious reasoning aspects of world models that some believe are prerequisites for human-level intelligence.

Modeling the world

While the concept has been around for decades, world models have gained popularity recently in part because of their promising applications in the field of generative video.

Most, if not all, AI-generated videos veer into uncanny valley territory. Watch them long enough and something bizarre will happen, like limbs twisting and merging into each other.

While a generative model trained on years of video might accurately predict that a basketball bounces, it doesn’t actually have any idea why, just as language models don’t really understand the concepts behind words and phrases. But a world model with even a basic grasp of why the basketball bounces the way it does will be better at depicting that bounce.

To enable this kind of insight, world models are trained on a range of data, including photos, audio, videos, and text, with the intent of creating internal representations of how the world works, and the ability to reason about the consequences of actions.

A sample from AI startup Runway’s Gen-3 video generation model. Image Credits: Runway

“A viewer expects that the world they’re watching behaves in a similar way to their reality,” Mashrabov said. “If a feather drops with the weight of an anvil or a bowling ball shoots up hundreds of feet into the air, it’s jarring and takes the viewer out of the moment. With a strong world model, instead of a creator defining how each object is expected to move — which is tedious, cumbersome, and a poor use of time — the model will understand this.”

But better video generation is only the tip of the iceberg for world models. Researchers including Meta chief AI scientist Yann LeCun say the models could someday be used for sophisticated forecasting and planning in both the digital and physical realms.

In a talk earlier this year, LeCun described how a world model could help achieve a desired goal through reasoning. A model with a base representation of a “world” (e.g. a video of a dirty room), given an objective (a clean room), could come up with a sequence of actions to achieve that objective (deploy vacuums to sweep, clean the dishes, empty the trash) not because that’s a pattern it has observed but because it knows at a deeper level how to go from dirty to clean.
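The dirty-room example can be sketched as a search through states that a model predicts, rather than a replay of observed patterns. The Python below is a toy under loud assumptions: the “world model” is a hand-written lookup table (`EFFECTS`), not a learned one, and the state names, `predict`, and `plan` are hypothetical names chosen for illustration. It is not LeCun’s proposed architecture.

```python
from collections import deque

# Hypothetical hand-written "world model": the problem each action
# is predicted to remove from the room.
EFFECTS = {
    "vacuum floor": "dusty floor",
    "wash dishes": "dirty dishes",
    "empty trash": "full trash",
}


def predict(state: frozenset, action: str) -> frozenset:
    """Predicted next state after taking `action` in `state`."""
    return state - {EFFECTS[action]}


def plan(start: frozenset, goal: frozenset):
    """Breadth-first search over imagined states.

    Returns a shortest list of actions that the model predicts will
    turn `start` into `goal`, or None if no such sequence exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action in EFFECTS:
            nxt = predict(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # goal unreachable under this model


if __name__ == "__main__":
    dirty_room = frozenset(EFFECTS.values())
    clean_room = frozenset()
    print(plan(dirty_room, clean_room))
```

The planner never sees an example of a room being cleaned. It only queries `predict` to imagine what each action would do, which is the distinction the paragraph above draws between observing a pattern and knowing how to get from dirty to clean.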

“We need machines that understand the world; [machines] that can remember things, that have intuition, have common sense — things that can reason and plan to the same level as humans,” LeCun said. “Despite what you might have heard from some of the most enthusiastic people, current AI systems are not capable of any of this.”

While LeCun estimates that we’re at least a decade away from the world models he envisions, today’s world models are showing promise as elementary physics simulators.

Sora controlling a player in Minecraft and rendering the world. Image Credits: OpenAI

OpenAI notes in a blog post that Sora, which it considers to be a world model, can simulate actions like a painter leaving brush strokes on a canvas. Sora and models like it can also effectively simulate video games. For example, Sora can render a Minecraft-like UI and game world.

Future world models may be able to generate 3D worlds on demand for gaming, virtual photography, and more, World Labs co-founder Justin Johnson said on an episode of the a16z podcast.

“We already have the ability to create virtual, interactive worlds, but it costs hundreds and hundreds of millions of dollars and a ton of development time,” Johnson said. “[World models] will let you not just get an image or a clip out, but a fully simulated, vibrant, and interactive 3D world.”

High hurdles

While the concept is enticing, many technical challenges stand in the way.

Training and running world models requires massive compute power, even compared to the amount currently used by generative models. While some of the latest language models can run on a modern smartphone, Sora (arguably an early world model) would require thousands of GPUs to train and run, especially if such models come into common use.

World models, like all AI models, also hallucinate — and internalize biases in their training data. A world model trained largely on videos of sunny weather in European cities might struggle to comprehend or depict Korean cities in snowy conditions, for example, or simply do so incorrectly.

A general lack of training data threatens to exacerbate these issues, says Mashrabov.

“We have seen models being really limited with generations of people of a certain type or race,” he said. “Training data for a world model must be broad enough to cover a diverse set of scenarios, but also highly specific to where the AI can deeply understand the nuances of those scenarios.”

In a recent post, AI startup Runway’s CEO, Cristóbal Valenzuela, says that data and engineering issues prevent today’s models from accurately capturing the behavior of a world’s inhabitants (e.g. humans and animals). “Models will need to generate consistent maps of the environment,” he said, “and the ability to navigate and interact in those environments.”

A Sora-generated video. Image Credits: OpenAI

If all the major hurdles are overcome, though, Mashrabov believes that world models could “more robustly” bridge AI with the real world, leading to breakthroughs not only in virtual world generation but also in robotics and AI decision-making.

They could also spawn more capable robots.

Robots today are limited in what they can do because they don’t have an awareness of the world around them (or their own bodies). World models could give them that awareness, Mashrabov said — at least to a point.

“With an advanced world model, an AI could develop a personal understanding of whatever scenario it’s placed in,” he said, “and start to reason out possible solutions.”



Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
