TTT models might be the next frontier in generative AI

After years of dominance by the form of AI known as the transformer, the hunt is on for new architectures.

Transformers underpin OpenAI’s video-generating model Sora, and they’re at the heart of text-generating models like Anthropic’s Claude, Google’s Gemini and GPT-4o. But they’re beginning to run up against technical roadblocks — in particular, computation-related roadblocks.

Transformers aren’t especially efficient at processing and analyzing vast amounts of data, at least when running on off-the-shelf hardware. And that’s leading to steep and perhaps unsustainable increases in power demand as companies build and expand infrastructure to accommodate transformers’ requirements.

A promising architecture proposed this month is test-time training (TTT), which was developed over the course of a year and a half by researchers at Stanford, UC San Diego, UC Berkeley and Meta. The research team claims that TTT models can not only process far more data than transformers, but that they can do so without consuming nearly as much compute power.

The hidden state in transformers

A fundamental component of transformers is the “hidden state,” which is essentially a long list of data. As a transformer processes something, it adds entries to the hidden state to “remember” what it just processed. For instance, if the model is working its way through a book, the hidden state values will be things like representations of words (or parts of words).

“If you think of a transformer as an intelligent entity, then the lookup table — its hidden state — is the transformer’s brain,” Yu Sun, a post-doc at Stanford and a co-contributor on the TTT research, told TechCrunch. “This specialized brain enables the well-known capabilities of transformers such as in-context learning.”

The hidden state is part of what makes transformers so powerful. But it also hobbles them. To “say” even a single word about a book a transformer just read, the model would have to scan through its entire lookup table — a task as computationally demanding as rereading the whole book.
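That growth-and-rescan behavior can be sketched in a few lines of Python. This is a toy illustration, not the researchers' code; the `Cache` class and its methods are invented for this example:

```python
# Toy sketch of a transformer-style "hidden state": it grows with every
# token processed, and producing each new output must scan all of it.

class Cache:
    def __init__(self):
        self.entries = []  # one stored representation per token seen so far

    def add(self, token_repr):
        self.entries.append(token_repr)  # memory grows with context length

    def attend(self, query):
        # One output touches every stored entry: O(n) work per token,
        # so roughly O(n^2) over an entire sequence.
        return sum(query * e for e in self.entries)

cache = Cache()
for _ in range(1000):       # "read" a 1,000-token book
    cache.add(1.0)

print(len(cache.entries))   # the lookup table now holds 1,000 entries
print(cache.attend(2.0))    # saying one word still scans all 1,000
```

The point of the sketch is the mismatch Sun describes: the stored state keeps growing, and every new output pays for its full length again.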

So Sun and team had the idea of replacing the hidden state with a machine learning model — like nested dolls of AI, if you will, a model within a model.

It’s a bit technical, but the gist is that the TTT model’s internal machine learning model, unlike a transformer’s lookup table, doesn’t grow and grow as it processes additional data. Instead, it encodes the data it processes into representative variables called weights, which is what makes TTT models highly performant. No matter how much data a TTT model processes, the size of its internal model won’t change.

Sun believes that future TTT models could efficiently process billions of pieces of data, from words to images to audio recordings to videos. That’s far beyond the capabilities of today’s models.

“Our system can say X words about a book without the computational complexity of rereading the book X times,” Sun said. “Large video models based on transformers, such as Sora, can only process 10 seconds of video, because they only have a lookup table ‘brain.’ Our eventual goal is to develop a system that can process a long video resembling the visual experience of a human life.”

Skepticism around the TTT models

So will TTT models eventually supersede transformers? They could. But it’s too early to say for certain.

TTT models aren’t a drop-in replacement for transformers. And the researchers developed only two small models for their study, making it difficult right now to compare TTT with the larger transformer implementations out there.

“I think it’s a perfectly interesting innovation, and if the data backs up the claims that it provides efficiency gains then that’s great news, but I couldn’t tell you if it’s better than existing architectures or not,” said Mike Cook, a senior lecturer in King’s College London’s department of informatics who wasn’t involved with the TTT research. “An old professor of mine used to tell a joke when I was an undergrad: How do you solve any problem in computer science? Add another layer of abstraction. Adding a neural network inside a neural network definitely reminds me of that.”

Regardless, the accelerating pace of research into transformer alternatives points to growing recognition of the need for a breakthrough.

This week, AI startup Mistral released a model, Codestral Mamba, that’s based on another alternative to the transformer called state space models (SSMs). SSMs, like TTT models, appear to be more computationally efficient than transformers and can scale up to larger amounts of data.

AI21 Labs is also exploring SSMs. So is Cartesia, which pioneered some of the first SSMs and Codestral Mamba’s namesakes, Mamba and Mamba-2.

Should these efforts succeed, it could make generative AI even more accessible and widespread than it is now — for better or worse.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
