TTT models might be the next frontier in generative AI

After years of dominance by the form of AI known as the transformer, the hunt is on for new architectures.

Transformers underpin OpenAI’s video-generating model Sora, and they’re at the heart of text-generating models like Anthropic’s Claude, Google’s Gemini and GPT-4o. But they’re beginning to run up against technical roadblocks — in particular, computation-related roadblocks.

Transformers aren’t especially efficient at processing and analyzing vast amounts of data, at least when running on off-the-shelf hardware. And that’s leading to steep and perhaps unsustainable increases in power demand as companies build and expand infrastructure to accommodate transformers’ requirements.

A promising architecture proposed this month is test-time training (TTT), which was developed over the course of a year and a half by researchers at Stanford, UC San Diego, UC Berkeley and Meta. The research team claims that TTT models can not only process far more data than transformers, but that they can do so without consuming nearly as much compute power.

The hidden state in transformers

A fundamental component of transformers is the “hidden state,” which is essentially a long list of data. As a transformer processes something, it adds entries to the hidden state to “remember” what it just processed. For instance, if the model is working its way through a book, the hidden state values will be things like representations of words (or parts of words).

“If you think of a transformer as an intelligent entity, then the lookup table — its hidden state — is the transformer’s brain,” Yu Sun, a post-doc at Stanford and a co-contributor on the TTT research, told TechCrunch. “This specialized brain enables the well-known capabilities of transformers such as in-context learning.”

The hidden state is part of what makes transformers so powerful. But it also hobbles them. To “say” even a single word about a book a transformer just read, the model would have to scan through its entire lookup table — a task as computationally demanding as rereading the whole book.
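That growth-and-rescan behavior can be sketched in a few lines of Python. This is a toy illustration, not the researchers' code; the `Cache` class and its methods are invented for this example:

```python
# Toy sketch of a transformer-style "hidden state": it grows with every
# token processed, and producing each new output must scan all of it.

class Cache:
    def __init__(self):
        self.entries = []  # one stored representation per token seen so far

    def add(self, token_repr):
        self.entries.append(token_repr)  # memory grows with context length

    def attend(self, query):
        # One output touches every stored entry: O(n) work per token,
        # so roughly O(n^2) over an entire sequence.
        return sum(query * e for e in self.entries)

cache = Cache()
for _ in range(1000):       # "read" a 1,000-token book
    cache.add(1.0)

print(len(cache.entries))   # the lookup table now holds 1,000 entries
print(cache.attend(2.0))    # saying one word still scans all 1,000
```

The point of the sketch is the mismatch Sun describes: the stored state keeps growing, and every new output pays for its full length again.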

So Sun and team had the idea of replacing the hidden state with a machine learning model — like nested dolls of AI, if you will, a model within a model.

It’s a bit technical, but the gist is that the TTT model’s internal machine learning model, unlike a transformer’s lookup table, doesn’t grow and grow as it processes additional data. Instead, it encodes the data it processes into representative variables called weights, which is what makes TTT models highly performant. No matter how much data a TTT model processes, the size of its internal model won’t change.

Sun believes that future TTT models could efficiently process billions of pieces of data, from words to images to audio recordings to videos. That’s far beyond the capabilities of today’s models.

“Our system can say X words about a book without the computational complexity of rereading the book X times,” Sun said. “Large video models based on transformers, such as Sora, can only process 10 seconds of video, because they only have a lookup table ‘brain.’ Our eventual goal is to develop a system that can process a long video resembling the visual experience of a human life.”

Skepticism around the TTT models

So will TTT models eventually supersede transformers? They could. But it’s too early to say for certain.

TTT models aren’t a drop-in replacement for transformers. And the researchers developed only two small models for their study, making it difficult right now to compare TTT with the larger transformer implementations out there.

“I think it’s a perfectly interesting innovation, and if the data backs up the claims that it provides efficiency gains then that’s great news, but I couldn’t tell you if it’s better than existing architectures or not,” said Mike Cook, a senior lecturer in King’s College London’s department of informatics who wasn’t involved with the TTT research. “An old professor of mine used to tell a joke when I was an undergrad: How do you solve any problem in computer science? Add another layer of abstraction. Adding a neural network inside a neural network definitely reminds me of that.”

Regardless, the accelerating pace of research into transformer alternatives points to growing recognition of the need for a breakthrough.

This week, AI startup Mistral released a model, Codestral Mamba, that’s based on another alternative to the transformer called state space models (SSMs). SSMs, like TTT models, appear to be more computationally efficient than transformers and can scale up to larger amounts of data.

AI21 Labs is also exploring SSMs. So is Cartesia, which pioneered some of the first SSMs and Codestral Mamba’s namesakes, Mamba and Mamba-2.

Should these efforts succeed, it could make generative AI even more accessible and widespread than it is now — for better or worse.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
