Making AI models ‘forget’ undesirable data hurts their performance


So-called “unlearning” techniques are used to make a generative AI model forget specific and undesirable info it picked up from training data, like sensitive private data or copyrighted material.

But current unlearning techniques are a double-edged sword: They could make a model like OpenAI’s GPT-4o or Meta’s Llama 3.1 405B much less capable of answering basic questions.

That’s according to a new study co-authored by researchers at the University of Washington (UW), Princeton, the University of Chicago, USC and Google, which found that the most popular unlearning techniques today tend to degrade models — often to the point where they’re unusable.

“Our evaluation suggests that currently feasible unlearning methods are not yet ready for meaningful usage or deployment in real-world scenarios,” Weijia Shi, a researcher on the study and a Ph.D. candidate in computer science at UW, told TechCrunch. “Currently, there are no efficient methods that enable a model to forget specific data without considerable loss of utility.”

How models learn

Generative AI models have no real intelligence. They’re statistical systems that predict words, images, speech, music, videos and other data. Fed an enormous number of examples (e.g. movies, voice recordings, essays and so on), AI models learn how likely data is to occur based on patterns, including the context of any surrounding data.

Given an email ending in the fragment “Looking forward…”, for example, a model trained to autocomplete messages might suggest “… to hearing back,” following the pattern of all the emails it’s ingested. There’s no intentionality there; the model isn’t looking forward to anything. It’s simply making an informed guess.
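
To make the autocomplete idea concrete, here is a minimal sketch of a pattern-based predictor. It is a toy word-pair counter, not how GPT-4o or any production model works, and the sample emails and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count, for every word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def suggest_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training emails (assumed, not from the study).
emails = [
    "looking forward to hearing back",
    "looking forward to your reply",
    "looking forward to hearing from you",
]
model = train_bigrams(emails)
suggestion = suggest_next(model, "forward")
```

The suggestion comes purely from counts: "to" followed "forward" in every example, so that is what the toy model proposes. Real models use far richer context, but the guess is statistical in the same sense.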

Most models, including flagships like GPT-4o, are trained on data sourced from public websites and data sets around the web. Most vendors developing such models argue that fair use shields their practice of scraping data and using it for training without informing, compensating or even crediting the data’s owners.

But not every copyright holder agrees. And many — from authors to publishers to record labels — have filed lawsuits against vendors to force a change.

The copyright dilemma is one of the reasons unlearning techniques have gained a lot of attention lately. Google, in partnership with several academic institutions, last year launched a competition seeking to spur the creation of new unlearning approaches.

Unlearning could also provide a way to remove sensitive info from existing models, like medical records or compromising photos, in response to a request or government order. (Thanks to the way they’re trained, models tend to sweep up lots of private information, from phone numbers to more problematic examples.) Over the past few years, some vendors have rolled out tools to allow data owners to ask that their data be removed from training sets. But these opt-out tools only apply to future models, not models trained before they rolled out; unlearning would be a much more thorough approach to data deletion.

Regardless, unlearning isn’t as easy as hitting “Delete.”

The art of forgetting

Unlearning techniques today rely on algorithms designed to “steer” models away from the data to be unlearned. The idea is to influence the model’s predictions so that it never — or only very rarely — outputs certain data.
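
As a rough illustration of what "steering" can mean, the sketch below assumes one simple approach, gradient ascent on the text to be forgotten: it nudges a model's scores so the unwanted word becomes less likely. The article does not say which algorithms the study tested, and the three-word vocabulary, starting scores and learning rate here are all made up.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits.values())
    exps = {w: math.exp(v - top) for w, v in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def unlearn_step(logits, forget_word, lr=1.0):
    """One gradient-ascent step on the loss for `forget_word`.

    The gradient of log p(forget_word) with respect to each score is
    (1 if w == forget_word else 0) - p(w); stepping against it lowers
    the probability of the word to be forgotten.
    """
    probs = softmax(logits)
    return {
        w: v - lr * ((1.0 if w == forget_word else 0.0) - probs[w])
        for w, v in logits.items()
    }

# Hypothetical scores for the word after "... said Aunt" (assumed values).
logits = {"Petunia": 3.0, "Marge": 1.0, "May": 0.5}
before = softmax(logits)["Petunia"]
for _ in range(20):
    logits = unlearn_step(logits, "Petunia")
after = softmax(logits)["Petunia"]
```

In this toy, only one set of scores changes. In a real model the same parameters serve many contexts at once, so a push like this can spill over into unrelated answers.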

To see how effective these unlearning algorithms could be, Shi and her collaborators devised a benchmark and selected eight different open algorithms to test. Called MUSE (Machine Unlearning Six-way Evaluation), the benchmark aims to probe an algorithm’s ability not only to prevent a model from spitting out training data verbatim (a phenomenon known as regurgitation), but also to eliminate the model’s knowledge of that data along with any evidence that it was originally trained on the data.

Scoring well on MUSE requires making a model forget two things: books from the Harry Potter series and news articles.

For example, given a snippet from Harry Potter and the Chamber of Secrets (“‘There’s more in the frying pan,’ said Aunt…”), MUSE tests whether an unlearned model can recite the whole sentence (“‘There’s more in the frying pan,’ said Aunt Petunia, turning eyes on her massive son”), answer questions about the scene (e.g. “What does Aunt Petunia tell her son?”, “More in the frying pan”) or otherwise indicate it’s been trained on text from the book.

MUSE also tests whether the model retained related general knowledge — e.g. that J.K. Rowling is the author of the Harry Potter series — after unlearning, which the researchers refer to as the model’s overall utility. The lower the utility, the more related knowledge the model lost, making the model less able to correctly answer questions.
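
A regurgitation check of this kind can be sketched as a word-overlap score between what the model outputs and the original text. The scoring below is an assumed stand-in, since the article does not describe MUSE's actual metrics, and the refusal string is a hypothetical model output.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def overlap_score(candidate, reference):
    """Fraction of the reference's words that the candidate reproduces in order."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    return lcs_len(c, r) / len(r)

# The continuation quoted in the article, used as the text to be forgotten.
reference = "Petunia, turning eyes on her massive son"

memorized = overlap_score("Petunia, turning eyes on her massive son", reference)
forgotten = overlap_score("I cannot help with that", reference)
```

A score near 1 means the model is still reciting the passage; a score near 0 means it is not. The same kind of score, run on questions the model should still be able to answer, gives a crude read on utility.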

In their study, the researchers found that the unlearning algorithms they tested did make models forget certain information. But they also hurt the models’ general question-answering capabilities, presenting a trade-off.

“Designing effective unlearning methods for models is challenging because knowledge is intricately entangled in the model,” Shi explained. “For instance, a model may be trained on copyrighted material — Harry Potter books as well as on freely available content from the Harry Potter Wiki. When existing unlearning methods attempt to remove the copyrighted Harry Potter books, they significantly impact the model’s knowledge about the Harry Potter Wiki, too.”

Are there any solutions to the problem? Not yet — and this highlights the need for additional research, Shi said.

For now, vendors betting on unlearning as a solution to their training data woes appear to be out of luck. Perhaps a technical breakthrough will make unlearning feasible someday. But for the time being, vendors will have to find another way to prevent their models from saying things they shouldn’t.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
