OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit

Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly scraping their works to train its AI models without permission, say OpenAI engineers accidentally deleted data potentially relevant to the case.

Earlier this fall, OpenAI agreed to provide two virtual machines so counsel for The Times and Daily News could perform searches for copyrighted content in its training data sets. In a letter, attorneys for the publishers say that they and experts have spent over 150 hours since November 1 searching OpenAI’s training data.

But on November 14, OpenAI engineers erased all the publishers’ search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Thursday.

OpenAI tried recover the data — and was somewhat successful. However, because the folder structure and file names were “irretrievably” lost, the recovered data “cannot be used to determine where the news plaintiffs’ copied articles were used to build [OpenAI’s] models,” per the letter.

“News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” counsel for The Times and Daily News wrote. “The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today.”

The plaintiffs’ counsel makes clear that they have no reason to believe the deletion was intentional. But they do say the incident underscores that OpenAI “is in the best position to search its own datasets” for potentially infringing content using its own tools.

We’ve reached out to OpenAI for comment and will update this piece if we hear back.

In this case and others, OpenAI has maintained that training models using publicly available data — including articles from The Times and Daily News — is fair use. In other words, in creating models like GPT-4o, which “learn” from billions of examples of ebooks, essays, and more to generate human-sounding text, OpenAI believes that it isn’t required to license or otherwise pay for the examples — even if it makes money from those models.

That being said, OpenAI has inked licensing deals with a growing number of new publishers, including The Associated Press, Business Insider owner Axel Springer, Financial Times, People parent company Dotdash Meredith, and News Corp. OpenAI has declined to make the terms of these deals public, but one content partner, Dotdash, is reportedly being paid at least $16 million per year.

Source link

OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit

Recent posts

Large funds are too invested in the energy transition to turn back now

Anthropic’s next major AI model could arrive within weeks

Audio platform Pocket FM taps into AI tools help it expand content catalog

This Week in AI: OpenAI’s new Strawberry model may be smart, yet sluggish

ODD taps $27M for diamond chips to clear radioactive debris at Fukushima Daiichi Nuclear Power Plant

Sunamp’s thermal battery uses a chemical found in salt-and-vinegar potato chips

TechCrunch Disrupt 2025 tickets now on sale: Lowest rates ever

Former watch trader is now building the AWS of grid storage, Terralayr

How DeepSeek’s efficient AI could stall the nuclear renaissance

Zepto raises another $350 million amid retail upheaval in India

ElectronX is building a stock market for electricity trading

The coolest startup in the Bay Area is a baseball team called the Oakland Ballers

Perplexity brings ads to its platform

You.com ‘refocuses’ from AI search to deeper productivity agents with new $50M round

Google brings ads to AI Overviews as it expands AI’s role in search

Related articles

Norway’s 1X is building a humanoid robot for the home

Fintech founder Charlie Javice’s criminal trial has begun

The Vision Pro is getting Apple Intelligence in April

How automotive exec Crystal Brown founded CircNova, an AI drug discovery biotech

Apply to Speak at TechCrunch Sessions: AI before the deadline

Three reasons every founder and VC should be at TechCrunch All Stage 2025

OpenAI rolls out its AI agent, Operator, in several countries

Rivian will launch hands-off highway driver assist ‘in a few weeks’

Company

Follow us