In AI copyright case, Zuckerberg turns to YouTube for his defense

Meta CEO Mark Zuckerberg appears to have used YouTube and its battle to take down pirated content to defend his own company’s use of a data set containing copyrighted e-books to train AI models, newly released snippets of his deposition reveal.

The deposition, which was part of a complaint submitted to the court by plaintiffs’ attorneys, is related to the AI copyright case Kadrey v. Meta. It’s one of many such cases winding through the U.S. court system that are pitting AI companies against authors and other IP holders. For the most part, the defendants in these cases – AI companies – claim that training on copyrighted content is “fair use.” Many copyright holders disagree.

“For example, YouTube, I think, may end up hosting some stuff that people pirate for some period of time, but YouTube is trying to take that stuff down,” Zuckerberg said during his deposition, according to portions of a transcript made available Wednesday night. “And the vast majority of the stuff on YouTube, I would assume, is kind of good and they have the license to do.”

Snippets from the deposition provide some clues to Zuckerberg’s thinking on copyrighted content and fair use. However, a full transcript of the deposition was not released. TechCrunch has reached out to Meta for additional context and will update the article if the company responds.

Based on the deposition nuggets, Zuckerberg appears to be defending Meta’s use of a training data set of e-books called LibGen to develop its family of AI models known as Llama. Meta’s Llama competes against flagship models from AI companies like OpenAI.

LibGen, which describes itself as a “links aggregator,” provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. LibGen has been sued a number of times, ordered to shut down, and fined tens of millions of dollars for copyright infringement.

According to court filings unsealed this week, Zuckerberg allegedly cleared the use of LibGen to train at least one of Meta’s Llama models despite concerns among the company’s AI executives and research teams over the legal implications.

Counsel for the plaintiffs, who include bestselling authors Sarah Silverman and Ta-Nehisi Coates, quoted Meta employees as referring to LibGen as a “data set we know to be pirated” and flagging that its use “may undermine [Meta’s] negotiating position with regulators,” according to a legal filing.

During his deposition, Zuckerberg claimed he “hadn’t really heard of” LibGen.

“I get that you’re trying to get me to give an opinion of LibGen, which I haven’t really heard of,” said Zuckerberg during the deposition. “It’s just that I don’t have knowledge of that specific thing.”

Under questioning from one of the plaintiffs’ attorneys, David Boies, Zuckerberg explained why it would be unreasonable to prohibit using a data set like LibGen.

“So would I want to have a policy against people using YouTube because some of the content may be copyrighted? No,” he said. “[T]here are cases where having such a blanket ban might not be the right thing to do.”

Zuckerberg did state that Meta should be “pretty careful about” training on copyrighted material.

“You know, [if there’s] someone who’s providing a website and they’re intentionally trying to violate people’s rights … obviously it’s something that we would want to be cautious about or careful about how we engaged with it or maybe even prevent our teams from engaging with it,” Zuckerberg said during his deposition, according to the transcript.

New allegations

Plaintiffs’ lawyers in the Kadrey v. Meta case have amended the complaint several times since it was filed in 2023 in the United States District Court for the Northern District of California, San Francisco Division. The latest amended complaint, filed by plaintiffs’ counsel late Wednesday, contains new allegations against Meta, including that the company cross-referenced certain pirated books in LibGen with copyrighted books available for license. Lawyers allege Meta used this tactic to determine whether it made sense to pursue a licensing agreement with a publisher.

Meta allegedly used LibGen to train its latest family of Llama models, Llama 3, per the amended filing. Plaintiffs also allege that Meta is using the data set to train its next-gen Llama 4 models.

According to the amended filing, Meta researchers allegedly tried to hide the fact that Llama models were trained on copyrighted materials by inserting “supervised samples” into Llama’s fine-tuning. And Meta downloaded pirated e-books from another source, Z-Library, for Llama training as recently as April 2024, the amended complaint alleges.

Z-Library, or Z-Lib, has been the subject of a number of legal actions brought by publishers, including domain seizures and takedowns. In 2022, the Russian nationals who allegedly maintained it were charged with copyright infringement, wire fraud, and money laundering.
