Reddit’s upcoming changes attempt to safeguard the platform against AI crawlers


Reddit announced on Tuesday that it’s updating its Robots Exclusion Protocol (robots.txt file), which tells automated web bots whether they are permitted to crawl a site.
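To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the bot names (`ExampleSearchBot`, `UnknownAIBot`) and the `example.com` URL are hypothetical and chosen only for illustration; they are not Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Reddit's robots.txt.
# One named crawler is allowed everywhere, every other bot is disallowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)
url = "https://example.com/r/some-page"  # placeholder URL

# A compliant crawler asks before it fetches.
print(parser.can_fetch("ExampleSearchBot", url))
print(parser.can_fetch("UnknownAIBot", url))
```

The check runs entirely on the crawler's side: the file states the site's wishes, and nothing in the protocol stops a bot that never asks.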

Historically, the robots.txt file was used to allow search engines to scrape a site and then direct people to the content. However, with the rise of AI, websites are being scraped and used to train models without acknowledging the actual source of the content.

Along with the updated robots.txt file, Reddit will continue rate-limiting and blocking unknown bots and crawlers from accessing its platform. The company told TechCrunch that bots and crawlers will be rate-limited or blocked if they don’t abide by Reddit’s Public Content Policy and don’t have an agreement with the platform.
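Reddit has not described how its rate-limiting works, so the following is only a generic sketch of one common approach, a per-client token bucket. The class name, the `should_serve` helper, the choice of keying on the user-agent string and the capacity and refill values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Generic token bucket; not a description of Reddit's system."""
    capacity: float            # maximum burst size (hypothetical value)
    refill_per_second: float   # sustained request rate (hypothetical value)
    tokens: float = field(init=False)
    last_seen: float = 0.0

    def __post_init__(self) -> None:
        # Start with a full bucket so a new client can burst once.
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last_seen)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per client identifier (here, hypothetically, the user-agent).
buckets: dict[str, TokenBucket] = {}


def should_serve(user_agent: str, now: float) -> bool:
    bucket = buckets.setdefault(
        user_agent, TokenBucket(capacity=2, refill_per_second=1)
    )
    return bucket.allow(now)
```

Passing `now` explicitly keeps the sketch deterministic; a real service would typically read a monotonic clock and key on something harder to spoof than a user-agent string.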

Reddit says the update shouldn’t affect the majority of users or good faith actors, like researchers and organizations, such as the Internet Archive. Instead, the update is designed to deter AI companies from training their large language models on Reddit content. Of course, AI crawlers could ignore Reddit’s robots.txt file.

The announcement comes a few days after a Wired investigation found that AI-powered search startup Perplexity has been stealing and scraping content. Wired found that Perplexity seems to ignore requests not to scrape Wired’s website, even though the publication blocked the startup in its robots.txt file. Perplexity CEO Aravind Srinivas responded to the claims and said that the robots.txt file is not a legal framework.

Reddit’s upcoming changes won’t affect companies that it has an agreement with. For instance, Reddit has a $60 million deal with Google that allows the search giant to train its AI models on the social platform’s content. With these changes, Reddit is signaling to other companies that want to use Reddit’s data for AI training that they will have to pay.

“Anyone accessing Reddit content must abide by our policies, including those in place to protect redditors,” Reddit said in a blog post. “We are selective about who we work with and trust with large-scale access to Reddit content.”

The announcement doesn’t come as a surprise, as Reddit released a new policy a few weeks ago that was designed to guide how Reddit’s data is being accessed and used by commercial entities and other partners.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
