Cloudflare launches a tool to combat AI bots

Cloudflare, the publicly traded cloud service provider, has launched a new, free tool to prevent bots from scraping websites hosted on its platform for data to train AI models.

Some AI vendors, including Google, OpenAI and Apple, allow website owners to block the bots they use for data scraping and model training by amending their site’s robots.txt, the text file that tells bots which pages they can access on a website. But, as Cloudflare points out in a post announcing its bot-combating tool, not all AI scrapers respect this.
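To make the robots.txt mechanism concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are illustrative assumptions, not taken from Cloudflare's post; the user-agent tokens shown (`GPTBot`, `Google-Extended`, `Applebot-Extended`) are the ones OpenAI, Google and Apple publish for opting out of AI training.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block three AI-training crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    is_allowed is an illustrative helper, not part of any vendor's tooling.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Google-Extended", "SomeBrowser"):
        print(agent, is_allowed(agent, "https://example.com/article"))
```

The check runs on the crawler's side: a bot that never consults robots.txt, or that announces itself under a different user agent, is not stopped by these rules.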

“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” the company writes on its official blog. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”

So, in an attempt to address the problem, Cloudflare analyzed AI bot and crawler traffic to fine-tune automatic bot detection models. The models consider, among other factors, whether an AI bot might be trying to evade detection by mimicking the appearance and behavior of someone using a web browser.

“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint,” Cloudflare writes. “Based on these signals, our models [are] able to appropriately flag traffic from evasive AI bots as bots.”
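Cloudflare does not describe its models in detail here, so the following is only a toy illustration of what fingerprinting request signals could look like, not the company's method. The marker list, header list, scoring weights and threshold are all assumptions made up for this sketch.

```python
# Toy heuristic only: real bot detection relies on far richer signals.
# All names, weights and thresholds below are hypothetical.
AUTOMATION_MARKERS = ("python-requests", "scrapy", "headlesschrome", "curl")
BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding")


def bot_score(headers: dict[str, str]) -> int:
    """Score a request's headers; higher means more bot-like."""
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()

    score = 0
    # A missing user agent, or one naming a common automation tool, adds 2.
    if not user_agent or any(m in user_agent for m in AUTOMATION_MARKERS):
        score += 2
    # Each header this sketch expects from a browser adds 1 when absent.
    score += sum(1 for name in BROWSER_HEADERS if name not in normalized)
    return score


def looks_like_bot(headers: dict[str, str], threshold: int = 2) -> bool:
    """Flag the request when its score reaches the (assumed) threshold."""
    return bot_score(headers) >= threshold
```

In this sketch, a request that claims a browser user agent but sends none of the other expected headers is still flagged, which mirrors the idea of catching bots that only partly mimic a browser.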

Cloudflare has set up a form for hosts to report suspected AI bots and crawlers and says that it’ll continue to manually blacklist AI bots over time.

The problem of AI bots has come into sharp relief as the generative AI boom fuels the demand for model training data.

Many sites, wary of AI vendors training models on their content without alerting or compensating them, have opted to block AI scrapers and crawlers. Around 26% of the top 1,000 sites on the web have blocked OpenAI’s bot, according to one study; another found that more than 600 news publishers had blocked the bot.

Blocking isn’t a surefire protection, however. As alluded to earlier, some vendors appear to be ignoring standard bot exclusion rules to gain a competitive advantage in the AI race. AI search engine Perplexity was recently accused of impersonating legitimate visitors to scrape content from websites, and OpenAI and Anthropic are said to have at times ignored robots.txt rules.

In a letter to publishers last month, content licensing startup TollBit said that, in fact, it sees “many AI agents” ignoring the robots.txt standard.

Tools like Cloudflare’s could help, but only if they prove accurate at detecting clandestine AI bots. And they won’t solve the more intractable problem: publishers that block specific AI crawlers risk losing referral traffic from AI tools like Google’s AI Overviews, which exclude sites that block those crawlers.
