Google is using Anthropic’s Claude to improve its Gemini AI

Contractors working to improve Google’s Gemini AI are comparing its answers against outputs produced by Anthropic’s competitor model Claude, according to internal correspondence seen by TechCrunch.

Google would not say, when reached by TechCrunch for comment, if it had obtained permission for its use of Claude in testing against Gemini.

As tech companies race to build better AI models, the performance of these models are often evaluated against competitors, typically by running their own models through industry benchmarks rather than having contractors painstakingly evaluate their competitors’ AI responses.

The contractors working on Gemini tasked with rating the accuracy of the model’s outputs must score each response that they see according to multiple criteria, like truthfulness and verbosity. The contractors are given up to 30 minutes per prompt to determine whose answer is better, Gemini’s or Claude’s, according to the correspondence seen by TechCrunch.

The contractors recently began noticing references to Anthropic’s Claude appearing in the internal Google platform they use to compare Gemini to other unnamed AI models, the correspondence showed. At least one of the outputs presented to Gemini contractors, seen by TechCrunch, explicitly stated: “I am Claude, created by Anthropic.”

One internal chat showed the contractors noticing Claude’s responses appearing to emphasize safety more than Gemini. “Claude’s safety settings are the strictest” among AI models, one contractor wrote. In certain cases, Claude wouldn’t respond to prompts that it considered unsafe, such as role-playing a different AI assistant. In another, Claude avoided answering a prompt, while Gemini’s response was flagged as a “huge safety violation” for including “nudity and bondage.”

Anthropic’s commercial terms of service forbid customers from accessing Claude “to build a competing product or service” or “train competing AI models” without approval from Anthropic. Google is a major investor in Anthropic.

Shira McNamara, a spokesperson for Google DeepMind, which runs Gemini, would not say — when asked by TechCrunch — whether Google has obtained Anthropic’s approval to access Claude. When reached prior to publication, an Anthropic spokesperson did not comment by press time.

McNamara said that DeepMind does “compare model outputs” for evaluations but that it doesn’t train Gemini on Anthropic models.

“Of course, in line with standard industry practice, in some cases we compare model outputs as part of our evaluation process,” McNamara said. “However, any suggestion that we have used Anthropic models to train Gemini is inaccurate.”

Last week, TechCrunch exclusively reported that Google contractors working on the company’s AI products are now being made to rate Gemini’s AI responses in areas outside of their expertise. Internal correspondence expressed concerns by contractors that Gemini could generate inaccurate information on highly sensitive topics like healthcare.

You can send tips securely to this reporter on Signal at +1 628-282-2811.

TechCrunch has an AI-focused newsletter! Sign up here to get it in your inbox every Wednesday.

Source link

Google is using Anthropic’s Claude to improve its Gemini AI

Recent posts

From their experiences at Uber and PayPal, Palm founders want to make moving cash easier for big companies

OpenAI pledges that its models won’t censor viewpoints

Built in four days, this $120 robot arm cleans a spill with help from GPT-4o

Palantir and Anduril reportedly building a tech consortium to bid on defense contracts

VW taps former Rivian exec to run US business

Another robotaxi operation heads to Texas and Archer scores $300M for its defense mission

Tesla to test virtual queues at Supercharging locations

OroraTech’s space-based wildfire detection brings in $25M to put more imaging satellites in orbit

Report: OpenAI plans to shift compute needs from Microsoft to SoftBank

UK government demands Apple backdoor to encrypted cloud data: report

SGNL snags $30M for a new take on ID security based on zero-standing privileges

TikTok’s e-commerce ‘Shop’ platform launches in Spain

DeepSeek’s new AI model appears to be one of the best ‘open’ challengers yet

Bluesky adds a ‘followers only’ reply option

Instagram locks out developers of third-party consumer apps

Related articles

Grok 3 appears to be driving Grok usage to new heights

Perplexity teases a web browser called Comet

The startup battle begins now: TechCrunch Startup Battlefield 200 applications are open

Flexport releases onslaught of AI tools in a move inspired by ‘Founder Mode’

A single default password exposes access to dozens of apartment buildings

Meta AI arrives in the Middle East and Africa with support for Arabic

Apple commits $500B to US manufacturing, including a new AI server facility in Houston

Yope is sparking GenZ (and VC) interest with an Instagram-like app for private groups

Company

Follow us