Google releases tech to watermark AI-generated text

Google is making SynthID Text, its technology that lets developers watermark and detect text generated by generative AI models, generally available.

SynthID Text can be downloaded from the AI platform Hugging Face and Google’s updated Responsible GenAI Toolkit.

“Today, we’re open sourcing our SynthID Text watermarking tool,” the company wrote in a post on X. “Available freely to developers and businesses, it will help them identify their AI-generated content.”

So how does it work?

Given a prompt like “What’s your favorite fruit?”, text-generating models predict which “token” is most likely to follow another, one token at a time. Tokens are the building blocks a generative model uses to process information; a token can be a single character, a word, or part of a phrase.

The model assigns each possible token a score, which is the probability that it’s included in the output text. SynthID Text inserts additional data into this token distribution by “modulating the likelihood of tokens being generated,” Google says.
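To make “modulating the likelihood of tokens” concrete, here is a minimal sketch of one generic way to do it. It is an assumption-laden illustration, not SynthID Text’s actual algorithm: it uses a simpler “green list” scheme from academic watermarking research, where a secret key and the previous token pseudorandomly mark some candidate tokens as “green,” and those tokens get a small score boost before sampling. The key, the hash rule, `delta`, and the toy scores are all hypothetical.

```python
import hashlib
import math
import random


def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def is_green(prev_token, token, key, fraction=0.5):
    """Hypothetical keyed rule: hash (key, previous token, candidate) into [0, 1)
    and call the candidate 'green' if it lands below `fraction`."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] / 256 < fraction


def watermarked_probs(logits, prev_token, key, delta=2.0):
    """Nudge the distribution: add `delta` to every green token's logit, then renormalize."""
    biased = {
        tok: score + (delta if is_green(prev_token, tok, key) else 0.0)
        for tok, score in logits.items()
    }
    return softmax(biased)


def sample(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy scores for the next word after "My favorite fruit is the ..." (made up).
logits = {"apple": 2.0, "banana": 1.8, "mango": 1.5, "cherry": 1.2}
plain = softmax(logits)
marked = watermarked_probs(logits, prev_token="favorite", key="secret-key")
for tok in logits:
    print(tok, round(plain[tok], 3), "->", round(marked[tok], 3))
print("sampled:", sample(marked, random.Random(0)))
```

Every green token ends up at least as likely as before and every other token at most as likely, yet all candidates stay possible, which is why a scheme like this can leave the text readable while still leaving a statistical trace.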

“The final pattern of scores for both the model’s word choices combined with the adjusted probability scores are considered the watermark,” the company wrote in a blog post. “This pattern of scores is compared with the expected pattern of scores for watermarked and unwatermarked text, helping SynthID detect if an AI tool generated the text or if it might come from other sources.”
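Detection works by comparison: someone holding the key checks how often the text lands on “favored” tokens and compares that with what unwatermarked text would produce. The sketch below shows the idea with the same kind of hypothetical green-list rule rather than Google’s actual scoring; the toy vocabulary, key strings, bias of 4.0, and 200-token length are all illustrative assumptions. It reports a z-score: how many standard deviations the green-token count sits above the count expected by chance.

```python
import hashlib
import math
import random

# Hypothetical toy vocabulary standing in for a real tokenizer's vocabulary.
VOCAB = [f"tok{i}" for i in range(50)]


def is_green(prev_token, token, key, fraction=0.5):
    """Hypothetical keyed rule shared by the generator and the detector."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] / 256 < fraction


def generate(n_tokens, rng, key=None, delta=4.0):
    """Toy 'model' with flat scores; when a key is given, green tokens get a bias of delta."""
    tokens = ["<start>"]
    for _ in range(n_tokens):
        prev = tokens[-1]
        weights = [
            math.exp(delta) if key is not None and is_green(prev, tok, key) else 1.0
            for tok in VOCAB
        ]
        tokens.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return tokens


def detection_z_score(tokens, key, fraction=0.5):
    """Compare the observed green-token count with what unwatermarked text would give.

    Without the watermark, each token is green with probability about `fraction`,
    so the count is roughly binomial; a large z-score suggests watermarked text.
    """
    n = len(tokens) - 1  # the first token has no predecessor to seed the rule
    hits = sum(is_green(tokens[i - 1], tokens[i], key, fraction) for i in range(1, len(tokens)))
    expected = fraction * n
    std = math.sqrt(n * fraction * (1 - fraction))
    return (hits - expected) / std


rng = random.Random(42)
marked_text = generate(200, rng, key="secret-key")
plain_text = generate(200, rng)
print("watermarked:", round(detection_z_score(marked_text, "secret-key"), 1))
print("unwatermarked:", round(detection_z_score(plain_text, "secret-key"), 1))
print("wrong key:", round(detection_z_score(marked_text, "other-key"), 1))
```

Watermarked text scores far above chance, while unwatermarked text, or watermarked text checked with the wrong key, stays near zero. Because the evidence accumulates token by token, longer passages give a clearer signal.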

Google claims that SynthID Text, which has been integrated with its Gemini models since this spring, doesn’t compromise the quality, accuracy, or speed of text generation, and works even on text that’s been cropped, paraphrased, or modified.

But the company also admits that its watermarking technology has limitations.

For example, SynthID Text doesn’t perform as well on short text, on text that’s been rewritten or translated from another language, or on responses to factual questions. “On responses to factual prompts, there are fewer opportunities to adjust the token distribution without affecting the factual accuracy,” explains the company. “This includes prompts like ‘What is the capital of France?’ or queries where little or no variation is expected like ‘recite a William Wordsworth poem.’”
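Why factual prompts leave less room can be seen with a small calculation. The sketch below applies a hypothetical green-list score boost (a generic stand-in, not SynthID’s method) to two made-up distributions: a factual one where a single answer dominates, and an open-ended one where several answers are equally plausible. It measures the spread of each distribution (entropy, in bits) and how far the boost moves it (total variation distance). The token scores and the chosen green tokens are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def entropy_bits(probs):
    """How spread out the distribution is; near 0 means one answer dominates."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def bias_shift(logits, green, delta=2.0):
    """Total variation distance between the original and green-boosted distributions."""
    before = softmax(logits)
    after = softmax({t: v + (delta if t in green else 0.0) for t, v in logits.items()})
    return 0.5 * sum(abs(after[t] - before[t]) for t in logits)


# Made-up scores: "What is the capital of France?" vs. "What's your favorite fruit?"
factual = {"Paris": 10.0, "Lyon": 0.0, "Nice": 0.0, "Lille": 0.0}
open_ended = {"apple": 0.0, "mango": 0.0, "cherry": 0.0, "banana": 0.0}
green = {"Lyon", "Nice", "mango", "cherry"}  # hypothetical tokens favored by the key

for name, logits in [("factual", factual), ("open-ended", open_ended)]:
    probs = softmax(logits)
    print(name, round(entropy_bits(probs), 3), "bits, shift", round(bias_shift(logits, green), 4))
```

In the factual case the boost barely moves the distribution, because “Paris” still wins almost every time, so little watermark signal gets embedded; pushing harder would start swapping in wrong answers. In the open-ended case the same boost shifts the choice substantially without making any answer incorrect.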


Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
