AWS’ new service tackles AI hallucinations

Amazon Web Services (AWS), Amazon’s cloud computing division, is launching a new tool to combat hallucinations, that is, cases where an AI model generates false or fabricated information.

Announced at AWS’ re:Invent 2024 conference in Las Vegas, the service, Automated Reasoning checks, validates a model’s responses by cross-referencing customer-supplied info for accuracy. AWS claims in a press release that Automated Reasoning checks is the “first” and “only” safeguard against hallucinations.

But that’s, well… putting it generously.

Automated Reasoning checks is nearly identical to the Correction feature Microsoft rolled out this summer, which also flags AI-generated text that might be factually wrong. Google also offers a tool in Vertex AI, its AI development platform, to let customers “ground” models by using data from third-party providers, their own data sets, or Google Search.

In any case, Automated Reasoning checks, which is available through AWS’ Bedrock model hosting service (specifically the Guardrails tool), attempts to figure out how a model arrived at an answer and discern whether the answer is correct. Customers upload info to establish a ground truth of sorts, and Automated Reasoning checks creates rules that can then be refined and applied to a model.

As a model generates responses, Automated Reasoning checks verifies them, and, in the event of a probable hallucination, draws on the ground truth for the right answer. It presents this answer alongside the likely mistruth so customers can see how far off-base the model might’ve been.
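AWS hasn’t published how Automated Reasoning checks works internally, so what follows is only a toy sketch of the general idea, assuming a customer’s ground truth can be encoded as a rule that computes the correct answer from the facts in a question. The HR policy, `vacation_days_rule`, `check_response` and `CheckResult` are all hypothetical names for illustration, not part of any AWS API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical ground truth from a customer's HR policy, encoded as a rule:
# 12+ months of tenure earns 20 vacation days, otherwise 10.
def vacation_days_rule(facts: dict) -> int:
    return 20 if facts["tenure_months"] >= 12 else 10

@dataclass
class CheckResult:
    valid: bool
    model_answer: int
    expected_answer: int

def check_response(facts: dict, model_answer: int, rule: Callable[[dict], int]) -> CheckResult:
    """Compare a model's answer with what the ground-truth rule says it should be."""
    expected = rule(facts)
    return CheckResult(model_answer == expected, model_answer, expected)

# A model claims that an employee with 6 months of tenure gets 20 days.
result = check_response({"tenure_months": 6}, model_answer=20, rule=vacation_days_rule)
if not result.valid:
    # Show the likely mistake next to the answer the ground truth supports.
    print(f"Probable hallucination: model said {result.model_answer}, "
          f"ground truth says {result.expected_answer}")
# prints: Probable hallucination: model said 20, ground truth says 10
```

The real service reportedly reasons over rules derived from uploaded documents rather than a single hand-written function, but the output shape is the same idea: a verdict plus the answer the ground truth supports.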

AWS says PwC is already using Automated Reasoning checks to design AI assistants for its clients. And Swami Sivasubramanian, VP of AI and data at AWS, suggested that this type of tooling is exactly what’s attracting customers to Bedrock.

“With the launch of these new capabilities,” he said in a statement, “we are innovating on behalf of customers to solve some of the top challenges that the entire industry is facing when moving generative AI applications to production.” Bedrock’s customer base grew by 4.7x in the last year to tens of thousands of customers, Sivasubramanian added.

But as one expert told me this summer, trying to eliminate hallucinations from generative AI is like trying to eliminate hydrogen from water.

AI models hallucinate because they don’t actually “know” anything. They’re statistical systems that identify patterns in a series of data and predict which data comes next based on previously seen examples. It follows that a model’s responses aren’t answers but predictions of how questions should be answered, within a margin of error.
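To make the “statistical system” point concrete, here’s a deliberately tiny next-word predictor: a bigram model that counts which word follows which in a made-up corpus. It illustrates the principle only; production language models use neural networks over far longer contexts, and the corpus and function names here are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict:
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts: dict, word: str):
    """Return the most frequent follower of `word` and its estimated probability."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    total = sum(followers.values())
    nxt, n = followers.most_common(1)[0]  # ties go to the first follower seen
    return nxt, n / total

CORPUS = ("The capital of France is Paris . The capital of Spain is Madrid . "
          "The capital of Italy is Rome .")
counts = train_bigrams(CORPUS)
print(predict_next(counts, "is"))  # ('paris', 0.3333333333333333)
```

Asked to complete “the capital of Spain is,” this model only looks at the word “is” and picks “paris”: a fluent, pattern-based prediction that happens to be wrong, which is the failure mode in miniature.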

AWS claims that Automated Reasoning checks uses “logically accurate” and “verifiable reasoning” to arrive at its conclusions. But the company volunteered no data showing that the tool is itself reliable.

In other Bedrock news, AWS this morning announced Model Distillation, a tool to transfer the capabilities of a large model (e.g. Llama 405B) to a small model (e.g. Llama 8B) that’s cheaper and faster to run. An answer to Microsoft’s Distillation in Azure AI Foundry, Model Distillation provides a way to experiment with various models without breaking the bank, AWS says.


“After the customer provides sample prompts, Amazon Bedrock will do all the work to generate responses and fine-tune the smaller model,” AWS explained in a blog post, “and it can even create more sample data, if needed, to complete the distillation process.”
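AWS hasn’t detailed its training recipe, but the workflow it describes (a large “teacher” model answers sample prompts, then a smaller “student” is trained on those answers) can be sketched with a toy numeric stand-in. Here the hypothetical `teacher` function plays the large model, a two-parameter line plays the small one, and `augment` mimics generating extra sample data. None of this reflects Bedrock’s actual implementation.

```python
import math

def teacher(x: float) -> float:
    """Hypothetical stand-in for a large, expensive model."""
    return 3.0 * x + 2.0 + 0.1 * math.sin(5.0 * x)

def augment(prompts: list) -> list:
    """Create extra sample inputs: midpoints between neighboring prompts."""
    xs = sorted(prompts)
    return xs + [(a + b) / 2 for a, b in zip(xs, xs[1:])]

def distill(prompts: list) -> tuple:
    """Label prompts with the teacher, then fit a small student y = a*x + b."""
    xs = augment(prompts)
    ys = [teacher(x) for x in xs]  # the teacher's "responses"
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Ordinary least squares for a line: the student has only two parameters.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x

PROMPTS = [i / 10 for i in range(20)]  # customer-supplied sample inputs
a, b = distill(PROMPTS)
# The student tracks the teacher closely but can't reproduce the sin term.
max_error = max(abs((a * x + b) - teacher(x)) for x in PROMPTS)
print(f"student: y = {a:.3f}x + {b:.3f}, max error vs teacher: {max_error:.3f}")
```

The small residual error is the toy analogue of the accuracy loss AWS mentions below: the student is cheaper to run because it has less capacity, and that same limit is why it can’t match the teacher exactly.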

But there are a few caveats.

Model Distillation only works with Bedrock-hosted models from Anthropic and Meta at present. Customers have to select a large and small model from the same model “family” — the models can’t be from different providers. And distilled models will lose some accuracy — “less than 2%,” AWS claims.

If none of that deters you, Model Distillation is now available in preview, along with Automated Reasoning checks.

Also available in preview is “multi-agent collaboration,” a new Bedrock feature that lets customers assign AI to subtasks in a larger project. A part of Bedrock Agents, AWS’ contribution to the AI agent craze, multi-agent collaboration provides tools to create and tune AI agents for tasks like reviewing financial records and assessing global trends.

Customers can even designate a “supervisor agent” to break up and route tasks to the AIs automatically. The supervisor can “[give] specific agents access to the information they need to complete their work,” AWS says, and “[determine] what actions can be processed in parallel and which need details from other tasks before [an] agent can move forward.”

“Once all of the specialized [AIs] complete their inputs, the supervisor agent [can pull] the information together [and] synthesize the results,” AWS wrote in the post.
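The supervisor behavior AWS describes, running independent subtasks in parallel and holding back ones that need other tasks’ output before synthesizing a result, is essentially dependency-ordered scheduling. Below is a minimal sketch using Python’s standard `graphlib` and `concurrent.futures` (Python 3.9+), assuming each hypothetical “agent” is just a function; it is not the Bedrock Agents API.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical specialist "agents": each takes the outputs of the tasks it
# depends on and returns a result. Real agents would call models or tools.
def review_records(deps: dict) -> str:
    return "financial records reviewed"

def assess_trends(deps: dict) -> str:
    return "global trends assessed"

def write_summary(deps: dict) -> str:
    return f"summary of: {deps['review']}; {deps['trends']}"

# task name -> (agent, names of tasks whose output it needs)
TASKS = {
    "review": (review_records, set()),
    "trends": (assess_trends, set()),
    "summary": (write_summary, {"review", "trends"}),
}

def supervise(tasks: dict):
    """Run tasks in dependency order; tasks with no pending inputs run in parallel."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    sorter.prepare()
    results, waves = {}, []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # every input these tasks need is done
            waves.append(ready)
            futures = {
                name: pool.submit(tasks[name][0], {d: results[d] for d in tasks[name][1]})
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results, waves

results, waves = supervise(TASKS)
print(waves)               # [['review', 'trends'], ['summary']]
print(results["summary"])  # summary of: financial records reviewed; global trends assessed
```

The first wave runs both independent agents at once; the summary step waits until both have finished, which mirrors the “pull the information together” stage in AWS’ description.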

Sounds nifty. But as with all these features, we’ll have to see how well it works when deployed in the real world.
