AWS’ new service tackles AI hallucinations

Amazon Web Services (AWS), Amazon’s cloud computing division, is launching a new tool to combat hallucinations — that is, scenarios where an AI model behaves unreliably.

Announced at AWS’ re:Invent 2024 conference in Las Vegas, the service, Automated Reasoning checks, validates a model’s responses by cross-referencing customer-supplied info for accuracy. AWS claims in a press release that Automated Reasoning checks is the “first” and “only” safeguard for hallucinations.

But that’s, well… putting it generously.

Automated Reasoning checks is nearly identical to the Correction feature Microsoft rolled out this summer, which also flags AI-generated text that might be factually wrong. Google also offers a tool in Vertex AI, its AI development platform, to let customers “ground” models by using data from third-party providers, their own data sets, or Google Search.

In any case, Automated Reasoning checks, which is available through AWS’ Bedrock model hosting service (specifically the Guardrails tool), attempts to figure out how a model arrived at an answer and discern whether the answer is correct. Customers upload info to establish a ground truth of sorts, and Automated Reasoning checks creates rules that can then be refined and applied to a model.

As a model generates responses, Automated Reasoning checks verifies them, and, in the event of a probable hallucination, draws on the ground truth for the right answer. It presents this answer alongside the likely mistruth so customers can see how far off-base the model might’ve been.
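To make that flow concrete, here is a minimal sketch of checking a model’s claim against rules built from ground truth. It is not AWS’ implementation or the Bedrock API: the `Rule`, `Finding`, and `check_claim` names, the leave-policy rule, and the structured `claim` are all invented for illustration, and the step that extracts a claim from free-form model output is assumed rather than shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """A hypothetical rule derived from customer-supplied ground truth."""
    name: str
    applies: Callable[[dict], bool]     # does this rule cover the claim?
    expected: Callable[[dict], object]  # the value the ground truth implies


@dataclass
class Finding:
    """A mismatch between the model's claim and the ground truth."""
    rule: str
    model_value: object
    ground_truth_value: object


def check_claim(claim: dict, rules: list[Rule]) -> Optional[Finding]:
    """Return a Finding if the claim contradicts a rule, else None."""
    for rule in rules:
        if rule.applies(claim):
            truth = rule.expected(claim)
            if claim["value"] != truth:
                return Finding(rule.name, claim["value"], truth)
    return None


# Invented ground truth: full-time staff get 25 days of leave.
rules = [
    Rule(
        name="full_time_leave_days",
        applies=lambda c: c["topic"] == "leave_days"
        and c["employment"] == "full_time",
        expected=lambda c: 25,
    ),
]

# A structured claim assumed to have been extracted from a model response.
claim = {"topic": "leave_days", "employment": "full_time", "value": 30}
finding = check_claim(claim, rules)
if finding:
    # Show the ground-truth answer alongside the model's likely mistake.
    print(f"Model said {finding.model_value}; "
          f"ground truth says {finding.ground_truth_value}")
```

The point of the sketch is the shape of the output: the checker returns the model’s value and the ground-truth value side by side, rather than just a pass or fail.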

AWS says PwC is already using Automated Reasoning checks to design AI assistants for its clients. And Swami Sivasubramanian, VP of AI and data at AWS, suggested that this type of tooling is exactly what’s attracting customers to Bedrock.

“With the launch of these new capabilities,” he said in a statement, “we are innovating on behalf of customers to solve some of the top challenges that the entire industry is facing when moving generative AI applications to production.” Bedrock’s customer base grew by 4.7x in the last year to tens of thousands of customers, Sivasubramanian added.

But as one expert told me this summer, trying to eliminate hallucinations from generative AI is like trying to eliminate hydrogen from water.

AI models hallucinate because they don’t actually “know” anything. They’re statistical systems that identify patterns in a series of data, and predict which data comes next based on previously seen examples. It follows that a model’s responses aren’t answers, then, but predictions of how questions should be answered, within a margin of error.
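A toy example shows why prediction is not the same as knowledge. The sketch below is a word-pair counter, vastly simpler than any production model, and its three-sentence corpus is invented for illustration. It predicts the next word purely from how often words followed each other in the examples it has seen.

```python
from collections import Counter, defaultdict

# Invented training text for illustration only.
corpus = "the sky is blue . the sea is blue . the grass is green .".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1


def predict_next(word: str) -> dict[str, float]:
    """Estimate next-word probabilities from the observed counts."""
    counts = following[word]
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


probs = predict_next("is")
best = max(probs, key=probs.get)
print(probs)
print(f"most likely word after 'is': {best}")
```

Asked to complete “the grass is”, this predictor only looks at “is” and picks “blue”, because that pairing was the most common in its data. The answer is statistically reasonable and factually wrong, which is a hallucination in miniature.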

AWS claims that Automated Reasoning checks uses “logically accurate” and “verifiable reasoning” to arrive at its conclusions. But the company volunteered no data showing that the tool is itself reliable.

In other Bedrock news, AWS this morning announced Model Distillation, a tool to transfer the capabilities of a large model (e.g. Llama 405B) to a small model (e.g. Llama 8B) that’s cheaper and faster to run. An answer to Microsoft’s Distillation in Azure AI Foundry, Model Distillation provides a way to experiment with various models without breaking the bank, AWS says.

“After the customer provides sample prompts, Amazon Bedrock will do all the work to generate responses and fine-tune the smaller model,” AWS explained in a blog post, “and it can even create more sample data, if needed, to complete the distillation process.”
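The data flow AWS describes can be sketched in a few lines. This is only an assumption-laden illustration, not Bedrock’s pipeline: `teacher` is a stand-in function for the large model, the prompts are invented, and the prompt/completion record layout is a guess rather than Bedrock’s actual schema. The fine-tuning step itself is not shown.

```python
import json
from typing import Callable


def teacher(prompt: str) -> str:
    """Stand-in for a call to the large model (hypothetical)."""
    return f"[large-model answer to: {prompt}]"


def build_distillation_dataset(
    prompts: list[str], generate: Callable[[str], str]
) -> list[dict]:
    """Pair each sample prompt with the large model's response."""
    return [{"prompt": p, "completion": generate(p)} for p in prompts]


# Invented sample prompts, as a customer might supply.
sample_prompts = [
    "Summarize our refund policy.",
    "What is the warranty period?",
]

dataset = build_distillation_dataset(sample_prompts, teacher)

# One JSON record per line; the smaller model would be fine-tuned on these.
lines = [json.dumps(record) for record in dataset]
print("\n".join(lines))
```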

But there are a few caveats.

Model Distillation only works with Bedrock-hosted models from Anthropic and Meta at present. Customers have to select a large and small model from the same model “family” — the models can’t be from different providers. And distilled models will lose some accuracy — “less than 2%,” AWS claims.

If none of that deters you, Model Distillation is now available in preview, along with Automated Reasoning checks.

Also available in preview is “multi-agent collaboration,” a new Bedrock feature that lets customers assign AIs to subtasks in a larger project. A part of Bedrock Agents, AWS’ contribution to the AI agent craze, multi-agent collaboration provides tools to create and tune AIs for tasks like reviewing financial records and assessing global trends.

Customers can even designate a “supervisor agent” to break up and route tasks to the AIs automatically. The supervisor can “[give] specific agents access to the information they need to complete their work,” AWS says, and “[determine] what actions can be processed in parallel and which need details from other tasks before [an] agent can move forward.”

“Once all of the specialized [AIs] complete their inputs, the supervisor agent [can pull] the information together [and] synthesize the results,” AWS wrote in the post.
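The supervisor pattern is easier to picture with a small sketch. What follows is not the Bedrock Agents API. The agent functions, their placeholder outputs, and the `supervise` helper are all hypothetical, and plain Python threads stand in for real AI agents. It shows the two behaviors AWS describes: independent tasks run in parallel, and a dependent task waits for, and receives, only the results it needs.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


# Hypothetical specialist agents; each receives its dependencies' results.
def review_financials(inputs: dict) -> str:
    return "financials reviewed"


def assess_trends(inputs: dict) -> str:
    return "trends assessed"


def write_summary(inputs: dict) -> str:
    return f"Summary: {inputs['financials']}; {inputs['trends']}"


agents = {
    "financials": review_financials,
    "trends": assess_trends,
    "summary": write_summary,
}
# Each task maps to the set of tasks whose results it needs first.
depends_on = {
    "financials": set(),
    "trends": set(),
    "summary": {"financials", "trends"},
}


def supervise(agents: dict, depends_on: dict) -> tuple[dict, list]:
    """Run tasks in dependency order, parallelizing where possible."""
    results: dict = {}
    waves: list = []
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # tasks with no pending deps
            waves.append(ready)
            futures = {
                name: pool.submit(
                    agents[name],
                    {dep: results[dep] for dep in depends_on[name]},
                )
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results, waves


results, waves = supervise(agents, depends_on)
print(waves)
print(results["summary"])
```

Here the two research tasks run together in the first wave, and the summary task runs alone in the second once both results are in.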

Sounds nifty. But as with all these features, we’ll have to see how well it works when deployed in the real world.



Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
