AWS’ new service tackles AI hallucinations

Amazon Web Services (AWS), Amazon’s cloud computing division, is launching a new tool to combat hallucinations — that is, scenarios where an AI model behaves unreliably.

Announced at AWS’ re:Invent 2024 conference in Las Vegas, the service, Automated Reasoning checks, validates a model’s responses by cross-referencing customer-supplied info for accuracy. AWS claims in a press release that Automated Reasoning checks is the “first” and “only” safeguard for hallucinations.

But that’s, well… putting it generously.

Automated Reasoning checks is nearly identical to the Correction feature Microsoft rolled out this summer, which also flags AI-generated text that might be factually wrong. Google also offers a tool in Vertex AI, its AI development platform, to let customers “ground” models by using data from third-party providers, their own data sets, or Google Search.

In any case, Automated Reasoning checks, which is available through AWS’ Bedrock model hosting service (specifically the Guardrails tool), attempts to figure out how a model arrived at an answer and discern whether the answer is correct. Customers upload info to establish a ground truth of sorts, and Automated Reasoning checks creates rules that can then be refined and applied to a model.

As a model generates responses, Automated Reasoning checks verifies them, and, in the event of a probable hallucination, draws on the ground truth for the right answer. It presents this answer alongside the likely mistruth so customers can see how far off-base the model might’ve been.
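To make the flow described above more concrete, here is a minimal sketch in plain Python. It is an illustration only: it does not use the Bedrock API and does not reflect how AWS implements the feature. The facts in `GROUND_TRUTH`, the `Finding` type, and the `check_response` function are all hypothetical, and the sketch assumes a model’s claims have already been extracted into simple key/value pairs.

```python
from dataclasses import dataclass

# Hypothetical "ground truth" a customer might supply, as simple key/value facts.
GROUND_TRUTH = {
    "refund_window_days": 30,
    "max_carry_on_kg": 10,
}


@dataclass
class Finding:
    key: str          # which fact the model got wrong
    claimed: object   # what the model said
    expected: object  # what the ground truth says


def check_response(claims: dict) -> list:
    """Compare a model's claims with the ground truth.

    Returns one Finding per claim that contradicts a known fact.
    Claims about keys missing from the ground truth are skipped.
    """
    findings = []
    for key, claimed in claims.items():
        if key in GROUND_TRUTH and GROUND_TRUTH[key] != claimed:
            findings.append(Finding(key, claimed, GROUND_TRUTH[key]))
    return findings


# Hypothetical claims pulled from a model response.
model_claims = {"refund_window_days": 60, "max_carry_on_kg": 10}
findings = check_response(model_claims)

# Show the likely mistake next to the ground-truth value.
for f in findings:
    print(f"{f.key}: model said {f.claimed!r}, ground truth says {f.expected!r}")
```

A real system would need to turn free-form text into checkable claims first, which is the hard part this toy example skips.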

AWS says PwC is already using Automated Reasoning checks to design AI assistants for its clients. And Swami Sivasubramanian, VP of AI and data at AWS, suggested that this type of tooling is exactly what’s attracting customers to Bedrock.

“With the launch of these new capabilities,” he said in a statement, “we are innovating on behalf of customers to solve some of the top challenges that the entire industry is facing when moving generative AI applications to production.” Bedrock’s customer base grew by 4.7x in the last year to tens of thousands of customers, Sivasubramanian added.

But as one expert told me this summer, trying to eliminate hallucinations from generative AI is like trying to eliminate hydrogen from water.

AI models hallucinate because they don’t actually “know” anything. They’re statistical systems that identify patterns in a series of data, and predict which data comes next based on previously seen examples. It follows that a model’s responses aren’t answers, then, but predictions of how questions should be answered, within a margin of error.
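A toy example shows what “predicting which data comes next” means in practice. The sketch below is a simple word-pair counter, far cruder than any production model, and the tiny corpus and the `predict_next` function are made up for illustration. It picks the most frequently seen next word and reports how often that guess matched the examples.

```python
from collections import Counter, defaultdict

# Hypothetical training text, split into words.
corpus = "the cat sat on the mat the cat ate".split()

# Count which word follows each word in the training text.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1


def predict_next(word):
    """Return the most common follower of `word` and its observed frequency."""
    counts = follows.get(word)
    if not counts:
        return None, 0.0  # never seen this word followed by anything
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())


word, prob = predict_next("the")
print(word, round(prob, 2))
```

The counter has no notion of whether its guess is true. It only reflects what it has seen, which is the gap tools like Automated Reasoning checks try to cover.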

AWS claims that Automated Reasoning checks uses “logically accurate” and “verifiable reasoning” to arrive at its conclusions. But the company volunteered no data showing that the tool is itself reliable.

In other Bedrock news, AWS this morning announced Model Distillation, a tool to transfer the capabilities of a large model (e.g. Llama 405B) to a small model (e.g. Llama 8B) that’s cheaper and faster to run. An answer to Microsoft’s Distillation in Azure AI Foundry, Model Distillation provides a way to experiment with various models without breaking the bank, AWS says.


“After the customer provides sample prompts, Amazon Bedrock will do all the work to generate responses and fine-tune the smaller model,” AWS explained in a blog post, “and it can even create more sample data, if needed, to complete the distillation process.”

But there are a few caveats.

Model Distillation only works with Bedrock-hosted models from Anthropic and Meta at present. Customers have to select a large and small model from the same model “family” — the models can’t be from different providers. And distilled models will lose some accuracy — “less than 2%,” AWS claims.

If none of that deters you, Model Distillation is now available in preview, along with Automated Reasoning checks.

Also available in preview is “multi-agent collaboration,” a new Bedrock feature that lets customers assign AI to subtasks in a larger project. A part of Bedrock Agents, AWS’ contribution to the AI agent craze, multi-agent collaboration provides tools to create and tune AI for things like reviewing financial records and assessing global trends.

Customers can even designate a “supervisor agent” to break up and route tasks to the AIs automatically. The supervisor can “[give] specific agents access to the information they need to complete their work,” AWS says, and “[determine] what actions can be processed in parallel and which need details from other tasks before [an] agent can move forward.”

“Once all of the specialized [AIs] complete their inputs, the supervisor agent [can pull] the information together [and] synthesize the results,” AWS wrote in the post.
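The supervisor pattern described above can be sketched in a few lines of Python. This is not the Bedrock Agents API; the task names, the `run_agent` stand-in, and the `supervise` function are hypothetical. The sketch uses the standard library’s `graphlib.TopologicalSorter` to work out which subtasks have no unmet dependencies and could run in parallel, and which must wait for results from other tasks.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks, each mapped to the subtasks whose output it needs first.
DEPENDENCIES = {
    "review_financial_records": set(),
    "assess_global_trends": set(),
    "draft_summary": {"review_financial_records", "assess_global_trends"},
}


def run_agent(task, inputs):
    """Stand-in for a specialized agent; a real one would call a model."""
    return f"{task} done (used {len(inputs)} inputs)"


def supervise(dependencies):
    """Group tasks into batches and pass each task its upstream results."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    results = {}
    batches = []
    while sorter.is_active():
        # Every task in `ready` has all of its dependencies finished,
        # so the tasks in one batch could run in parallel.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        for task in ready:  # run sequentially here to keep the sketch simple
            inputs = [results[dep] for dep in sorted(dependencies[task])]
            results[task] = run_agent(task, inputs)
            sorter.done(task)
    return batches, results


batches, results = supervise(DEPENDENCIES)
print(batches)
print(results["draft_summary"])
```

Here the two research tasks land in the first batch and the summary task waits for both, mirroring the split AWS describes between work that can be processed in parallel and work that needs details from other tasks.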

Sounds nifty. But as with all these features, we’ll have to see how well it works when deployed in the real world.
