AWS brings prompt routing and caching to its Bedrock LLM service



As businesses move from trying out generative AI in limited prototypes to putting it into production, they are becoming increasingly price-conscious. Using large language models (LLMs) isn’t cheap, after all. One way to reduce cost is to go back to an old concept: caching. Another is to route simpler queries to smaller, more cost-efficient models. At its re:Invent conference in Las Vegas, AWS on Wednesday announced both of these features for its Bedrock LLM hosting service.

Let’s talk about the caching service first. “Say there is a document, and multiple people are asking questions on the same document. Every single time you’re paying,” Atul Deo, the director of product for Bedrock, told me. “And these context windows are getting longer and longer. For example, with Nova, we’re going to have 300k [tokens of] context and 2 million [tokens of] context. I think by next year, it could even go much higher.”


Caching essentially ensures that you don’t have to pay for the model to do repetitive work and reprocess the same (or substantially similar) queries over and over again. According to AWS, this can reduce costs by up to 90%. An additional by-product is that the latency for getting an answer back from the model drops significantly, too (by up to 85%, AWS says). Adobe, which tested prompt caching for some of its generative AI applications on Bedrock, saw a 72% reduction in response time.
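To make the idea concrete, here is a rough sketch of what reusing one long shared document across many questions could look like with boto3’s Converse API. The cachePoint marker, the model ID, and the file name are assumptions included for illustration; the exact request shape for Bedrock prompt caching may differ, so treat this as a sketch of the concept rather than a reference implementation.

```python
# Sketch: answer many questions about the same long document, marking the
# shared prefix as cacheable so it isn't reprocessed on every request.
# Assumptions: the "cachePoint" system block and the Nova model ID are
# placeholders; check the current Bedrock documentation before relying on them.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

document_text = open("contract.txt").read()  # the long, shared context


def ask(question: str) -> str:
    response = bedrock.converse(
        modelId="amazon.nova-pro-v1:0",  # placeholder model ID
        system=[
            {"text": f"Answer questions about this document:\n{document_text}"},
            {"cachePoint": {"type": "default"}},  # assumed marker: cache everything above
        ],
        messages=[{"role": "user", "content": [{"text": question}]}],
    )
    return response["output"]["message"]["content"][0]["text"]


# Repeated calls reuse the cached document prefix instead of paying to
# reprocess it each time.
print(ask("What is the termination clause?"))
print(ask("Who are the parties to this agreement?"))
```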

The other major new feature is intelligent prompt routing for Bedrock. With this, Bedrock can automatically route prompts to different models in the same model family to help businesses strike the right balance between performance and cost. The system predicts (using a small language model) how each model will perform for a given query and then routes the request accordingly.


“Sometimes, my query could be very simple. Do I really need to send that query to the most capable model, which is extremely expensive and slow? Probably not. So basically, you want to create this notion of ‘Hey, at run time, based on the incoming prompt, send the right query to the right model,’” Deo explained.
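The managed router makes that prediction on the service side, but the idea itself can be sketched client-side: a cheap check decides which model in a family receives the request. The heuristic, thresholds, and model IDs below are placeholders invented for illustration and are not how Bedrock’s intelligent prompt routing actually scores prompts.

```python
# Simplified, client-side illustration of the routing idea: estimate how
# demanding a prompt is, then send it to a smaller or larger model in the
# same family. Bedrock's managed feature does this with a learned predictor;
# the length/keyword heuristic here is only a stand-in.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

SMALL_MODEL = "amazon.nova-lite-v1:0"  # cheaper, faster (placeholder ID)
LARGE_MODEL = "amazon.nova-pro-v1:0"   # more capable, more expensive (placeholder ID)


def looks_complex(prompt: str) -> bool:
    # Stand-in for the service's predictor: long prompts or prompts asking
    # for analysis go to the larger model.
    return len(prompt.split()) > 200 or any(
        word in prompt.lower() for word in ("analyze", "compare", "step by step")
    )


def route_and_invoke(prompt: str) -> str:
    model_id = LARGE_MODEL if looks_complex(prompt) else SMALL_MODEL
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]


print(route_and_invoke("What year was AWS founded?"))  # simple, goes to the small model
print(route_and_invoke("Analyze the tradeoffs of prompt caching step by step."))  # large model
```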

LLM routing isn’t a new concept, of course. Startups like Martian and a number of open source projects also tackle this, but AWS would likely argue that what differentiates its offering is that the router can intelligently direct queries without a lot of human input. But it’s also limited, in that it can only route queries to models in the same model family. In the long run, though, Deo told me, the team plans to expand this system and give users more customizability.


Lastly, AWS is also launching a new marketplace for Bedrock. The idea here, Deo said, is that while Amazon is partnering with many of the larger model providers, there are now hundreds of specialized models that may only have a few dedicated users. Since those customers are asking the company to support them, AWS is launching a marketplace for these models, with one major difference: users will have to provision and manage their infrastructure capacity themselves, something Bedrock typically handles automatically. In total, AWS will offer about 100 of these emerging and specialized models, with more to come.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
