AWS brings prompt routing and caching to its Bedrock LLM service



As businesses move from trying out generative AI in limited prototypes to putting it into production, they are becoming increasingly price-conscious. Using large language models (LLMs) isn’t cheap, after all. One way to reduce cost is to go back to an old concept: caching. Another is to route simpler queries to smaller, more cost-efficient models. At its re:Invent conference in Las Vegas, AWS on Wednesday announced both of these features for its Bedrock LLM hosting service.

Let’s talk about the caching service first. “Say there is a document, and multiple people are asking questions on the same document. Every single time you’re paying,” Atul Deo, the director of product for Bedrock, told me. “And these context windows are getting longer and longer. For example, with Nova, we’re going to have 300k [tokens of] context and 2 million [tokens of] context. I think by next year, it could even go much higher.”


Caching essentially ensures that you don’t have to pay for the model to do repetitive work and reprocess the same (or substantially similar) queries over and over again. According to AWS, this can reduce costs by up to 90%. An additional by-product is that the latency for getting an answer back from the model drops significantly, too (by up to 85%, AWS says). Adobe, which tested prompt caching for some of its generative AI applications on Bedrock, saw a 72% reduction in response time.
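To make the idea concrete, here is a rough sketch of what reusing one long shared document across many questions could look like with boto3’s Converse API. The cachePoint marker, the model ID, and the file name are assumptions included for illustration; the exact request shape for Bedrock prompt caching may differ, so treat this as a sketch of the concept rather than a reference implementation.

```python
# Sketch: answer many questions about the same long document, marking the
# shared prefix as cacheable so it isn't reprocessed on every request.
# Assumptions: the "cachePoint" system block and the Nova model ID are
# placeholders; check the current Bedrock documentation before relying on them.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

document_text = open("contract.txt").read()  # the long, shared context


def ask(question: str) -> str:
    response = bedrock.converse(
        modelId="amazon.nova-pro-v1:0",  # placeholder model ID
        system=[
            {"text": f"Answer questions about this document:\n{document_text}"},
            {"cachePoint": {"type": "default"}},  # assumed marker: cache everything above
        ],
        messages=[{"role": "user", "content": [{"text": question}]}],
    )
    return response["output"]["message"]["content"][0]["text"]


# Repeated calls reuse the cached document prefix instead of paying to
# reprocess it each time.
print(ask("What is the termination clause?"))
print(ask("Who are the parties to this agreement?"))
```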

The other major new feature is intelligent prompt routing for Bedrock. With this, Bedrock can automatically route prompts to different models in the same model family to help businesses strike the right balance between performance and cost. The system predicts (using a small language model) how each model will perform for a given query and then routes the request accordingly.


“Sometimes, my query could be very simple. Do I really need to send that query to the most capable model, which is extremely expensive and slow? Probably not. So basically, you want to create this notion of ‘Hey, at run time, based on the incoming prompt, send the right query to the right model,’” Deo explained.
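The managed router makes that prediction on the service side, but the idea itself can be sketched client-side: a cheap check decides which model in a family receives the request. The heuristic, thresholds, and model IDs below are placeholders invented for illustration and are not how Bedrock’s intelligent prompt routing actually scores prompts.

```python
# Simplified, client-side illustration of the routing idea: estimate how
# demanding a prompt is, then send it to a smaller or larger model in the
# same family. Bedrock's managed feature does this with a learned predictor;
# the length/keyword heuristic here is only a stand-in.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

SMALL_MODEL = "amazon.nova-lite-v1:0"  # cheaper, faster (placeholder ID)
LARGE_MODEL = "amazon.nova-pro-v1:0"   # more capable, more expensive (placeholder ID)


def looks_complex(prompt: str) -> bool:
    # Stand-in for the service's predictor: long prompts or prompts asking
    # for analysis go to the larger model.
    return len(prompt.split()) > 200 or any(
        word in prompt.lower() for word in ("analyze", "compare", "step by step")
    )


def route_and_invoke(prompt: str) -> str:
    model_id = LARGE_MODEL if looks_complex(prompt) else SMALL_MODEL
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]


print(route_and_invoke("What year was AWS founded?"))  # simple, goes to the small model
print(route_and_invoke("Analyze the tradeoffs of prompt caching step by step."))  # large model
```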

LLM routing isn’t a new concept, of course. Startups like Martian and a number of open source projects also tackle this, but AWS would likely argue that what differentiates its offering is that the router can intelligently direct queries without a lot of human input. But it’s also limited, in that it can only route queries to models in the same model family. In the long run, though, Deo told me, the team plans to expand this system and give users more customizability.


Lastly, AWS is also launching a new marketplace for Bedrock. The idea here, Deo said, is that while Amazon is partnering with many of the larger model providers, there are now hundreds of specialized models that may only have a few dedicated users. Since those customers are asking the company to support them, AWS is launching a marketplace for these models, with one major difference: users will have to provision and manage their infrastructure capacity themselves, something Bedrock typically handles automatically. In total, AWS will offer about 100 of these emerging and specialized models, with more to come.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
