Runware uses custom hardware and advanced orchestration for fast AI inference

Sometimes, a demo is all you need to understand a product. And that’s the case with Runware. If you head over to Runware’s website, enter a prompt and hit enter to generate an image, you’ll be surprised by how quickly Runware generates the image for you — it takes less than a second.

Runware is a newcomer in the AI inference, or generative AI, startup landscape. The company is building its own servers and optimizing the software layer on those servers to remove bottlenecks and improve inference speeds for image generation models. The startup has already secured $3 million in funding from Andreessen Horowitz’s Speedrun, LakeStar’s Halo II and Lunar Ventures.

The company doesn’t want to reinvent the wheel. It just wants to make it spin faster. Behind the scenes, Runware manufactures its own servers with as many GPUs as possible on the same motherboard. It has its own custom-made cooling system and manages its own data centers.

When it comes to running AI models on its servers, Runware has optimized the orchestration layer with BIOS and operating system optimizations to improve cold start times. It has developed its own algorithms that allocate inference workloads.

The demo is impressive by itself. Now, the company wants to use all this work in research and development and turn it into a business.

Unlike many GPU hosting companies, Runware isn’t going to rent its GPUs based on GPU time. Instead, it believes companies should be encouraged to speed up workloads. That’s why Runware is offering an image generation API with a traditional cost-per-API-call fee structure. It’s based on popular AI models from Flux and Stable Diffusion.
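To make the difference between the two billing models concrete, here is a minimal sketch of the arithmetic. The function names and every number in it are hypothetical illustrations, not Runware's or any competitor's actual prices.

```python
def gpu_time_cost(images: int, seconds_per_image: float, price_per_gpu_second: float) -> float:
    """Bill under GPU-time pricing: the customer pays for every second the GPU is busy."""
    return images * seconds_per_image * price_per_gpu_second


def per_call_cost(images: int, price_per_call: float) -> float:
    """Bill under per-call pricing: a flat fee per image, regardless of latency."""
    return images * price_per_call


# Hypothetical numbers, chosen only to show the shape of the comparison.
IMAGES = 1000
slow = gpu_time_cost(IMAGES, seconds_per_image=4.0, price_per_gpu_second=0.001)
fast = gpu_time_cost(IMAGES, seconds_per_image=1.0, price_per_gpu_second=0.001)
flat = per_call_cost(IMAGES, price_per_call=0.002)

print(f"GPU-time billing, 4 s per image: ${slow:.2f}")
print(f"GPU-time billing, 1 s per image: ${fast:.2f}")
print(f"Per-call billing, any latency:   ${flat:.2f}")
```

Under GPU-time billing the customer's bill scales with how long each image takes, so a slower stack costs the customer more. Under per-call billing the bill is fixed, and any speedup lowers the provider's own cost per image instead.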

“If you look at Together AI, Replicate, Hugging Face — all of them — they are selling compute based on GPU time,” co-founder and CEO Flaviu Radulescu told TechCrunch. “If you compare the amount of time it takes for us to make an image versus them. And then you compare the pricing, you will see that we are so much cheaper, so much faster.”

“It’s going to be impossible for them to match this performance,” he added. “Especially in a cloud provider, you have to run on a virtualized environment, which adds additional delays.”

As Runware is looking at the entire inference pipeline, and optimizing hardware and software, the company hopes that it will be able to use GPUs from multiple vendors in the near future. This has been an important endeavor for several startups as Nvidia is the clear leader in the GPU space, which means that Nvidia GPUs tend to be quite expensive.

“Right now, we use just Nvidia GPUs. But this should be an abstraction of the software layer,” Radulescu said. “We can switch a model from GPU memory in and out very, very fast, which allow us to put multiple customers on the same GPUs.

“So we are not like our competitors. They just load a model into the GPU and then the GPU does a very specific type of task. In our case, we’ve developed this software solution, which allow us to switch a model in the GPU memory as we do inference.”
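Runware has not published how its model switching works. As a rough illustration of the general idea of sharing one GPU's memory among several models, here is a toy sketch that keeps recently used models resident and evicts the least recently used one when space runs out. The `ModelCache` class, the model names and the sizes are all hypothetical, and no real GPU calls are made.

```python
from collections import OrderedDict


class ModelCache:
    """Toy stand-in for one GPU's memory, shared by several models (hypothetical)."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.resident = OrderedDict()  # model name -> size in GB, least recently used first
        self.loads = 0                 # how many times a model had to be loaded

    def used_gb(self) -> float:
        return sum(self.resident.values())

    def acquire(self, name: str, size_gb: float) -> str:
        """Make a model resident before inference; return "hit" or "load"."""
        if size_gb > self.capacity_gb:
            raise ValueError(f"{name} does not fit in {self.capacity_gb} GB")
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return "hit"
        # Evict least recently used models until the new one fits.
        while self.used_gb() + size_gb > self.capacity_gb:
            self.resident.popitem(last=False)
        self.resident[name] = size_gb
        self.loads += 1
        return "load"


# Requests from different customers arriving at the same GPU (made-up sizes).
cache = ModelCache(capacity_gb=24)
for model, size in [("flux", 16), ("sd", 8), ("flux", 16), ("sdxl", 10)]:
    print(model, cache.acquire(model, size))
```

In a scheme like this, the cost of a "load" is what matters: the faster a model can be moved into GPU memory, the more customers can share the same hardware without noticing the swap.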

If AMD and other GPU vendors can create compatibility layers that work with typical AI workloads, Runware is well positioned to build a hybrid cloud that would rely on GPUs from multiple vendors. And that will certainly help if it wants to remain cheaper than competitors at AI inference.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
