AWS bets on liquid cooling for its AI servers

It’s AWS re:Invent 2024 this week, Amazon’s annual cloud computing extravaganza in Las Vegas, and as is tradition, the company has so much to announce that it can’t fit everything into its five (!) keynotes. Ahead of the show’s official opening, AWS on Monday detailed a number of updates to its overall data center strategy that are worth paying attention to.

The most important of these is that AWS will soon start using liquid cooling for its AI servers and other machines, whether they are based on its homegrown Trainium chips or on Nvidia’s accelerators. Specifically, AWS notes that its Trainium2 chips (which are still in preview) and “rack-scale AI supercomputing solutions like NVIDIA GB200 NVL72” will be cooled this way.

Notably, AWS stresses that these updated cooling systems can integrate both air and liquid cooling. After all, plenty of other servers in these data centers, such as those handling networking and storage, don’t require liquid cooling. “This flexible, multimodal cooling design allows AWS to provide maximum performance and efficiency at the lowest cost, whether running traditional workloads or AI models,” AWS explains.

The company also announced that it is moving to simpler electrical and mechanical designs for its servers and server racks.

“AWS’s latest data center design improvements include simplified electrical distribution and mechanical systems, which enable infrastructure availability of 99.9999%. The simplified systems also reduce the potential number of racks that can be impacted by electrical issues by 89%,” the company notes in its announcement. In part, AWS is achieving this by reducing the number of times electricity is converted on its way from the grid to the server.
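
For context, six nines of availability works out to a little over 30 seconds of downtime per year. A minimal back-of-the-envelope sketch of that arithmetic (the 99.9999% figure is AWS’s; everything else here is just illustration):

```python
# Back-of-the-envelope: downtime per year implied by a given availability.
# Illustrative arithmetic only, not AWS's own calculation.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def max_downtime_seconds(availability: float) -> float:
    """Maximum downtime per year consistent with the given availability."""
    return (1 - availability) * SECONDS_PER_YEAR

for availability in (0.999, 0.9999, 0.99999, 0.999999):
    print(f"{availability:.4%} available -> ~{max_downtime_seconds(availability):,.1f} s of downtime per year")
```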

AWS didn’t share many more details about the electrical changes, but this likely means using DC power to run the servers and/or the HVAC systems, avoiding many of the AC-DC-AC conversion steps (and their inherent losses) that would otherwise be necessary.
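
To see why fewer conversion steps matter, note that each AC-DC or DC-AC stage is typically only somewhere in the 90-97% efficiency range, and the losses compound. The sketch below uses hypothetical per-stage efficiencies purely for illustration; AWS hasn’t published its actual power-chain numbers:

```python
# Illustrative only: per-stage efficiencies are assumed round numbers,
# not AWS data. The point is that losses multiply across the chain.

from math import prod

def end_to_end_efficiency(stage_efficiencies: list[float]) -> float:
    """Fraction of grid power that actually reaches the silicon."""
    return prod(stage_efficiencies)

# Hypothetical legacy chain: UPS rectifier, UPS inverter, rack PSU, on-board regulator
legacy = [0.96, 0.96, 0.94, 0.90]

# Hypothetical simplified chain: one grid-to-DC conversion, then the on-board regulator
simplified = [0.97, 0.90]

print(f"legacy chain:     {end_to_end_efficiency(legacy):.1%} of grid power reaches the chip")
print(f"simplified chain: {end_to_end_efficiency(simplified):.1%} of grid power reaches the chip")
```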

“AWS continues to relentlessly innovate its infrastructure to build the most performant, resilient, secure, and sustainable cloud for customers worldwide,” said Prasad Kalyanaraman, vice president of Infrastructure Services at AWS, in Monday’s announcement. “These data center capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads. But what is even more exciting is that they are designed to be modular, so that we are able to retrofit our existing infrastructure for liquid cooling and energy efficiency to power generative AI applications and lower our carbon footprint.”

In total, AWS says, the new multimodal cooling system and upgraded power delivery system will let the organization “support a 6x increase in rack power density over the next two years, and another 3x increase in the future.”

In this context, AWS also notes that it is now using AI to predict the most efficient way to position racks in the data center to reduce the amount of unused or underutilized power. AWS will also roll out its own control system across its electrical and mechanical devices in the data center, which will come with built-in telemetry services for real-time diagnostics and troubleshooting.

“Data centers must evolve to meet AI’s transformative demands,” said Ian Buck, vice president of hyperscale and HPC at Nvidia. “By enabling advanced liquid cooling solutions, AI infrastructure can be efficiently cooled while minimizing energy use. Our work with AWS on their liquid cooling rack design will allow customers to run demanding AI workloads with exceptional performance and efficiency.”
