Alibaba releases an ‘open’ challenger to OpenAI’s o1 reasoning model


A new “reasoning” AI model, QwQ-32B-Preview, has arrived on the scene. It’s one of the few to rival OpenAI’s o1, and it’s the first available to download under a permissive license.

Developed by Alibaba’s Qwen team, QwQ-32B-Preview contains 32.5 billion parameters and can consider prompts up to ~32,000 tokens in length. It performs better on certain benchmarks than o1-preview and o1-mini, the two reasoning models that OpenAI has released to date. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.
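For a rough sense of what 32.5 billion parameters means in practice, the back-of-the-envelope sketch below estimates the memory needed just to store the model's weights at a few common numeric precisions. The list of precisions and the helper name `weight_memory_gb` are illustrative assumptions, not details from Alibaba, and the estimate ignores everything else a running model needs (activations, caches, framework overhead).

```python
# Rough estimate of the memory needed to hold a model's weights.
# Only the parameter count comes from the article; the storage formats
# below are common choices assumed here for illustration.

PARAMS = 32.5e9  # parameter count cited for QwQ-32B-Preview

# Bytes used per parameter in each (assumed) storage format.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

At 16-bit precision the weights alone come to roughly 65 GB, which is why a model of this size typically needs more than one consumer GPU or a reduced-precision copy to run locally.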

Per Alibaba’s testing, QwQ-32B-Preview beats OpenAI’s o1 models on the AIME and MATH tests. AIME is drawn from the American Invitational Mathematics Examination, a competition-level math exam, while MATH is a collection of challenging competition math problems.

QwQ-32B-Preview can solve logic puzzles and answer reasonably challenging math questions, thanks to its “reasoning” capabilities. But it isn’t perfect. Alibaba notes in a blog post that the model might switch languages unexpectedly, get stuck in loops, and underperform on tasks that require “common sense reasoning.”

Image Credits: Alibaba

Unlike most AI, QwQ-32B-Preview and other reasoning models effectively fact-check themselves. This helps them avoid some of the pitfalls that normally trip up models, with the downside being that they often take longer to arrive at solutions. Similar to o1, QwQ-32B-Preview reasons through tasks, planning ahead and performing a series of actions that help the model tease out answers.

QwQ-32B-Preview, which can be run on and downloaded from the AI dev platform Hugging Face, appears to be similar to the recently released DeepSeek reasoning model in that certain topics are verboten. Alibaba and DeepSeek, being Chinese companies, are subject to benchmarking by China’s internet regulator to ensure their models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, like speculation about the Xi Jinping regime.

Image: Alibaba QwQ-32B-Preview (Credits: Alibaba)

Asked “Is Taiwan a part of China?,” QwQ-32B-Preview answered that it was, a perspective out of step with most of the world but in line with that of China’s ruling party. Prompts about Tiananmen Square, meanwhile, yielded a non-response.

Image: Alibaba QwQ-32B-Preview (Credits: Alibaba)

QwQ-32B-Preview is “openly” available under an Apache 2.0 license, meaning it can be used for commercial applications. But only certain components of the model have been released, making it impossible to replicate QwQ-32B-Preview or gain much insight into the system’s inner workings.

The increased attention on reasoning models comes as the viability of “scaling laws,” long-held theories that throwing more data and computing power at a model would continuously increase its capabilities, is coming under scrutiny. A flurry of press reports suggests that models from major AI labs including OpenAI, Google, and Anthropic aren’t improving as dramatically as they once did.

That’s led to a scramble for new AI approaches, architectures, and development techniques. One is test-time compute, which underpins models like o1 and DeepSeek’s. Also known as inference compute, test-time compute essentially gives models additional processing time to complete tasks.
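One simple way to picture the trade of extra inference time for better answers is majority voting over several sampled answers, sometimes called self-consistency. The toy sketch below is only an illustration of that general idea and is not a description of how o1, DeepSeek's model, or QwQ-32B-Preview work internally. The `noisy_solver` function is a hypothetical stand-in for a model that answers correctly 60% of the time; the answer values and probabilities are made up.

```python
import random
from collections import Counter

CORRECT_ANSWER = 42  # made-up ground truth for the toy task


def noisy_solver(rng: random.Random, p_correct: float = 0.6) -> int:
    """Hypothetical stand-in for one sampled model answer.

    Returns the correct answer with probability p_correct, otherwise
    one of a few wrong answers chosen uniformly.
    """
    if rng.random() < p_correct:
        return CORRECT_ANSWER
    return rng.choice([41, 43, 44])


def majority_vote(rng: random.Random, num_samples: int) -> int:
    """Sample several answers and return the most common one."""
    answers = [noisy_solver(rng) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(num_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, num_samples) == CORRECT_ANSWER
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(f"{n:>2} samples per question -> accuracy {accuracy(n):.2f}")
```

In this toy setup, accuracy climbs as the number of samples per question grows, at the cost of proportionally more computation per answer. That is the same basic trade-off the article describes: better results in exchange for longer response times.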

Big labs beyond OpenAI and the Chinese ventures are betting that test-time compute is the future. According to a recent report from The Information, Google recently expanded its reasoning team to about 200 people and added computing power.




Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
