OpenAI launches o3-mini, its latest ‘reasoning’ model

OpenAI on Friday launched o3-mini, the newest addition to the company's o family of AI "reasoning" models.

OpenAI first previewed the model in December alongside a more capable system called o3, but the launch comes at a pivotal moment for the company, whose ambitions — and challenges — are seemingly growing by the day.

OpenAI is battling the perception that it’s ceding ground in the AI race to Chinese companies like DeepSeek, which OpenAI alleges might have stolen its IP. It has been trying to shore up its relationship with Washington as it simultaneously pursues an ambitious data center project, and as it reportedly lays the groundwork for one of the largest funding rounds in history.

Which brings us to o3-mini. OpenAI is pitching its new model as both “powerful” and “affordable.”

“Today’s launch marks […] an important step toward broadening accessibility to advanced AI in service of our mission,” an OpenAI spokesperson told TechCrunch.

More efficient reasoning

Unlike most large language models, reasoning models like o3-mini thoroughly fact-check themselves before delivering results, which helps them avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer to arrive at solutions, but the trade-off is that they tend to be more reliable, though not perfect, in domains like physics.

O3-mini is fine-tuned for STEM problems, specifically for programming, math, and science. OpenAI claims the model is largely on par with the o1 family, o1 and o1-mini, in terms of capabilities, but runs faster and costs less.

The company claimed that external testers preferred o3-mini’s answers over those from o1-mini more than half the time. O3-mini apparently also made 39% fewer “major mistakes” on “tough real-world questions” in A/B tests versus o1-mini, and produced “clearer” responses while delivering answers about 24% faster.

O3-mini will be available to all users via ChatGPT starting Friday, but users who pay for OpenAI's ChatGPT Plus and Team plans will get a higher rate limit of 150 queries per day. ChatGPT Pro subscribers will get unlimited access, and o3-mini will come to ChatGPT Enterprise and ChatGPT Edu customers in a week. (No word on ChatGPT Gov yet.)

Users with premium plans can select o3-mini using the ChatGPT drop-down menu. Free users can click or tap the new “Reason” button in the chat bar, or have ChatGPT “re-generate” an answer.

Beginning Friday, o3-mini will also be available via OpenAI’s API to select developers, but it initially will not have support for analyzing images. Devs can select the level of “reasoning effort” (low, medium, or high) to get o3-mini to “think harder” based on their use case and latency needs.
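For developers calling the API, reasoning effort is a per-request setting rather than a separate model. Below is a minimal sketch of what such a call could look like with OpenAI's official Python SDK, assuming the setting is exposed as a reasoning_effort parameter on chat completions (check the current API reference for the exact name and availability):

# Minimal sketch: requesting o3-mini with an explicit reasoning effort.
# Assumes the OpenAI Python SDK and that chat completions accept a
# "reasoning_effort" of "low", "medium", or "high" for o3-mini.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # trade extra latency for more "thinking"
    messages=[
        {"role": "user", "content": "Find all integer solutions of x^2 - y^2 = 17."}
    ],
)

print(response.choices[0].message.content)

Higher effort generally means more internal reasoning tokens, so responses at the "high" setting tend to be slower and somewhat costlier.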

O3-mini is priced at $0.55 per million cached input tokens and $4.40 per million output tokens, where a million tokens equates to roughly 750,000 words. That’s 63% cheaper than o1-mini, and competitive with DeepSeek’s R1 reasoning model pricing. DeepSeek charges $0.14 per million cached input tokens and $2.19 per million output tokens for R1 access through its API.
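For a rough sense of what those rates mean per request, here is a small, hypothetical Python sketch that estimates cost from the per-million-token prices quoted above (actual bills also depend on uncached input pricing and each provider's current price list):

# Back-of-the-envelope cost estimate using the per-million-token rates above.
# Prices in USD per 1M tokens; verify against the providers' pricing pages.
PRICES = {
    "o3-mini": {"cached_input": 0.55, "output": 4.40},
    "deepseek-r1": {"cached_input": 0.14, "output": 2.19},
}

def estimate_cost(model: str, cached_input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of one request for the given token counts."""
    p = PRICES[model]
    return (cached_input_tokens * p["cached_input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 2,000-token cached prompt that produces a 1,000-token answer.
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, 2_000, 1_000):.4f}")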

In ChatGPT, o3-mini is set to medium reasoning effort, which OpenAI says provides “a balanced trade-off between speed and accuracy.” Paid users will have the option of selecting “o3-mini-high” in the model picker, which will deliver what OpenAI calls “higher intelligence” in exchange for slower responses.

Regardless of which version of o3-mini ChatGPT users choose, the model will work with search to find up-to-date answers with links to relevant web sources. OpenAI cautions that the functionality is a “prototype” as it works to integrate search across its reasoning models.

“While o1 remains our broader general-knowledge reasoning model, o3-mini provides a specialized alternative for technical domains requiring precision and speed,” OpenAI wrote in a blog post on Friday. “The release of o3-mini marks another step in OpenAI’s mission to push the boundaries of cost-effective intelligence.”

Caveats abound

O3-mini is not OpenAI’s most powerful model to date, nor does it leapfrog DeepSeek’s R1 reasoning model in every benchmark.

O3-mini beats R1 on AIME 2024, a benchmark of difficult competition math problems, but only with high reasoning effort. It also beats R1 on the programming-focused test SWE-bench Verified (by 0.1 point), but again, only with high reasoning effort. On low reasoning effort, o3-mini lags behind R1 on GPQA Diamond, which tests models with PhD-level physics, biology, and chemistry questions.

To be fair, o3-mini answers many queries at competitively low cost and latency. In the post, OpenAI compares its performance to the o1 family:

“With low reasoning effort, o3-mini achieves comparable performance with o1-mini, while with medium effort, o3-mini achieves comparable performance with o1,” OpenAI writes. “O3-mini with medium reasoning effort matches o1’s performance in math, coding and science while delivering faster responses. Meanwhile, with high reasoning effort, o3-mini outperforms both o1-mini and o1.”

It’s worth noting that o3-mini’s performance advantage over o1 is slim in some areas. On AIME 2024, o3-mini beats o1 by just 0.3 percentage points when set to high reasoning effort. And on GPQA Diamond, o3-mini doesn’t surpass o1’s score even on high reasoning effort.

OpenAI asserts that o3-mini is as “safe” or safer than the o1 family, however, thanks to red-teaming efforts and its “deliberative alignment” methodology, which makes models “think” about OpenAI’s safety policy while they’re responding to queries. According to the company, o3-mini “significantly surpasses” one of OpenAI’s flagship models, GPT-4o, on “challenging safety and jailbreak evaluations.”





Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
