OpenAI launches o3-mini, its latest ‘reasoning’ model

OpenAI on Friday launched a new AI “reasoning” model, o3-mini, the newest addition to the company’s o family.

OpenAI first previewed the model in December alongside a more capable system called o3, but the launch comes at a pivotal moment for the company, whose ambitions — and challenges — are seemingly growing by the day.

OpenAI is battling the perception that it’s ceding ground in the AI race to Chinese companies like DeepSeek, which OpenAI alleges might have stolen its IP. It has been trying to shore up its relationship with Washington as it simultaneously pursues an ambitious data center project, and as it reportedly lays the groundwork for one of the largest funding rounds in history.

Which brings us to o3-mini. OpenAI is pitching its new model as both “powerful” and “affordable.”

“Today’s launch marks […] an important step toward broadening accessibility to advanced AI in service of our mission,” an OpenAI spokesperson told TechCrunch.

More efficient reasoning

Unlike most large language models, reasoning models like o3-mini thoroughly fact-check themselves before delivering results, which helps them avoid some of the pitfalls that normally trip up models. They take a little longer to arrive at solutions, but the trade-off is that they tend to be more reliable (though not perfect) in domains like physics.

O3-mini is fine-tuned for STEM problems, specifically for programming, math, and science. OpenAI claims the model is largely on par with the o1 family, o1 and o1-mini, in terms of capabilities, but runs faster and costs less.

The company claimed that external testers preferred o3-mini’s answers over those from o1-mini more than half the time. O3-mini apparently also made 39% fewer “major mistakes” on “tough real-world questions” in A/B tests versus o1-mini, and produced “clearer” responses while delivering answers about 24% faster.

O3-mini will be available to all users via ChatGPT starting Friday, but users who pay for OpenAI’s ChatGPT Plus and Team plans will get a higher rate limit of 150 queries per day. ChatGPT Pro subscribers will get unlimited access, and o3-mini will come to ChatGPT Enterprise and ChatGPT Edu customers in a week. (No word on ChatGPT Gov yet.)

Users with premium plans can select o3-mini using the ChatGPT drop-down menu. Free users can click or tap the new “Reason” button in the chat bar, or have ChatGPT “re-generate” an answer.

Beginning Friday, o3-mini will also be available to select developers via OpenAI’s API, though it initially won’t support analyzing images. Developers can select a level of “reasoning effort” (low, medium, or high) to get o3-mini to “think harder” based on their use case and latency needs.
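
For developers, that effort level is exposed as a request setting. The sketch below shows what such a call might look like, assuming the official openai Python SDK and a reasoning_effort parameter matching the low/medium/high options described above; treat the exact parameter name and prompt as illustrative assumptions rather than details confirmed by this article.

```python
# Hypothetical sketch: calling o3-mini via the OpenAI API with an explicit
# reasoning-effort level. Assumes the official `openai` Python SDK and an
# OPENAI_API_KEY in the environment; parameter names may differ in practice.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium", or "high"
    messages=[
        {"role": "user", "content": "Explain why the product of two odd integers is odd."},
    ],
)

print(response.choices[0].message.content)
```

Dialing the effort up trades latency for accuracy, so a latency-sensitive chat feature might default to “low” while a batch code-analysis job opts for “high.”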

O3-mini is priced at $0.55 per million cached input tokens and $4.40 per million output tokens, where a million tokens equates to roughly 750,000 words. That’s 63% cheaper than o1-mini, and competitive with DeepSeek’s R1 reasoning model pricing. DeepSeek charges $0.14 per million cached input tokens and $2.19 per million output tokens for R1 access through its API.
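
As a back-of-the-envelope illustration of how those per-million-token rates compare in practice, the sketch below prices a single hypothetical request against both models; the token counts are made up for the example, not taken from either company.

```python
# Rough cost comparison using the per-million-token prices quoted above.
# The request sizes are hypothetical, chosen only to illustrate the math.
RATES = {
    "o3-mini": {"cached_input": 0.55, "output": 4.40},  # USD per 1M tokens
    "R1":      {"cached_input": 0.14, "output": 2.19},  # USD per 1M tokens
}

def request_cost(model: str, cached_input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million-token rates."""
    r = RATES[model]
    return (cached_input_tokens * r["cached_input"] + output_tokens * r["output"]) / 1_000_000

# Example: 2,000 cached input tokens and 1,000 output tokens per request.
for model in RATES:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f} per request")
```

At those rates, the example request costs about $0.0055 on o3-mini versus roughly $0.0025 on R1, which is the gap the pricing comparison above describes.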

In ChatGPT, o3-mini is set to medium reasoning effort, which OpenAI says provides “a balanced trade-off between speed and accuracy.” Paid users will have the option of selecting “o3-mini-high” in the model picker, which will deliver what OpenAI calls “higher intelligence” in exchange for slower responses.

Regardless of which version of o3-mini ChatGPT users choose, the model will work with search to find up-to-date answers with links to relevant web sources. OpenAI cautions that the functionality is a “prototype” as it works to integrate search across its reasoning models.

“While o1 remains our broader general-knowledge reasoning model, o3-mini provides a specialized alternative for technical domains requiring precision and speed,” OpenAI wrote in a blog post on Friday. “The release of o3-mini marks another step in OpenAI’s mission to push the boundaries of cost-effective intelligence.”

Caveats abound

O3-mini is not OpenAI’s most powerful model to date, nor does it leapfrog DeepSeek’s R1 reasoning model in every benchmark.

O3-mini beats R1 on AIME 2024, a benchmark based on the American Invitational Mathematics Examination that tests models on challenging competition math problems, but only with high reasoning effort. It also beats R1 on the programming-focused test SWE-bench Verified (by 0.1 point), but again, only with high reasoning effort. And on low reasoning effort, o3-mini lags R1 on GPQA Diamond, which tests models with PhD-level physics, biology, and chemistry questions.

To be fair, o3-mini answers many queries at competitively low cost and latency. In the post, OpenAI compares its performance to the o1 family:

“With low reasoning effort, o3-mini achieves comparable performance with o1-mini, while with medium effort, o3-mini achieves comparable performance with o1,” OpenAI writes. “O3-mini with medium reasoning effort matches o1’s performance in math, coding and science while delivering faster responses. Meanwhile, with high reasoning effort, o3-mini outperforms both o1-mini and o1.”

It’s worth noting that o3-mini’s performance advantage over o1 is slim in some areas. On AIME 2024, o3-mini beats o1 by just 0.3 percentage points when set to high reasoning effort. And on GPQA Diamond, o3-mini doesn’t surpass o1’s score even on high reasoning effort.

OpenAI asserts that o3-mini is as “safe” or safer than the o1 family, however, thanks to red-teaming efforts and its “deliberative alignment” methodology, which makes models “think” about OpenAI’s safety policy while they’re responding to queries. According to the company, o3-mini “significantly surpasses” one of OpenAI’s flagship models, GPT-4o, on “challenging safety and jailbreak evaluations.”



