OpenAI’s agent tool may be nearing release

OpenAI may be close to releasing an AI tool that can take control of your PC and perform actions on your behalf.

Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have uncovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agentic” system capable of autonomously handling tasks like writing code and booking travel.

According to The Information, OpenAI is targeting January as Operator’s release month. Code uncovered by Blaho this weekend adds credence to that reporting.

OpenAI’s ChatGPT client for macOS has gained options, hidden for now, to define shortcuts to “Toggle Operator” and “Force Quit Operator,” per Blaho. And OpenAI has added references to Operator on its website, Blaho said — albeit references that aren’t yet publicly visible.

OpenAI website already has references to Operator/OpenAI CUA (Computer Use Agent) – “Operator System Card Table”, “Operator Research Eval Table” and “Operator Refusal Rate Table”

Including comparison to Claude 3.5 Sonnet Computer use, Google Mariner, etc.

(preview of tables… pic.twitter.com/OOBgC3ddkU

— Tibor Blaho (@btibor91) January 20, 2025

According to Blaho, OpenAI’s site also contains not-yet-public tables comparing the performance of Operator to other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator’s reliability varies considerably by task.

On OSWorld, a benchmark that tries to mimic a real computer environment, “OpenAI Computer Use Agent (CUA),” possibly the AI model powering Operator, scores 38.1%, ahead of Anthropic’s computer-controlling model but well short of the 72.4% humans score. OpenAI CUA surpasses human performance on WebVoyager, which evaluates an AI’s ability to navigate and interact with websites. But the model falls short of human-level scores on another web-based benchmark, WebArena, according to the leaked benchmarks.

Operator also struggles with tasks a human could perform easily, if the leak is to be believed. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, Operator was only successful 60% of the time. Tasked with creating a Bitcoin wallet, Operator succeeded only 10% of the time.

OpenAI’s imminent entry into the AI agent space comes as rivals including the aforementioned Anthropic, Google, and others make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.

Agents today are rather primitive. But some experts have raised concerns about their safety, should the technology rapidly improve.

One of the leaked charts shows Operator performing well on selected safety evaluations, including tests that try to get the system to perform “illicit activities” and search for “sensitive personal data.” Reportedly, safety testing is among the reasons for Operator’s long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent he claims lacks safety mitigations.

“I can only imagine the negative reactions if OpenAI made a similar release,” Zaremba wrote.

It’s worth noting that OpenAI has been criticized by AI researchers, including ex-staff, for allegedly de-emphasizing safety work in favor of quickly productizing its technology.
