OpenAI’s agent tool may be nearing release

Date:

Share post:


OpenAI may be close to releasing an AI tool that can take control of your PC and perform actions on your behalf.

Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have uncovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agentic” system capable of autonomously handling tasks like writing code and booking travel.

According to The Information, OpenAI is targeting January as Operator’s release month. Code uncovered by Blaho this weekend adds credence to that reporting.

OpenAI’s ChatGPT client for macOS has gained options, hidden for now, to define shortcuts to “Toggle Operator” and “Force Quit Operator,” per Blaho. And OpenAI has added references to Operator on its website, Blaho said — albeit references that aren’t yet publicly visible.

According to Blaho, OpenAI’s site also contains not-yet-public tables comparing the performance of Operator to other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator isn’t 100% reliable, depending on the task.

On OSWorld, a benchmark that tries to mimic a real computer environment, “OpenAI Computer Use Agent (CUA)” — possibly the AI model powering Operator — scores 38.1%, ahead of Anthropic’s computer-controlling model but well short of the 72.4% humans score. OpenAI CUA surpases human performance on WebVoyager, which evaluates an AI’s ability to navigate and interact with websites. But the model falls short of human-level scores on another web-based benchmark, WebArena, according to the leaked benchmarks.

Operator also struggles with tasks a human could perform easily, if the leak is to be believed. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, Operator was only successful 60% of the time. Tasked with creating a Bitcoin wallet, Operator succeeded only 10% of the time.

OpenAI’s imminent entry into the AI agent space comes as rivals including the aforementioned Anthropic, Google, and others make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.

Agents today are rather primitive. But some experts have raised concerns about their safety, should the technology rapidly improve.

One of the leaked charts shows Operator performing well on selected safety evaluations, including tests that try to get the system to perform “illicit activities” and search for “sensitive personal data.” Reportedly, safety testing is among the reasons for Operator’s long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent he claims lacks safety mitigations.

“I can only imagine the negative reactions if OpenAI made a similar release,” Zaremba wrote.

It’s worth noting that OpenAI has been criticized by AI researchers, including ex-staff, for allegedly de-emphasizing safety work in favor of quickly productizing its technology.





Source link

Lisa Holden
Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.

Recent posts

Related articles

Karmen secures $9.4 million for its revenue-based financing products

French startup Karmen has secured a small funding round so that it can improve its instant financing...

President Trump signs exec order to make Musk’s DOGE commission more official

The Department of Government Efficiency (DOGE), an advisory commission spearheaded by billionaire Elon Musk recommending deep cuts...

Trump signs exec order delaying TikTok enforcement action for 75 days

President Donald Trump has signed an executive order aimed at restoring TikTok service in the U.S. The order...

President Trump repeals Biden’s AI executive order

During his first day in office, President Donald Trump revoked a 2023 executive order signed by former...

UK to unveil ‘Humphrey’ assistant for civil servants with other AI plans to cut bureaucracy

A week after the U.K. government announced a sweeping plan to make big investments into AI, it’s...

Friend delays shipments of its AI companion pendant

Friend, a startup creating a $99, AI-powered necklace designed to be treated as a digital companion, has...

At the Microsoft Excel World Championship, selfies and a ‘hype’ tunnel

An arena. A hype tunnel, the kind through which NBA players typically streak. A competitor dressed in...

Flipboard’s new app Surf adds its own video feed, too

After the TikTok ban went into effect on Sunday, social network Bluesky launched a custom feed for...