AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI’s mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username “Meemi” says that many contributors to the FrontierMath benchmark weren’t informed of OpenAI’s involvement until it was made public.

“The communication about this has been non-transparent,” Meemi wrote. “In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.”

On social media, some users raised concerns that the secrecy could erode FrontierMath’s reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark — a fact Epoch AI didn’t divulge prior to December 20, when o3 was announced.

In a reply to Meemi’s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization’s co-founders, asserted that the integrity of FrontierMath hadn’t been compromised, but admitted that Epoch AI “made a mistake” in not being more transparent.

“We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible,” Besiroglu wrote. “Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.”

Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a “separate holdout set” that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.

“OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set,” Besiroglu wrote.

However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn't been able to independently verify OpenAI's FrontierMath o3 results.

“My personal opinion is that [OpenAI’s] score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”

The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI — and securing the necessary resources for benchmark development without creating the perception of conflicts of interest.

Lisa Holden
Lisa Holden is a news writer for LinkDaddy News. She writes about health, sports, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.