Even some of the best AI can’t beat this new benchmark

Date: January 23, 2025

The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems.

The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions touching on subjects like mathematics, the humanities, and the natural sciences. To make the evaluation tougher, the questions come in multiple formats, including ones that incorporate diagrams and images.

In a preliminary study, not a single publicly available flagship AI system managed to score better than 10% on Humanity’s Last Exam.

CAIS and Scale AI say they plan to open up the benchmark to the research community so that researchers can “dig deeper into the variations” and evaluate new AI models.


Lisa Holden

Lisa Holden is a news writer for LinkDaddy News. She writes health, sport, tech, and more. Some of her favorite topics include the latest trends in fitness and wellness, the best ways to use technology to improve your life, and the latest developments in medical research.
