Tag: Benchmark

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

Every Sunday, NPR host Will Shortz, The New York Times’ crossword puzzle guru, gets to quiz thousands of listeners in a long-running segment...

Even some of the best AI can’t beat this new benchmark

The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services,...

A test for AGI is closer to being solved — but it may be flawed

A well-known test for artificial general intelligence (AGI) is closer to being solved. But the test's creators say this points to flaws in...

The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark

Over the past few months, tech execs like Elon Musk have touted the performance of their companies' AI models on a particular benchmark: Chatbot Arena. Maintained...