U.K. agency releases tools to test AI model safety

The U.K. AI Safety Institute, the country’s recently established AI safety body, has released a toolset designed to “strengthen AI safety” by making it easier for industry, research organizations and academia to develop AI evaluations.

Called Inspect, the toolset, which is available under the open source MIT License, aims to assess certain capabilities of AI models, including their core knowledge and ability to reason, and to generate a score based on the results.

In a press release announcing the news on Friday, the Safety Institute claimed that Inspect marks “the first time that an AI safety testing platform which has been spearheaded by a state-backed body has been released for wider use.”

[Image: A look at Inspect’s dashboard.]

“Successful collaboration on AI safety testing means having a shared, accessible approach to evaluations, and we hope Inspect can be a building block,” Safety Institute chair Ian Hogarth said in a statement. “We hope to see the global AI community using Inspect to not only carry out their own model safety tests, but to help adapt and build upon the open source platform so we can produce high-quality evaluations across the board.”

As we’ve written about before, AI benchmarks are hard, not least because the most sophisticated AI models today are black boxes whose infrastructure, training data and other key details are kept under wraps by the companies creating them. So how does Inspect tackle the challenge? Mainly by being extensible to new testing techniques.

Inspect is made up of three basic components: data sets, solvers and scorers. Data sets provide samples for evaluation tests. Solvers do the work of carrying out the tests. And scorers evaluate the work of solvers and aggregate scores from the tests into metrics.

Inspect’s built-in components can be augmented via third-party packages written in Python.
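To make the three-part design concrete, here is a minimal Python sketch of how a dataset, a solver and a scorer can fit together. It is a hypothetical illustration, not Inspect’s actual API: `Sample`, `toy_model`, `direct_solver`, `exact_match` and `run_eval` are names invented for this example, and the “model” is a hard-coded lookup table standing in for a real model call.

```python
# Hypothetical illustration of the dataset / solver / scorer split.
# None of these names come from Inspect; they only mirror the roles described above.
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One evaluation item: a prompt and the answer we expect."""
    input: str
    target: str


# A solver turns a sample into a model output; a scorer grades that output.
Solver = Callable[[Sample], str]
Scorer = Callable[[Sample, str], float]


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; gets one question wrong on purpose."""
    canned = {"2 + 2 = ?": "4", "3 * 3 = ?": "9", "10 - 7 = ?": "2"}
    return canned.get(prompt, "")


def direct_solver(sample: Sample) -> str:
    """Send the prompt straight to the model, with no extra prompting steps."""
    return toy_model(sample.input)


def exact_match(sample: Sample, output: str) -> float:
    """Score 1.0 if the output matches the target, ignoring surrounding whitespace."""
    return 1.0 if output.strip() == sample.target.strip() else 0.0


def run_eval(dataset: List[Sample], solver: Solver, scorer: Scorer) -> Dict[str, float]:
    """Run every sample through the solver, score each output, and aggregate into metrics."""
    scores = [scorer(sample, solver(sample)) for sample in dataset]
    return {"accuracy": mean(scores), "samples": float(len(scores))}


# The dataset provides the samples to evaluate.
dataset = [
    Sample("2 + 2 = ?", "4"),
    Sample("3 * 3 = ?", "9"),
    Sample("10 - 7 = ?", "3"),
]
metrics = run_eval(dataset, direct_solver, exact_match)
print(metrics)
```

Because the toy model answers one of the three questions incorrectly, the run should report an accuracy of about 0.67 over 3 samples. Keeping solvers and scorers as interchangeable functions is what makes this kind of design extensible: a new testing technique can be added as another solver or scorer without touching the dataset or the aggregation step.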

In a post on X, Deborah Raji, a research fellow at Mozilla and noted AI ethicist, called Inspect a “testament to the power of public investment in open source tooling for AI accountability.”

Clément Delangue, CEO of AI startup Hugging Face, floated the idea of integrating Inspect with Hugging Face’s model library or creating a public leaderboard with the results of the toolset’s evaluations.

Inspect’s release comes after a stateside government agency — the National Institute of Standards and Technology (NIST) — launched NIST GenAI, a program to assess various generative AI technologies including text- and image-generating AI. NIST GenAI plans to release benchmarks, help create content authenticity detection systems and encourage the development of software to spot fake or misleading AI-generated information.

In April, the U.S. and U.K. announced a partnership to jointly develop advanced AI model testing, following commitments announced at the U.K.’s AI Safety Summit in Bletchley Park in November of last year. As part of the collaboration, the U.S. intends to launch its own AI safety institute, which will be broadly charged with evaluating risks from AI and generative AI.
