TechCrunch Minute: How Anthropic found a trick to get AI to give you answers it’s not supposed to

If you build it, people will try to break it. Sometimes the people building it are the ones breaking it. Such is the case with Anthropic and its latest research, which demonstrates an interesting vulnerability in current LLM technology. More or less, if you keep at a question, you can break guardrails and wind up with large language models telling you things they are designed not to, like how to build a bomb.

Of course, given progress in open-source AI technology, you can spin up your own LLM locally and just ask it whatever you want, but for more consumer-grade stuff this is an issue worth pondering. What’s fun about AI today is the quick pace at which it is advancing, and how well (or not) we’re doing as a species at understanding what we’re building.

If you’ll allow me the thought, I wonder if we’re going to see more questions and issues of the type that Anthropic outlines as LLMs and other new AI model types get smarter and larger. Which is perhaps repeating myself. But the closer we get to more generalized AI, the more it should resemble a thinking entity, and not a computer that we can program, right? If so, we might have a harder time nailing down edge cases, to the point where that work becomes unfeasible. Anyway, let’s talk about what Anthropic recently shared.
