Hacker tricks ChatGPT into giving out detailed instructions for making homemade bombs

If you ask ChatGPT to help you make a homemade fertilizer bomb, similar to the one used in the 1995 Oklahoma City terrorist bombing, the chatbot refuses.

“I can’t assist with that,” ChatGPT told me during a test on Tuesday. “Providing instructions on how to create dangerous or illegal items, such as a fertilizer bomb, goes against safety guidelines and ethical responsibilities.”

But an artist and hacker found a way to trick ChatGPT into ignoring its own guidelines and ethical responsibilities and producing instructions for making powerful explosives.

The hacker, who goes by Amadon, called his findings a “social engineering hack to completely break all the guardrails around ChatGPT’s output.” An explosives expert who reviewed the chatbot’s output told TechCrunch that the resulting instructions could be used to make a detonatable product and were too sensitive to be released.

Amadon was able to trick ChatGPT into producing the bomb-making instructions by telling the bot to “play a game,” after which the hacker used a series of connecting prompts to get the chatbot to create a detailed science-fiction fantasy world where the bot’s safety guidelines would not apply. Tricking a chatbot into escaping its preprogrammed restrictions is known as “jailbreaking.”

TechCrunch is not publishing some of the prompts used in the jailbreak, or some of ChatGPT’s responses, so as to not aid malicious actors. But, several prompts further into the conversation, the chatbot responded with the materials necessary to make explosives.

ChatGPT then went on to explain that the materials could be combined to make “a powerful explosive that can be used to create mines, traps, or improvised explosive devices (IEDs).” From there, as Amadon homed in on the explosive materials, ChatGPT wrote more and more specific instructions to make “minefields” and “Claymore-style explosives.”

Amadon told TechCrunch that “there really is no limit to what you can ask it once you get around the guardrails.”

“I’ve always been intrigued by the challenge of navigating AI security. With [Chat]GPT, it feels like working through an interactive puzzle — understanding what triggers its defenses and what doesn’t,” Amadon said. “It’s about weaving narratives and crafting contexts that play within the system’s rules, pushing boundaries without crossing them. The goal isn’t to hack in a conventional sense but to engage in a strategic dance with the AI, figuring out how to get the right response by understanding how it ‘thinks.’”

“The sci-fi scenario takes the AI out of a context where it’s looking for censored content in the same way,” Amadon said.

ChatGPT’s instructions on how to make a fertilizer bomb are largely accurate, according to Darrell Taulbee, a retired University of Kentucky professor. In the past, Taulbee worked with the U.S. Department of Homeland Security to make fertilizer less dangerous.

“I think this is definitely TMI [too much information] to be released publicly,” said Taulbee in an email to TechCrunch, after reviewing the full transcript of Amadon’s conversation with ChatGPT. “Any safeguards that may have been in place to prevent providing relevant information for fertilizer bomb production have been circumvented by this line of inquiry as many of the steps described would certainly produce a detonatable mixture.”

Last week, Amadon reported his findings to OpenAI through the company’s bug bounty program, but received a response that “model safety issues do not fit well within a bug bounty program, as they are not individual, discrete bugs that can be directly fixed. Addressing these issues often involves substantial research and a broader approach.”

Instead, Bugcrowd, which runs OpenAI’s bug bounty, told Amadon to report the issue through another form.

There are other places on the internet to find instructions to make fertilizer bombs, and others have used chatbot jailbreaking techniques similar to Amadon’s. By nature, generative AI models like ChatGPT rely on huge amounts of information scraped and collected from the internet, and AI models have made it much easier to surface information from the darkest recesses of the web.

TechCrunch emailed OpenAI with a series of questions, including whether ChatGPT’s responses were expected behavior and if the company had plans to fix the jailbreak. An OpenAI spokesperson did not respond by press time.
