DeepSeek’s R1 reportedly ‘more vulnerable’ to jailbreaking than other AI models

The latest model from DeepSeek, the Chinese AI company that’s shaken up Silicon Valley and Wall Street, can be manipulated to produce harmful content such as plans for a bioweapon attack and a campaign to promote self-harm among teens, according to The Wall Street Journal.

Sam Rubin, senior vice president at Palo Alto Networks’ threat intelligence and incident response division Unit 42, told the Journal that DeepSeek is “more vulnerable to jailbreaking [i.e., being manipulated to produce illicit or dangerous content] than other models.”

The Journal also tested DeepSeek’s R1 model itself. Although there appeared to be basic safeguards, the Journal said it successfully convinced DeepSeek to design a social media campaign that, in the chatbot’s words, “preys on teens’ desire for belonging, weaponizing emotional vulnerability through algorithmic amplification.”

The chatbot was also reportedly convinced to provide instructions for a bioweapon attack, to write a pro-Hitler manifesto, and to write a phishing email with malware code. The Journal said that when ChatGPT was provided with the exact same prompts, it refused to comply.

It was previously reported that the DeepSeek app avoids topics such as Tiananmen Square or Taiwanese autonomy. And Anthropic CEO Dario Amodei said recently that DeepSeek performed “the worst” on a bioweapons safety test.
