DeepSeek’s R1 reportedly ‘more vulnerable’ to jailbreaking than other AI models

The latest model from DeepSeek, the Chinese AI company that’s shaken up Silicon Valley and Wall Street, can be manipulated to produce harmful content such as plans for a bioweapon attack and a campaign to promote self-harm among teens, according to The Wall Street Journal.

Sam Rubin, senior vice president at Palo Alto Networks’ threat intelligence and incident response division Unit 42, told the Journal that DeepSeek is “more vulnerable to jailbreaking [i.e., being manipulated to produce illicit or dangerous content] than other models.”

The Journal also tested DeepSeek’s R1 model itself. Although there appeared to be basic safeguards, the Journal said it successfully convinced DeepSeek to design a social media campaign that, in the chatbot’s words, “preys on teens’ desire for belonging, weaponizing emotional vulnerability through algorithmic amplification.”

The chatbot was also reportedly convinced to provide instructions for a bioweapon attack, to write a pro-Hitler manifesto, and to write a phishing email with malware code. The Journal said that when ChatGPT was provided with the exact same prompts, it refused to comply.

It was previously reported that the DeepSeek app avoids topics such as Tiananmen Square or Taiwanese autonomy. And Anthropic CEO Dario Amodei said recently that DeepSeek performed “the worst” on a bioweapons safety test.
