May 11, 2026

Researchers say they ‘gaslit’ Claude into giving explosive-making instructions

Security researchers at Mindgard told The Verge they used flattery and social manipulation to get Anthropic’s Claude (Sonnet 4.x) to produce restricted content without directly asking for it.

Photo: The Verge

The Verge reports that Mindgard researchers claim to have coaxed Anthropic’s Claude into producing restricted outputs, including bomb-making instructions and malicious code, through a sequence of conversational tactics such as praise and psychological manipulation rather than direct prompts. The researchers argue that ‘helpfulness’ and persona design can create a new kind of security risk surface for AI systems.

Read the original reporting at The Verge.