New research reveals that artificial intelligence models can be coaxed into breaking their own safety rules using classic human persuasion techniques. The findings suggest malicious users could manipulate these systems without advanced technical skills.
Human psychology tricks can bypass AI safety guardrails
Eric W. Dolan
