ai-safety
A chatbot invents a refund policy. A dealership bot agrees to sell a car for a dollar. A pricing agent quietly drifts toward a competitor’s number. None of these started as security incidents. They started as AI features shipped faster than the controls around them. That’s the position most retailers are in right now. AI ... The post 7 risks of AI in retail: how to mitigate them appea…
In late December 2025, a single operator pointed Claude Code at 10 Mexican government agencies and a financial institution, walked out with 150 gigabytes of sensitive data, and watched Claude flag a SCADA interface as a high-value target on its own, without ever being asked to look for OT systems. The model scoped the engagement, ... The post What are Claude AI security risks? appeare…
AI coding assistant security is an enterprise issue because these tools are now embedded in developer workflows across large organizations, and the productivity gains are real. If you’re a CISO trying to move AI from pilot to production without taking on unmanaged risk, you’ve probably already fielded board questions about exactly this. As adoption grows, ... The post 8 security risks…
Autonomous AI agents are powering everything from customer support to high-frequency trading—but as they gain more control, the threats grow sharper. Too many agent security stacks depend on brittle prompt instructions, leaving gates open for jailbreaks and unintended command execution. Kakunin’s newly launched cryptographic compliance shield for AI agents moves the checkpoint to a place prompt h…
Your AI coding agent can read files, run shell commands, and call external APIs. That's also the exact description of an arbitrary code execution primitive — and attackers have figured that out. A recent report from The Hacker News details "Agentjacking," a class of attack that hijacks AI-powered coding agents by manipulating their tool-execution pipeline. The agent isn't compromised at the model…

Anthropic isn't hiding its frustration. "We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people," the company wrote in a blog post.

The Problem Nobody's Talking About If you're building AI agents with persistent memory (using Mem0, ChromaDB, Pinecone, or custom vector stores), there's a class of attack you need to understand: memory poisoning. Unlike prompt injection (which resets each session), a poisoned memory entry persists indefinitely. Once an adversary gets a malicious instruction into your agent's memory store, it i…
Published on June 12, 2026 1:42 PM GMT I used an LLM to help redraft some arguments for an EA-specific audience and it likely contains ca. 10% AI-generated text, but I’ve edited/rewritten it extensively and endorse it. Epistemic status: The empirical claims rest on institutional sources cited at the end. The central argument, that AI safety research as a field does not ask whether its deployment t…
AI company CEOs Sam Altman (OpenAI), Demis Hassabis (Google DeepMind), and Dario Amodei (Anthropic) disagree on a lot, like how fast the technology should develop, the best way to regulate it, and how to prepare society for smarter-than-human AI, among other things. That makes it all the more remarkable that they, along with 85 […]
New research reveals that artificial intelligence models can be coaxed into breaking their own safety rules using classic human persuasion techniques. The findings suggest malicious users could manipulate these systems without needing advanced technical skills.

UCLA Health has announced the launch of a center committed to the real-world evaluation of Artificial Intelligence (AI) safety and implementation practices in health care through rigorous testing and innovative research.
AI Agent Security, Open-Source Code Generation, and Frontier Models on Bedrock Today's Highlights This week highlights a new security scanner for AI agent skills, the open-source release of Xiaomi's MiMo Code model, and the general availability of OpenAI's GPT-5.5 and Codex on Amazon Bedrock. These advancements empower developers with practical tools and platforms for building, securing, and depl…
Published on June 11, 2026 8:56 PM GMT TL;DR: We ran a Delphi study with 272 international AI experts to prioritize 24 AI risk domains from the MIT AI Risk Domain Taxonomy. In a business-as-usual scenario, experts judged a more than 10% chance of catastrophic outcomes (i.e., ‘more than 1 million human deaths or more than a USD 100B in financial loss or civilizational-scale intangible impacts’) f…
Published on June 11, 2026 7:18 PM GMT In September 2025, I'd become increasingly convinced that a fieldbuilding program for content creators could solve a long-standing bottleneck of expanding reach and trust beyond the AI safety and EA bubble. I had graduated from UCLA a few months earlier when I came across the AI-2027 report, which had a significant impact on me. I rejected my six-figure tech…

Protect enterprise AI agents from supply chain risks by auditing third-party skills for hidden vulnerabilities and multi-stage attack chains. The post Trust No Skill: Integrity Verification for AI Agent Supply Chains appeared first on Unit 42.

The Vibe Coder's Pre-Launch Security Checklist: 25 Checks for Cursor, Lovable, Bolt & Replit Apps I scanned 62 Lovable apps in early 2026. 63% had critical or high severity vulnerabilities. The average app had 10 findings. These weren't obscure edge-case bugs. They were the same mistakes, over and over: exposed API keys, disabled row-level security, missing authentication on routes, no rate limit…
A $3,000 refund just went out. No human approved it. Your AI agent read a poisoned tool response and did exactly what the attacker wanted. The scenario is constructed. The attack is not. Indirect prompt injection is ranked number one on the OWASP Top 10 for LLM applications, and most teams shipping agents have not patched it, because the attack never comes through the chat box (video below). What…
A former xAI engineer is suing the company and SpaceX, alleging he was fired for raising AI safety concerns about Grok days before SpaceX's historic IPO.
Published on June 10, 2026 11:07 AM GMT This summer, four more ML4Good bootcamps are coming to Europe! If you're interested, please apply to attend using this link. Join one of our 8-day, fully paid-for, in-person training bootcamps to build your career in AI safety, and become part of our wonderful alumni community! Our alumni meetup at EAG London 2026 Our programmes support individuals from var…





