ai-safety

EdTech Innovation Hub

The funding call is open to researchers worldwide and focuses on the risks that may emerge when large populations of AI agents interact across shared digital systems. Google DeepMind and partners have opened a $10 million funding call for research into the safety of interacting AI agent systems. Google DeepMind, Schmidt Sciences, the Cooperative AI Foundation, the Advanced Research and Invention …

ai, ai-safety, autonomous-systems
WitnessAI

A chatbot invents a refund policy. A dealership bot agrees to sell a car for a dollar. A pricing agent quietly drifts toward a competitor’s number. None of these started as security incidents. They started as AI features shipped faster than the controls around them. That’s the position most retailers are in right now. AI ... The post 7 risks of AI in retail: how to mitigate them appea…

ai, ai-safety
WitnessAI

In late December 2025, a single operator pointed Claude Code at 10 Mexican government agencies and a financial institution, walked out with 150 gigabytes of sensitive data, and watched Claude flag a SCADA interface as a high-value target on its own, without ever being asked to look for OT systems. The model scoped the engagement, ... The post What are Claude AI security risks? appeare…

ai, ai-safety
WitnessAI

AI coding assistant security is an enterprise issue because these tools are now embedded in developer workflows across large organizations, and the productivity gains are real. If you’re a CISO trying to move AI from pilot to production without taking on unmanaged risk, you’ve probably already fielded board questions about exactly this. As adoption grows, ... The post 8 security risks…

ai, ai-safety
DEV Community

Autonomous AI agents are powering everything from customer support to high-frequency trading—but as they gain more control, the threats grow sharper. Too many agent security stacks depend on brittle prompt instructions, leaving gates open for jailbreaks and unintended command execution. Kakunin’s newly launched cryptographic compliance shield for AI agents moves the checkpoint to a place prompt h…

ai, ai-safety, autonomous-systems
DEV Community

Your AI coding agent can read files, run shell commands, and call external APIs. That's also the exact description of an arbitrary code execution primitive — and attackers have figured that out. A recent report from The Hacker News details "Agentjacking," a class of attack that hijacks AI-powered coding agents by manipulating their tool-execution pipeline. The agent isn't compromised at the model…

ai, ai-safety, machine-learning
TechCrunch
DEV Community

The Problem Nobody's Talking About. If you're building AI agents with persistent memory (using Mem0, ChromaDB, Pinecone, or custom vector stores), there's a class of attack you need to understand: memory poisoning. Unlike prompt injection (which resets each session), a poisoned memory entry persists indefinitely. Once an adversary gets a malicious instruction into your agent's memory store, it i…

ai, ai-safety, machine-learning
Effective Altruism Forum

Published on June 12, 2026 1:42 PM GMT. I used an LLM to help redraft some arguments for an EA-specific audience, and the text likely contains ca. 10% AI-generated material, but I’ve edited/rewritten it extensively and endorse it. Epistemic status: The empirical claims rest on institutional sources cited at the end. The central argument, that AI safety research as a field does not ask whether its deployment t…

ai, ai-safety
Vox

AI company CEOs Sam Altman (OpenAI), Demis Hassabis (Google DeepMind), and Dario Amodei (Anthropic) disagree on a lot, like how fast the technology should develop, the best way to regulate it, and how to prepare society for smarter-than-human AI, among other things.  That makes it all the more remarkable that they — along with 85 […]

ai, ai-safety
PsyPost – Psychology News

New research reveals that artificial intelligence models can be coaxed into breaking their own safety rules using classic human persuasion techniques. The findings suggest malicious users could manipulate these systems without needing advanced technical skills.

ai, ai-safety
Newswise: Latest News
DEV Community

AI Agent Security, Open-Source Code Generation, and Frontier Models on Bedrock. Today's Highlights: This week highlights a new security scanner for AI agent skills, the open-source release of Xiaomi's MiMo Code model, and the general availability of OpenAI's GPT-5.5 and Codex on Amazon Bedrock. These advancements empower developers with practical tools and platforms for building, securing, and depl…

ai, ai-safety, machine-learning
Effective Altruism Forum

Published on June 11, 2026 8:56 PM GMT. TL;DR: We ran a Delphi study with 272 international AI experts to prioritize 24 AI risk domains from the MIT AI Risk Domain Taxonomy. In a business-as-usual scenario, experts judged a more than 10% chance of catastrophic outcomes (i.e., ‘more than 1 million human deaths or more than a USD 100B in financial loss or civilizational-scale intangible impacts’) f…

ai, ai-safety
Effective Altruism Forum

Published on June 11, 2026 7:18 PM GMT. In September 2025, I'd become increasingly convinced that a fieldbuilding program for content creators could solve a long-standing bottleneck of expanding reach and trust beyond the AI safety and EA bubble. I had graduated from UCLA a few months earlier when I came across the AI-2027 report, which had a significant impact on me. I rejected my six-figure tech…

ai, ai-safety
Unit 42

Protect enterprise AI agents from supply chain risks by auditing third-party skills for hidden vulnerabilities and multi-stage attack chains. The post Trust No Skill: Integrity Verification for AI Agent Supply Chains appeared first on Unit 42.

ai, ai-ethics, ai-safety
DEV Community

The Vibe Coder's Pre-Launch Security Checklist: 25 Checks for Cursor, Lovable, Bolt & Replit Apps. I scanned 62 Lovable apps in early 2026. 63% had critical or high-severity vulnerabilities. The average app had 10 findings. These weren't obscure edge-case bugs. They were the same mistakes, over and over: exposed API keys, disabled row-level security, missing authentication on routes, no rate limit…

ai, ai-safety, machine-learning
DEV Community

A $3,000 refund just went out. No human approved it. Your AI agent read a poisoned tool response and did exactly what the attacker wanted. The scenario is constructed. The attack is not. Indirect prompt injection is ranked number one on the OWASP Top 10 for LLM applications, and most teams shipping agents have not patched it, because the attack never comes through the chat box (video below). What…

ai, ai-safety
TechCrunch
Effective Altruism Forum

Published on June 10, 2026 11:07 AM GMT. This summer, four more ML4Good bootcamps are coming to Europe! If you're interested, please apply to attend using this link. Join one of our 8-day, fully paid-for, in-person training bootcamps to build your career in AI safety, and become part of our wonderful alumni community! [Image: Our alumni meetup at EAG London 2026] Our programmes support individuals from var…

ai, ai-safety, machine-learning