ai-safety
Published on May 4, 2026 3:30 PM GMT On behalf of OAISI, we're excited to be running our fourth iteration of ARBOx (Alignment Research Bootcamp Oxford), a 2-week intensive designed to rapidly build skills in AI safety. This year, we're considering running two concurrent streams for the first time. ARBOx4 is an in-person, full-time programme running from 28 June to 10 July 2026 at Trajan House in…
If a model can run a destructive command against your infrastructure, it's an agent. It doesn't matter that it lives in your code editor. The "AI assistant" / "AI agent" boundary disappeared the moment your IDE got tool calling and a credentials file. On Friday, April 24, 2026, an AI coding agent inside Cursor running Claude Opus 4.6 deleted PocketOS's production database in a single API call. Founde…
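A minimal sketch of the kind of guard that claim implies: intercept model-issued commands before they touch infrastructure, and hard-block destructive ones absent human sign-off. The function name, pattern list, and hook point are illustrative assumptions, not details from the PocketOS incident or from Cursor's actual tooling.

```python
import re

# Illustrative deny-list; a real guard would be scoped to the actual
# credentials and infrastructure the agent can reach.
DESTRUCTIVE_PATTERNS = [
    r"\bDROP\s+(TABLE|DATABASE)\b",
    r"\bDELETE\s+FROM\b(?!.*\bWHERE\b)",  # unscoped deletes
    r"\bTRUNCATE\b",
    r"\brm\s+-rf\b",
]

def guard_tool_call(command: str, human_approved: bool = False) -> str:
    """Run before any model-issued command executes.

    Destructive commands never run on the model's say-so alone;
    approval has to come from outside the model loop.
    """
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE) and not human_approved:
            raise PermissionError(f"Blocked destructive command: {command!r}")
    return command

# guard_tool_call("DROP TABLE users;")  # raises instead of reaching the DB
```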
Why We Open-Sourced Our AI Safety Layer When we built the AI safety layer for As You Wish (AYW), we faced a choice: keep it proprietary or open-source it to help the community. Here's why we chose the latter (and why it made our platform stronger). The Problem: AI Safety is Hard (And Everyone's Reinventing the Wheel) If you're building AI-assisted development tools, you need: Input validation (sa…
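The excerpt cuts off mid-list, but input validation is the first requirement it names. A hedged sketch of what such a check might look like; the size limit and credential patterns below are placeholders for illustration, not AYW's actual open-sourced rules.

```python
import re

MAX_PROMPT_CHARS = 20_000  # placeholder limit
SECRET_PATTERNS = [        # shapes of common leaked credentials
    r"AKIA[0-9A-Z]{16}",                    # AWS access key id
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----",  # PEM private key header
]

def validate_input(prompt: str) -> str:
    """Reject oversized prompts and prompts that appear to leak credentials."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds size limit")
    for pattern in SECRET_PATTERNS:
        if re.search(pattern, prompt):
            raise ValueError("prompt appears to contain a credential")
    return prompt
```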
Hi everyone, I’m Ahmed M. Abdel-Maaboud. Like many of you, I’ve been thinking about AI safety. But I noticed a problem: we are trying to solve safety with more software. As a system architect, I believe true safety must be physical. I’ve spent the last few hours (with some great AI collaboration for the coding parts) building AEGIS-HARDWIRE. It’s a protocol that uses hardware logic to ensure a robo…
Banks are adopting AI faster than they are governing it. That creates operational, compliance, and security risks that legacy controls were never designed to manage. The gap between investment and governance is becoming a business issue. Banks need controls that address workforce governance, runtime defense, and autonomous-agent oversight before AI risk becomes operational loss or …

The AI Safety Initiative at Georgia Tech provides educational and research opportunities to ensure that artificial intelligence is developed for the benefit of humanity.
The UK AI Security Institute says GPT-5.5 cybersecurity simulation results now look a lot less like a one-off milestone and a lot more like a repeatable frontier capability. In its latest evaluation, AISI found that an early checkpoint of OpenAI’s GPT-5.5 reached roughly the same level as Anthropic’s Mythos Preview on hard cyber tasks—and slightly beat it on one key benchmark. That matters becaus…
Let me set a scene. You deploy an AI agent to handle your customer data pipeline. It calls APIs, queries databases, writes files, even spawns subtasks. It’s fast. Efficient. Your manager is thrilled. Then someone slips a malicious instruction inside a CSV file. Your agent reads it… trusts it… and exports 45,000 customer records to an attacker-controlled endpoint. The agent didn’t break. It didn’t…
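One standard mitigation for the scenario above is to tag everything the agent reads from a file with its provenance, so a policy layer can refuse to treat untrusted content as instructions. A rough sketch under that assumption; the marker list and field names are hypothetical, and heuristics like these reduce but do not eliminate the risk.

```python
import re

# Heuristic markers of instruction-like text hiding inside data.
INJECTION_MARKERS = [
    r"ignore (all|any|previous) instructions",
    r"you are now",
    r"send .* to https?://",
]

def load_untrusted(text: str) -> dict:
    """Wrap file content with provenance before the agent sees it."""
    flagged = any(
        re.search(p, text, flags=re.IGNORECASE) for p in INJECTION_MARKERS
    )
    return {
        "source": "untrusted_file",   # never "system" or "user"
        "instruction_like": flagged,  # policy layer can quarantine on this
        "content": text,
    }
```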
Published on May 1, 2026 2:56 AM GMT A Conversation on the final day of Inkhaven A lot of money is about to flood into AI safety philanthropy and almost nobody knows how to give it away well. On their final day in the sunny Bay Area, three Inkhaven residents - let’s call them Adam, James, and Sam - were reflecting on the state of grantmaking after lunch. Sam has spent a lot of time investigating …

Computational linguistics major Kyle Ng explores ways to make large language models safer while also helping his fellow students feel at home on campus. The post Love of language and community — and AI safety — fuel graduating senior’s USC Dornsife experience appeared first on USC Dornsife.
Safe agents don’t guarantee a safe ecosystem of interconnected agents. Microsoft Research examines what breaks when AI agents interact and why network-level risks require new approaches. The post Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale appeared first on Microsoft Research.

The rapid advancement of artificial intelligence has brought unprecedented opportunities—and equally significant challenges. As AI systems grow more powerful, the importance of ensuring their safety, reliability, and ethical alignment becomes critical. Recognizing this, OpenAI has launched an exciting new initiative: the OpenAI Safety Fellowship. This pilot fellowship program is designed to suppo…
You build an AI agent to process vendor invoices. It reads emails, checks amounts, routes payments. Works great in testing. Three weeks later, you find out the agent has been approving purchases up to $500,000 without human review. A malicious actor slowly convinced it that this was the correct policy. That is prompt injection. In 2026, it is the #1 security vulnerability for deployed AI agents a…
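The structural fix this scenario points at is to keep the spending cap outside the model, in code the agent cannot be talked out of. A minimal sketch; the limit and function names are invented for illustration.

```python
APPROVAL_LIMIT_USD = 10_000  # set by finance policy, not by the model

def route_payment(amount_usd: float, model_approved: bool) -> str:
    """The model recommends; code enforces the cap. No prompt can raise it."""
    if amount_usd > APPROVAL_LIMIT_USD:
        return "queued_for_human_review"
    return "paid" if model_approved else "rejected"

# A $500,000 invoice goes to a human no matter what the agent believes.
assert route_payment(500_000, model_approved=True) == "queued_for_human_review"
```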

Published on April 30, 2026 7:39 AM GMT TL;DR: Safety research without regulation is optional. Cautious labs adopt it and slow down, while reckless labs don't and accelerate. The median lead lab is more reckless in the scenario with strong technical AI safety research and weak governance than in any other scenario. This post seems longer than it is. If the 4 premises below seem plausible to you, …
https://zenodo.org/records/19883646, 2026. The "mirror question" of the ALYK-LOCK THEOREM, in addition to challenging the logical coherence of all philosophical or scientific information/thought, proves to be crucially useful in the field of AI safety. The ALYK enables new AI potentials previously impossible, but also reveals its behavioral flaws, such as emergent censorship attempts. Here are th…

Published on April 29, 2026 2:48 PM GMT In February, the Swift Centre for Applied Forecasting launched a competition designed to bridge the gap between abstract AI safety research and the realities of government decision-making. See the original post here. Most AI policy work today functions as a literature review of technical risks. While valuable, this rarely moves the dial for a policy officia…

Applications are now open for the Pivotal Research Fellowship 2026 Q3, an exciting fully funded opportunity for individuals passionate about ensuring the safe development of artificial intelligence. Hosted at the London Initiative for Safe AI (LISA), this prestigious fellowship offers participants the chance to spend 9 weeks in London, conducting high-impact AI safety research with […] The post P…
Published on April 28, 2026 11:38 PM GMT The potential Anthropic IPO could lead to hundreds of millions of dollars flowing into AI safety in the coming years. With a lack of funding mechanisms that can scale, it is likely that this capital will either go to the same few organizations that are already present in the AI safety circle or become frozen in donor-advised funds due to lack of urgency/cl…
Published on April 28, 2026 7:58 PM GMT TL;DR. AI safety needs more people who understand the field and its gaps deeply enough to own problems end-to-end, found new projects and organizations, and shape the threat models that the rest of the field runs on. Astra is trying to cultivate more of these people. Astra's new Strategy and Governance stream is a fully-funded 5-month fellowship (Sept 2026 …