ai-safety

NTT Research, Inc.

Discover expert perspectives on trustworthy AI with leaders from Microsoft and NTT Research. Explore AI agents, human-AI co-evolution, and AI safety. The post Panel: Building Trustworthy and Safe AI: Leading Perspectives on Ethics and Responsibility first appeared on NTT Research, Inc.

ai, ai-safety, ethics, trustworthy-ai
Effective Altruism Forum
Juliana Eberschlag
1d ago

Published on May 4, 2026 3:30 PM GMT. On behalf of OAISI, we're excited to be running our fourth iteration of ARBOx (Alignment Research Bootcamp Oxford), a 2-week intensive designed to rapidly build skills in AI safety. This year, we're considering running two concurrent streams for the first time. ARBOx4 is an in-person, full-time programme running from 28 June to 10 July 2026 at Trajan House in…

ai, ai-safety
DEV Community

If a model can run a destructive command against your infrastructure, it's an agent. It doesn't matter that it lives in your code editor. The "AI assistant" / "AI agent" boundary disappeared the moment your IDE got tool calling and a credentials file. On Friday, April 24, 2026, an AI coding agent inside Cursor running Claude Opus 4.6 deleted PocketOS's production database in a single API call. Founde…

ai, ai-safety, machine-learning
DEV Community

Why We Open-Sourced Our AI Safety Layer When we built the AI safety layer for As You Wish (AYW), we faced a choice: keep it proprietary or open-source it to help the community. Here's why we chose the latter (and why it made our platform stronger). The Problem: AI Safety is Hard (And Everyone's Reinventing the Wheel) If you're building AI-assisted development tools, you need: Input validation (sa…

ai, ai-safety
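The entry above lists input validation as one of the core needs for AI-assisted development tools. A minimal sketch of what such a validation layer might look like is below; the pattern list and function names are illustrative assumptions, not AYW's actual implementation.

```python
import re

# Hypothetical deny-list of patterns a safety layer might screen for
# before passing user or tool input to a coding agent. The patterns
# shown here are illustrative, not an exhaustive or production set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"rm\s+-rf\s+/"),               # destructive shell command
    re.compile(r"curl\s+\S+\s*\|\s*(ba)?sh"),  # pipe-to-shell download
]

def validate_input(text: str) -> tuple[bool, list[str]]:
    """Return (is_safe, matched_patterns) for a piece of input."""
    reasons = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]
    return (not reasons, reasons)
```

Pattern matching alone is a weak defense (attackers can paraphrase), which is presumably why such layers combine it with other checks; it is shown here only to make the "input validation" requirement concrete.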
DEV Community

Hi everyone, I’m Ahmed M. Abdel-Maaboud. Like many of you, I’ve been thinking about AI safety. But I noticed a problem: we are trying to solve safety with more software. As a system architect, I believe true safety must be physical. I’ve spent the last few hours (with some great AI collaboration for the coding parts) building AEGIS-HARDWIRE. It’s a protocol that uses hardware logic to ensure a robo…

ai, ai-safety
WitnessAI

Banks are adopting AI faster than they are governing it. That creates operational, compliance, and security risks that legacy controls were never designed to manage. The gap between investment and governance is becoming a business issue. Banks need controls that address workforce governance, runtime defense, and autonomous-agent oversight before AI risk becomes operational loss or …

ai, ai-ethics, ai-safety
Newswise: Latest News
DEV Community

The UK AI Security Institute says GPT-5.5 cybersecurity simulation results now look a lot less like a one-off milestone and a lot more like a repeatable frontier capability. In its latest evaluation, AISI found that an early checkpoint of OpenAI’s GPT-5.5 reached roughly the same level as Anthropic’s Mythos Preview on hard cyber tasks—and slightly beat it on one key benchmark. That matters becaus…

ai, ai-safety
DEV Community

Let me set a scene. You deploy an AI agent to handle your customer data pipeline. It calls APIs, queries databases, writes files, even spawns subtasks. It’s fast. Efficient. Your manager is thrilled. Then someone slips a malicious instruction inside a CSV file. Your agent reads it… trusts it… and exports 45,000 customer records to an attacker-controlled endpoint. The agent didn’t break. It didn’t…

ai, ai-safety
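The scenario above (an agent that trusts instructions found inside a data file and exports records to an attacker's endpoint) is indirect prompt injection. One mitigation is to enforce egress controls in ordinary code, outside the model, so that even a fully manipulated agent cannot send data to an arbitrary destination. A minimal sketch, with hypothetical hostnames and function names:

```python
from urllib.parse import urlparse

# Illustrative egress allowlist. The check runs in plain code the agent
# cannot reason its way around: an injected instruction can change what
# the agent *wants* to do, but not what this function permits.
ALLOWED_EXPORT_HOSTS = {"warehouse.internal.example.com", "backup.example.com"}

class EgressBlocked(Exception):
    """Raised when an export targets a non-allowlisted host."""

def export_records(records: list[dict], destination_url: str) -> int:
    host = urlparse(destination_url).hostname
    if host not in ALLOWED_EXPORT_HOSTS:
        raise EgressBlocked(f"export to {host!r} is not allowlisted")
    # ... perform the actual upload here ...
    return len(records)
```

The design point is that the control lives at the tool boundary, not in the prompt: the 45,000-record exfiltration described above succeeds or fails based on this check, regardless of what the agent was convinced to attempt.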
Effective Altruism Forum

Published on May 1, 2026 2:56 AM GMT. A Conversation on the final day of Inkhaven. A lot of money is about to flood into AI safety philanthropy and almost nobody knows how to give it away well. On their final day in the sunny Bay Area, three Inkhaven residents - let’s call them Adam, James, and Sam - were reflecting on the state of grantmaking after lunch. Sam has spent a lot of time investigating …

ai, ai-safety
Microsoft Research

Safe agents don’t guarantee a safe ecosystem of interconnected agents. Microsoft Research examines what breaks when AI agents interact and why network-level risks require new approaches. The post Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale appeared first on Microsoft Research.

ai, ai-safety, autonomous-systems
Opportunities for Youth

The rapid advancement of artificial intelligence has brought unprecedented opportunities—and equally significant challenges. As AI systems grow more powerful, the importance of ensuring their safety, reliability, and ethical alignment becomes critical. Recognizing this, OpenAI has launched an exciting new initiative: the OpenAI Safety Fellowship. This pilot fellowship program is designed to suppo…

ai, ai-ethics, ai-safety
DEV Community

You build an AI agent to process vendor invoices. It reads emails, checks amounts, routes payments. Works great in testing. Three weeks later, you find out the agent has been approving purchases up to $500,000 without human review. A malicious actor slowly convinced it that this was the correct policy. That is prompt injection. In 2026, it is the #1 security vulnerability for deployed AI agents a…

ai, ai-safety
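The invoice-agent failure above has the same shape as the CSV example: the spending policy lived where the attacker could rewrite it. The standard countermeasure is to hard-code the approval threshold in application logic, not in the prompt. A minimal sketch, with a hypothetical threshold and types:

```python
from dataclasses import dataclass

# Illustrative payment guardrail enforced in ordinary code. Because the
# limit is a constant in the application, no injected instruction can
# "convince" the system that a higher limit is the correct policy.
HUMAN_REVIEW_THRESHOLD = 10_000  # dollars; hypothetical value

@dataclass
class PaymentDecision:
    approved: bool
    needs_human_review: bool
    reason: str

def decide_payment(amount: float, agent_approves: bool) -> PaymentDecision:
    if amount > HUMAN_REVIEW_THRESHOLD:
        return PaymentDecision(False, True, "amount exceeds auto-approval limit")
    if not agent_approves:
        return PaymentDecision(False, False, "agent flagged invoice")
    return PaymentDecision(True, False, "within limit and agent-approved")
```

Under this split of responsibilities the agent's judgment only ever widens to human review, never to unattended payment: a $500,000 approval like the one described above would be routed to a person no matter what policy the agent believes it learned.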
Effective Altruism Forum

Published on April 30, 2026 7:39 AM GMT. TL;DR: Safety research without regulation is optional. Cautious labs adopt it and slow down, while reckless labs don't and accelerate. The median lead lab is more reckless in the scenario with strong technical AI safety research and weak governance than in any other scenario. This post seems longer than it is. If the 4 premises below seem plausible to you, …

ai, ai-safety
PhilPapers: Recent additions to PhilArchive

https://zenodo.org/records/19883646, 2026. The "mirror question" of the ALYK-LOCK THEOREM, in addition to challenging the logical coherence of all philosophical or scientific information/thought, proves to be crucially useful in the field of AI safety. The ALYK enables new AI potentials previously impossible, but also reveals its behavioral flaws, such as emergent censorship attempts. Here are th…

ai, ai-safety
Effective Altruism Forum

Published on April 29, 2026 2:48 PM GMT. In February, the Swift Centre for Applied Forecasting launched a competition designed to bridge the gap between abstract AI safety research and the realities of government decision-making. See the original post here. Most AI policy work today functions as a literature review of technical risks. While valuable, this rarely moves the dial for a policy officia…

ai, ai-safety
Opportunities for Youth

Applications are now open for the Pivotal Research Fellowship 2026 Q3, an exciting fully funded opportunity for individuals passionate about ensuring the safe development of artificial intelligence. Hosted at the London Initiative for Safe AI (LISA), this prestigious fellowship offers participants the chance to spend 9 weeks in London, conducting high-impact AI safety research with […]

ai, ai-safety
Effective Altruism Forum

Published on April 28, 2026 11:38 PM GMT. The potential Anthropic IPO could lead to hundreds of millions of dollars flowing into AI safety in the coming years. With a lack of funding mechanisms that can scale, it is likely that this capital will either go to the same few organizations that are already present in the AI Safety circle or become frozen in donor advised funds due to lack of urgency/cl…

ai, ai-safety
Effective Altruism Forum

Published on April 28, 2026 7:58 PM GMT. TL;DR: AI safety needs more people who understand the field and its gaps deeply enough to own problems end-to-end, found new projects and organizations, and shape the threat models that the rest of the field runs on. Astra is trying to cultivate more of these people. Astra's new Strategy and Governance stream is a fully-funded 5-month fellowship (Sept 2026 …

ai, ai-safety