Microsoft Research

Safe agents don’t guarantee a safe ecosystem of interconnected agents. Microsoft Research examines what breaks when AI agents interact and why network-level risks require new approaches. The post Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale appeared first on Microsoft Research.

ai, ai-safety, autonomous-systems

Deploying large language models (LLMs) in real-world, high-stakes settings like law, medicine, and cloud incident response is harder than it should be: performance and reliability can quickly break down because adapting models to domain-specific requirements is a slow, manual process that is difficult to reproduce. The core challenge is domain adaptation, […] The post …

ai, machine-learning, nlp
Doug Burger·...·Ishai Menache
15d ago

Doug Burger, sustainability expert Amy Luers, and optimization researcher Ishai Menache examine the global emissions implications of datacenter operations, efficiency gains, and AI's potential across electrification, materials, and food systems. The post Can we AI our way to a more sustainable world? appeared first on Microsoft Research.

ai, environment, machine-learning, renewable-energy, sustainability

For the past five years, the New Future of Work report has captured how work is changing. This year, the shift feels especially sharp. Previous editions have focused on technology’s role in increasing productivity by automating tasks, accelerating communication, and expanding access to information, as well as the rise of remote work. Today, generative AI […] The post New Future of Work: AI is dri…

ai, generative-ai, machine-learning
Jaime Teevan·...·Rebecca Janssen
26d ago

Microsoft Chief Scientist Jaime Teevan and researchers Jenna Butler, Jake Hofman, and Rebecca Janssen unpack the New Future of Work Report 2025 and explore the ideal AI-driven working world. Plus, is AI a tool or a collaborator? And why the answer matters. The post Ideas: Steering AI toward the work future we want appeared first on Microsoft Research.

ai, machine-learning, nlp

At a glance
- AI benchmarks report performance on specific tasks but provide limited insight into underlying capabilities; ADeLe evaluates models by scoring both tasks and models across 18 core abilities, enabling direct comparison between task demands and model capabilities.
- Using these ability scores, the method predicts performance on new tasks with ~88% accuracy, including for models such a…
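The demand-vs-capability comparison described above can be sketched in a few lines. The ability names, score values, and the meets-every-demand decision rule below are illustrative assumptions, not ADeLe's actual scoring methodology:

```python
# Sketch of demand-vs-capability matching: a task is predicted to succeed if
# the model's score meets or exceeds the task's demand on every ability
# dimension the task exercises. (Hypothetical abilities and scores.)

def predict_success(task_demands: dict, model_abilities: dict) -> bool:
    """Predict success when the model meets every ability demand of the task."""
    return all(model_abilities.get(ability, 0.0) >= demand
               for ability, demand in task_demands.items())

task = {"reasoning": 3.0, "knowledge": 2.0}      # demand level per ability
strong = {"reasoning": 4.0, "knowledge": 3.5}
weak = {"reasoning": 2.0, "knowledge": 3.5}

print(predict_success(task, strong))  # True
print(predict_success(task, weak))    # False  (reasoning demand unmet)
```

The point of scoring both sides on a shared ability scale is that failure predictions become diagnosable: you can see which dimension's demand was unmet.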

ai, machine-learning

At a glance
- To successfully complete tasks, embodied AI agents must ground and update their plans based on visual feedback.
- AsgardBench isolates whether agents can use visual observations to revise their plans as tasks unfold.
- Spanning 108 controlled task instances across 12 task types, the benchmark requires agents to adapt their plans based on what they observe.
- Because objects can be i…

ai, machine-learning

At a glance
- VLM-based robot planners struggle with long, complex tasks because natural-language plans can be ambiguous, especially when specifying both actions and locations.
- GroundedPlanBench evaluates whether models can plan actions and determine where they should occur across diverse, real-world robot scenarios.
- Video-to-Spatially Grounded Planning (V2GP) is a framework that converts rob…

ai, machine-learning, robotics
Doug Burger·...·Nicolo Fusi
3/23/2026

Technical advances are moving at such a rapid pace that it can be challenging to define the tomorrow we’re working toward. In The Shape of Things to Come, Microsoft Research leader Doug Burger and experts from across disciplines tease out the thorniest AI issues facing technologists, policymakers, business decision-makers, and other stakeholders today. The goal: to amplify the shared understandin…

ai, machine-learning

At a glance
- Problem: Debugging AI agent failures is hard because trajectories are long, stochastic, and often multi-agent, so the true root cause gets buried.
- Solution: AgentRx pinpoints the first unrecoverable (“critical failure”) step by synthesizing guarded, executable constraints from tool schemas and domain policies, then logging evidence-backed violations step-by-step…
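The core loop implied above — walk the trajectory, check each step against executable constraints, report the first violation — can be sketched as follows. The step schema, the constraint format, and the refund example are illustrative assumptions, not AgentRx's actual representation:

```python
# Sketch: locate the first step in an agent trajectory that violates an
# executable constraint. Constraints are (name, predicate) pairs; a predicate
# returns True when the step satisfies it. (Hypothetical schema.)

def first_critical_failure(trajectory, constraints):
    """Return (step_index, violated_constraint) for the earliest violation,
    or None if the whole trajectory passes."""
    for i, step in enumerate(trajectory):
        for name, ok in constraints:
            if not ok(step):
                return i, name  # evidence: which step, which rule
    return None

# Hypothetical policy: a refund tool must never be called with amount <= 0.
constraints = [
    ("refund_amount_positive",
     lambda s: s["tool"] != "refund" or s["args"].get("amount", 0) > 0),
]
trajectory = [
    {"tool": "lookup_order", "args": {"id": 7}},
    {"tool": "refund", "args": {"amount": -5}},   # violation buried mid-run
    {"tool": "notify_user", "args": {}},
]
print(first_critical_failure(trajectory, constraints))
# (1, 'refund_amount_positive')
```

Because each constraint is executable, the violation report carries its own evidence rather than relying on an after-the-fact judgment of a long, noisy log.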

ai, machine-learning

At a glance
- Today’s AI agents store long interaction histories but struggle to reuse them effectively.
- Raw memory retrieval can overwhelm agents with lengthy, low-value context.
- PlugMem transforms interaction history into structured, reusable knowledge.
- A single, general-purpose memory module improves performance across diverse agent benchmarks while using fewer memory tokens. It seems co…
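The contrast drawn above — replaying raw history versus distilling it into compact, retrievable entries — can be illustrated with a toy memory module. The `remember:` marker, the entry schema, and the keyword-overlap retrieval are stand-in assumptions for illustration, not PlugMem's method:

```python
# Toy sketch: distill raw interaction turns into structured memory entries,
# then retrieve only the few most relevant facts instead of the full history.

def distill(history):
    """Collapse raw turns into structured entries (here, a trivial extraction
    keyed on an assumed 'remember:' marker)."""
    entries = []
    for turn in history:
        if "remember:" in turn:
            fact = turn.split("remember:", 1)[1].strip()
            entries.append({"fact": fact,
                            "keywords": set(fact.lower().split())})
    return entries

def retrieve(entries, query, k=2):
    """Return the k entries sharing the most keywords with the query,
    keeping the injected context small."""
    words = set(query.lower().split())
    ranked = sorted(entries,
                    key=lambda e: len(e["keywords"] & words),
                    reverse=True)
    return [e["fact"] for e in ranked[:k]]

history = [
    "user: hi", "agent: hello",
    "user: remember: the staging server is at port 8022",
    "user: remember: deploys require two approvals",
]
mem = distill(history)
print(retrieve(mem, "what port is the staging server on", k=1))
# ['the staging server is at port 8022']
```

The design point is the interface: the agent asks the memory module a question and gets back a few tokens of structured knowledge, not the whole transcript.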

ai, machine-learning

At a glance
- Phi-4-reasoning-vision-15B is a compact, capable open-weight multimodal reasoning model that balances reasoning power, efficiency, and training data needs. It is a broadly capable model that allows for natural interaction across a wide array of vision-language tasks and excels at math and science reasoning and at understanding user interfaces.
- We share lessons learned and best practice…

ai, machine-learning

Technical advances are moving at such a rapid pace that it can be challenging to define the tomorrow we’re working toward. In The Shape of Things to Come, Microsoft Research leader Doug Burger and experts from across disciplines tease out the thorniest AI issues facing technologists, policymakers, business decision-makers, and other stakeholders today. The goal: to amplify the shared understandin…

ai, machine-learning
Abubakarr Jaye·+6 more
2/26/2026

At a glance
- Today’s AI agent benchmarks test one task at a time, while real workplace productivity requires managing dozens of interdependent tasks at once. To reflect this, we created a setting called Multi-Horizon Task Environments (MHTEs).
- Under multi-task loads, leading computer-using agents degrade sharply, with completion rates dropping from 16.7% to 8.7%.
- CORPGEN introduces digital e…

ai, machine-learning

Insights from Microsoft’s Media Integrity and Authentication: Status, Directions, and Futures report

It has become increasingly difficult to distinguish fact from fiction when viewing online images and videos. Resilient, trustworthy technologies can help people determine whether the content they are viewing was captured by a camera or microphone—or generated or modified by AI tools. We refer to t…

Project Silica introduces new techniques for encoding data in borosilicate glass, as described in the journal Nature. These advances lower media cost and simplify writing and reading systems while supporting 10,000-year data preservation. The post Project Silica’s advances in glass storage technology appeared first on Microsoft Research.

materials, nanomaterials, technology

This research looks at why Predictive Inverse Dynamics Models (PIDMs) often outperform standard Behavior Cloning in imitation learning. By using simple predictions of what happens next, PIDMs reduce ambiguity and learn from far fewer demonstrations. The post Rethinking imitation learning with Predictive Inverse Dynamics Models appeared first on Microsoft Research.
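The structural difference between the two approaches can be sketched on a toy 1-D navigation task. The hand-written functions below stand in for learned networks and are purely illustrative, not the paper's models:

```python
# Behavior cloning maps an observation directly to an action. A PIDM instead
# predicts the next observation, then recovers the action via inverse
# dynamics. (Toy 1-D task; all "models" are hand-written stand-ins.)

def bc_policy(obs, goal):
    """Behavior cloning: observation -> action directly."""
    return 1 if obs < goal else -1

def predict_next(obs, goal):
    """Forward prediction: where should the agent be one step from now?"""
    return obs + (1 if obs < goal else -1)

def inverse_dynamics(obs, next_obs):
    """Recover the action that moves obs toward next_obs."""
    return 1 if next_obs > obs else -1

def pidm_policy(obs, goal):
    """PIDM: predict the next observation, then infer the action from it."""
    return inverse_dynamics(obs, predict_next(obs, goal))

print(bc_policy(0.0, 3.0), pidm_policy(0.0, 3.0))  # 1 1 (both move right)
```

The intuition the post points at: predicting "what happens next" is often a less ambiguous target than predicting the action itself, so the intermediate prediction step can make better use of limited demonstrations.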

ai, machine-learning

Microsoft Research unveils Paza, a human-centered speech pipeline, and PazaBench, the first leaderboard for low-resource languages. It covers 39 African languages and 52 models and is tested with communities in real settings. The post Paza: Introducing automatic speech recognition benchmarks and models for low resource languages appeared first on Microsoft Research.

ai, speech-recognition

AI can help generate medical image reports, but today’s models struggle with varying reporting schemes. Learn how UniRG uses reinforcement learning to boost performance of medical vision-language models. The post UniRG: Scaling medical imaging report generation with multimodal reinforcement learning appeared first on Microsoft Research.

ai, reinforcement-learning

Argos improves multimodal RL by evaluating whether an agent’s reasoning aligns with what it observes over time. The approach reduces visual hallucinations and produces more reliable, data-efficient agents for real-world applications. The post Argos: Multimodal reinforcement learning with agentic verifier for AI agents appeared first on Microsoft Research.
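One way to picture an observation-alignment check of the kind described above: score how much of the agent's stated reasoning is grounded in what it actually observed. Treating reasoning "claims" as strings to be matched against an observation log is a simplifying assumption for illustration, not the Argos verifier:

```python
# Toy grounding check: what fraction of reasoning claims consist entirely of
# words that appeared in the agent's observations? Low scores flag likely
# visual hallucination. (Illustrative heuristic only.)

def grounded_fraction(claims, observations):
    """Fraction of claims whose words all occur in the observation log."""
    obs_words = set(" ".join(observations).lower().split())
    if not claims:
        return 1.0
    grounded = sum(1 for c in claims
                   if set(c.lower().split()) <= obs_words)
    return grounded / len(claims)

observations = ["red door on the left", "key on the table"]
claims_good = ["key on the table", "red door on the left"]
claims_bad = ["key on the table", "blue window ahead"]   # window never seen

print(grounded_fraction(claims_good, observations))  # 1.0
print(grounded_fraction(claims_bad, observations))   # 0.5
```

In an RL setting, a score like this could serve as a reward signal that penalizes reasoning untethered from observation, which is the general direction the post describes.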

ai, reinforcement-learning