NVIDIA Autonomous Vehicle Research Group

SafeVL: Driving Safety Evaluation via Meticulous Reasoning in Vision Language Models

12/31/2025

Safety remains a fundamental challenge in autonomous driving, with a key step being the development of a safety evaluator that can reliably identify unsafe (i.e., collision-prone) scenarios. Existing methods, however, either rely heavily on object trajectories or use only language-based reasoning, neglecting crucial visual cues and limiting their generalization to unsafe events. Vision–Language M…

ai, computer-vision, machine-learning

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

12/30/2025

End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. To address this, we introduce Alpamayo-R1 (AR1), a vision-language-action model (VLA) that integrates Chain of Causation reasoning with trajec…

ai, autonomous-systems, machine-learning

LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

12/23/2025

Simulators can generate virtually unlimited driving data, yet imitation learning policies in simulation still struggle to achieve robust closed-loop performance. Motivated by this gap, we empirically study how misalignment between privileged expert demonstrations and sensor-based student observations can limit the effectiveness of imitation learning. More precisely, experts have significantly hig…

ai, machine-learning, reinforcement-learning

Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving

12/11/2025

We present Flex, an efficient and effective scene encoder that addresses the computational bottleneck of processing high-volume multi-camera data in end-to-end autonomous driving. Flex employs a small set of learnable scene tokens to jointly encode information from all image tokens across different cameras and timesteps. By design, our approach is geometry-agnostic, learning a compact scene repre…
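The abstract above is cut off before it says how the scene tokens read from the image tokens. One common way to let a small set of learnable tokens summarize a long token sequence is cross-attention, and the sketch below assumes that mechanism purely for illustration. The function names, the sizes, the random "learned" tokens, and the omission of key/value projections are all assumptions, not details from the Flex paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_scene(scene_tokens, image_tokens):
    """Single-head cross-attention: each scene token attends over all image tokens.

    scene_tokens: K query vectors of dimension d (learnable in a real model).
    image_tokens: N vectors of dimension d, flattened across cameras and timesteps.
    Returns K vectors of dimension d, so the output size does not depend on N.
    """
    dim = len(scene_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    encoded = []
    for query in scene_tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in image_tokens]
        weights = softmax(scores)
        encoded.append([
            sum(w * token[j] for w, token in zip(weights, image_tokens))
            for j in range(dim)
        ])
    return encoded


rng = random.Random(0)
# Illustrative sizes only (assumed, not taken from the paper).
NUM_CAMERAS, NUM_TIMESTEPS, TOKENS_PER_IMAGE, DIM, NUM_SCENE_TOKENS = 4, 2, 16, 8, 3

# Flatten every image token from every camera and timestep into one sequence.
image_tokens = [
    [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    for _ in range(NUM_CAMERAS * NUM_TIMESTEPS * TOKENS_PER_IMAGE)
]
# Random stand-ins for parameters that would be learned during training.
scene_tokens = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_SCENE_TOKENS)]

scene = encode_scene(scene_tokens, image_tokens)
print(len(image_tokens), "image tokens ->", len(scene), "scene tokens")
# prints "128 image tokens -> 3 scene tokens"
```

Nothing in this sketch uses camera poses or intrinsics, which is one plausible reading of "geometry-agnostic": the compression is driven only by token content.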

ai, computer-vision

Latent Chain-of-Thought World Modeling for End-to-End Driving

12/11/2025

Recent Vision-Language-Action (VLA) models for autonomous driving explore inference-time reasoning as a way to improve driving performance and safety in challenging scenarios. Most prior work uses natural language to express chain-of-thought (CoT) reasoning before producing driving actions. However, text may not be the most efficient representation for reasoning. In this work, we present Latent-C…

ai, machine-learning, robotics

Martian World Model: Controllable Video Synthesis with Physically Accurate 3D Reconstructions

12/9/2025

Synthesizing realistic Martian landscape videos is crucial for mission rehearsal and robotic simulation. However, this task poses unique challenges due to the scarcity of high-quality Martian data and the significant domain gap between Martian and terrestrial imagery. To address these challenges, we propose a holistic solution composed of two key components: 1) A data curation pipeline Multimodal…

3d-printing, technology

Optimization-Guided Diffusion for Interactive Scene Generation

12/8/2025

Realistic and diverse multi-agent driving scenes are crucial for evaluating autonomous vehicles, but safety-critical events, which are essential for this task, are rare and underrepresented in driving datasets. Data-driven scene generation offers a low-cost alternative by synthesizing complex traffic behaviors from existing driving logs. However, existing models often lack controllability or yield …

ai, machine-learning

Beyond Behavior Cloning in Autonomous Driving: a Survey of Closed-Loop Training Techniques

12/5/2025

Behavior cloning, the dominant approach for training autonomous vehicle (AV) policies, suffers from a fundamental gap: policies trained open-loop on temporally independent samples must operate in closed-loop where actions influence future observations. This mismatch can cause covariate shift, compounding errors, and poor interactive behavior, among other issues. Closed-loop training mitigates the…
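The open-loop versus closed-loop gap described above can be seen in a toy example. The sketch below is a hypothetical one-dimensional lane-keeping setup, not something taken from the survey: the dynamics, the `bias` value, and all function names are assumptions chosen to make the effect visible.

```python
def expert_action(offset):
    """Expert steers back toward the lane center (offset 0)."""
    return -offset


def cloned_action(offset, bias=0.01):
    """Hypothetical cloned policy.

    The expert data only ever contains offset == 0, so this policy ignores the
    offset and outputs the expert's action at 0 plus a small fitting error.
    """
    return expert_action(0.0) + bias


def open_loop_error(num_samples):
    """Mean action error on expert-visited states (always offset 0 in this toy)."""
    errors = [abs(cloned_action(0.0) - expert_action(0.0)) for _ in range(num_samples)]
    return sum(errors) / num_samples


def closed_loop_offset(num_steps):
    """Roll the cloned policy out so its own actions determine the next state."""
    offset = 0.0
    for _ in range(num_steps):
        offset += cloned_action(offset)  # toy dynamics: next offset = offset + action
    return offset


print("open-loop mean action error:", open_loop_error(100))
print("closed-loop offset after 100 steps:", closed_loop_offset(100))
```

The per-sample error stays near 0.01, yet the rollout drifts to an offset of roughly 1.0 after 100 steps, because the policy visits states the expert never demonstrated and has no learned behavior for correcting them.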

ai, autonomous-systems, machine-learning

ReSim: Reliable World Simulation for Autonomous Driving

12/2/2025

How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data composed mainly of safe expert trajectories, struggle to follow hazardous or non-expert behaviors, which are rare in such data. This limitation restricts their applicability to tasks such as policy evaluation. In this work…

ai, autonomous-systems

RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies

12/1/2025

Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift when deployed in closed loop, leading to compounding errors. We introduce Rollouts as Demonstrations (RoaD), a simple and efficient method to mitigate covariate shift by leveraging the policy’s own closed-loop rollouts as additional training …

ai, reinforcement-learning

Model-Based Policy Adaptation for Closed-Loop End-to-end Autonomous Driving

12/1/2025

End-to-end (E2E) autonomous driving models have demonstrated strong performance in open-loop evaluations but often suffer from cascading errors and poor generalization in closed-loop settings. To address this gap, we propose Model-based Policy Adaptation (MPA), a general framework that enhances the robustness and safety of pretrained E2E driving agents during deployment. MPA first generates diver…

ai, autonomous-systems, machine-learning

Extrapolated Urban View Synthesis Benchmark

10/19/2025

Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-t…

ai, computer-vision

Wolf: Dense Video Captioning with a World Summarization Framework

10/7/2025

We propose Wolf, a WOrLd summarization Framework for accurate video captioning. Wolf is an automated captioning framework that adopts a mixture-of-experts approach, leveraging complementary strengths of Vision Language Models (VLMs). By utilizing both image and video models, our framework captures different levels of information and summarizes them efficiently. Our approach can be applied to enha…

ai, computer-vision, nlp

CaRL: Learning Scalable Planning Policies with Simple Rewards

8/20/2025

We investigate reinforcement learning (RL) for privileged planning in autonomous driving. State-of-the-art approaches for this task are rule-based, but these methods do not scale to the long tail. RL, on the other hand, is scalable and does not suffer from compounding errors like imitation learning. Contemporary RL approaches for driving use complex shaped rewards that sum multiple individual rew…

ai, reinforcement-learning

Can Test-Time Scaling Improve World Foundation Model?

8/8/2025

World foundation models, which simulate the physical world by predicting future states from current observations and inputs, have become central to many applications in physical intelligence, including autonomous driving and robotics. However, these models require substantial computational resources for pretraining and are further constrained by available data during post-training. As such, scali…

ai, machine-learning

Safety Evaluation of Motion Plans Using Trajectory Predictors as Forward Reachable Set Estimators

7/30/2025

The advent of end-to-end autonomy stacks, which often lack interpretable intermediate modules, has placed an increased burden on ensuring that the final output, i.e., the motion plan, is safe in order to validate the safety of the entire stack. This requires a safety monitor that is both complete (able to detect all unsafe plans) and sound (does not flag safe plans). In this work, we propose a pri…
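The two monitor properties defined in the abstract above can be written down directly. The sketch below checks them on a small hand-labeled set of plans; the function names and the example labels are hypothetical and serve only to make the definitions concrete.

```python
def is_complete(flags, unsafe):
    """Complete: every truly unsafe plan is flagged (no missed detections)."""
    return all(flag for flag, bad in zip(flags, unsafe) if bad)


def is_sound(flags, unsafe):
    """Sound: no truly safe plan is flagged (no false alarms)."""
    return not any(flag for flag, bad in zip(flags, unsafe) if not bad)


# Ground truth for four hypothetical motion plans: True means the plan is unsafe.
unsafe = [True, False, True, False]

# Three hypothetical monitors, each giving one flag per plan.
flag_everything = [True, True, True, True]
flag_nothing = [False, False, False, False]
ideal_monitor = [True, False, True, False]

for name, flags in [
    ("flag everything", flag_everything),
    ("flag nothing", flag_nothing),
    ("ideal", ideal_monitor),
]:
    print(name, "| complete:", is_complete(flags, unsafe), "| sound:", is_sound(flags, unsafe))
```

Either property alone is trivial to achieve: flagging every plan is complete but not sound, and flagging none is sound but not complete. A useful monitor needs both at once.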

ai, robotics

Sim2Val: Leveraging Correlation Across Test Platforms for Variance-Reduced Metric Estimation

6/25/2025

Learning-based robotic systems demand rigorous validation to assure reliable performance, but extensive real-world testing is often prohibitively expensive, and if conducted may still yield insufficient data for high-confidence guarantees. In this work we introduce Sim2Val, a general estimation framework that leverages paired data across test platforms, e.g., paired simulation and real-world obse…
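The abstract above is truncated before it describes the estimator, so the following is not Sim2Val itself. It is a sketch of a classical control-variate estimator, one textbook way to use correlated paired data for variance reduction. It assumes a few paired (simulation, real) metric values plus many cheap simulation-only values; the function name, the synthetic data, and the chosen noise levels are all illustrative.

```python
import random
import statistics


def control_variate_estimate(real_paired, sim_paired, sim_extra):
    """Estimate the mean real-world metric using simulation as a control variate.

    real_paired, sim_paired: metric values from the same scenarios on both platforms.
    sim_extra: additional simulation-only metric values (cheap to collect).
    """
    real_mean = statistics.fmean(real_paired)
    sim_mean = statistics.fmean(sim_paired)
    # Regression coefficient of real on sim, estimated from the paired data.
    cov = sum((s - sim_mean) * (r - real_mean) for s, r in zip(sim_paired, real_paired))
    var = sum((s - sim_mean) ** 2 for s in sim_paired)
    beta = cov / var
    # A better estimate of the simulation mean, using every simulation sample.
    sim_reference = statistics.fmean(list(sim_paired) + list(sim_extra))
    # Correct the small-sample real mean by how unrepresentative the paired sim runs were.
    return real_mean - beta * (sim_mean - sim_reference)


rng = random.Random(0)
sim_paired = [rng.gauss(0.8, 0.1) for _ in range(20)]
# Synthetic "real" metric: strongly correlated with simulation, with an offset and noise.
real_paired = [s - 0.05 + rng.gauss(0.0, 0.02) for s in sim_paired]
sim_extra = [rng.gauss(0.8, 0.1) for _ in range(2000)]

print("naive real-only mean:     ", statistics.fmean(real_paired))
print("control-variate estimate: ", control_variate_estimate(real_paired, sim_paired, sim_extra))
```

The stronger the correlation between the two platforms, the more of the real-world sampling noise the correction term cancels. With uncorrelated platforms, `beta` tends toward zero and the estimate falls back to the plain real-world mean.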

ai, machine-learning

RAMEN: Real-time Asynchronous Multi-agent Neural Implicit Mapping

6/21/2025

Multi-agent neural implicit mapping allows robots to collaboratively capture and reconstruct complex environments with high fidelity. However, existing approaches often rely on synchronous communication, which is impractical in real-world scenarios with limited bandwidth and potential communication interruptions. This paper introduces RAMEN: Real-time Asynchronous Multi-agEnt Neural implicit mapp…

ai, machine-learning, robotics

Diagnostic Runtime Monitoring with Martingales

6/21/2025

Machine learning systems deployed in safety-critical robotics settings must be robust to distribution shifts. However, system designers must understand the cause of a distribution shift in order to implement the appropriate intervention or mitigation strategy and prevent system failure. In this paper, we present a novel framework for diagnosing distribution shifts in a streaming fashion by deploy…

ai, machine-learning

Efficient Multi-Camera Tokenization with Triplanes for End-to-End Driving

6/13/2025

Autoregressive Transformers are increasingly being deployed as end-to-end robot and autonomous vehicle (AV) policy architectures, owing to their scalability and potential to leverage internet-scale pretraining for generalization. Accordingly, tokenizing sensor data efficiently is paramount to ensuring the real-time feasibility of such architectures on embedded hardware. To this end, we present an…

ai, engineering, machine-learning, robotics
