reinforcement-learning

Towards Data Science

Part 2. Building scale-invariant agents that seamlessly change contexts. The post Surviving High Uncertainty in Logistics with MARL appeared first on Towards Data Science.

ai, reinforcement-learning
Frontiers in Artificial Intelligence | New and Recent Articles

In the busy and stressful modern world, people tend to disregard mental health, even though it is an important factor in overall health. The constant pressure to achieve success, the invasive nature of technology, and the constantly growing demands of the contemporary world can all cause stress, anxiety, and other mental health difficulties. Despite growing awareness, mental health remains a s…

ai, machine-learning, medicine, reinforcement-learning
Scientific Reports
Nature Communications

Nature Communications, Published online: 04 May 2026; doi:10.1038/s41467-026-72413-9 Rocket introduces a self-play RL framework for automated hyperparameter optimization, handling mixed types without priors. It scales to large datasets via reward approximation, achieving expert-level performance while cutting time and cost in real-world deployments.

ai, deep-learning, reinforcement-learning
Towards Data Science

A practical guide to understanding AI agent design, ReAct workflows, and when to scale from a single agent to a multi-agent system. The post Single Agent vs Multi-Agent: When to Build a Multi-Agent System appeared first on Towards Data Science.

ai, machine-learning, reinforcement-learning
DEV Community

You have a map of the frozen lake. Every crack in the ice, every slippery patch, every hole is marked. You can sit at your desk and plan the perfect route before stepping foot on the ice. That is value iteration. Now imagine you have no map. You lace up your boots and start walking. You slip, you fall into holes, you backtrack. But each time you learn a little more about which moves pay off and w…
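
The "plan with a map" idea in this excerpt can be sketched as value iteration on a small grid. This is only an illustrative sketch, not the article's code: the 4x4 layout, the deterministic (non-slippery) moves, the reward of 1 for reaching the goal, and the discount factor of 0.9 are all assumptions made here.

```python
# Hypothetical 4x4 map (an assumption): S start, F frozen, H hole, G goal.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9  # assumed discount factor


def step(state, action):
    """Deterministic move (slipping ignored); walls keep the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr = min(max(r + dr, 0), len(GRID) - 1)
    nc = min(max(c + dc, 0), len(GRID[0]) - 1)
    reward = 1.0 if GRID[nr][nc] == "G" else 0.0
    return (nr, nc), reward


def is_terminal(state):
    r, c = state
    return GRID[r][c] in "HG"


def action_value(state, action, values):
    """One-step lookahead using the known map."""
    nxt, reward = step(state, action)
    return reward + GAMMA * values[nxt]


def value_iteration(tol=1e-9):
    """Sweep all states until the largest value change falls below tol."""
    states = [(r, c) for r in range(len(GRID)) for c in range(len(GRID[0]))]
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if is_terminal(s):
                continue  # holes and the goal keep value 0
            best = max(action_value(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick the action with the highest one-step lookahead in each state."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(s, a, values))
        for s in values
        if not is_terminal(s)
    }


values = value_iteration()
policy = greedy_policy(values)
```

With the map in hand, no walking is needed: the whole route is computed at the desk. The map-free approach the excerpt goes on to describe would instead estimate these values from sampled steps.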

ai, reinforcement-learning
Frontiers in Artificial Intelligence | New and Recent Articles

Introduction: We address moral uncertainty in reinforcement learning (RL) by proposing a framework that integrates multiple ethical theories into decision-making. Existing approaches rely on single moral frameworks or handcrafted rewards, limiting scalability and failing to capture moral pluralism. We introduce AMULED, a task-agnostic ethical layer that refines a pre-trained RL agent using large la…

ai, ai-ethics, ethics, reinforcement-learning
NASA Science

Reinforcement Learning Fundamentals. Speaker: Carol Cuesta-Lazaro, IAS/Flatiron. The post AI/ML STIG Lecture Series, 11 May 2026 appeared first on NASA Science.

ai, reinforcement-learning
NASA Science

Reinforcement Learning Applications. Speaker: Carol Cuesta-Lazaro, IAS/Flatiron. The post AI/ML STIG Lecture Series, 18 May 2026 appeared first on NASA Science.

ai, reinforcement-learning
Apple Machine Learning Research

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents using outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, an importanc…

ai, machine-learning, reinforcement-learning
Apple Machine Learning Research

This paper was accepted at the Fifth Workshop on Natural Language Generation, Evaluation, and Metrics at ACL 2026. Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors that are usually addressed through prompt-tuning or…

ai, machine-learning, nlp, reinforcement-learning
KDnuggets
DEV Community

While working on agentic systems, my co-founder detected a core limitation early on: agents work sequentially, so one AI agent has to wait for another before it can start its work. They are also blocking in nature: if you are building an agentic system, you probably know that almost any SDK or library exposes agents with a blocking run method (await). So we decided to start addressing this pr…

ai, machine-learning, reinforcement-learning
DEV Community

A practical, end-to-end walkthrough of Nous Research's Hermes Agent: the principles it's built on, the architecture that makes it work, and a concrete checklist for building a similar self-improving agent yourself. 📋 Table of Contents 🤖 1. What Hermes Actually Is (in one paragraph) 🧭 2. Core Principles 2.1 🌐 Platform-agnostic core 2.2 🔒 Prompt stability (cache-friendly) 2.3 🔍 Progressive disclos…

ai, machine-learning, reinforcement-learning
EdTech Innovation Hub

The UCL professor and former Google DeepMind research lead has founded Ineffable Intelligence, backed by one of Europe's largest-ever seed rounds, with a mission to develop reinforcement learning systems capable of surpassing human knowledge. AlphaGo creator David Silver has raised $1.1 billion for Ineffable Intelligence, a new AI venture focused on reinforcement learning. Professor David Silver, the compu…

ai, reinforcement-learning
Research

At Penn State Great Valley’s annual student research poster competition, students presented a variety of projects to faculty and guests. The three winning posters focused on a job posting data pipeline for management research, a multi-agent system to automate data science tasks for non-data scientists and reinforcement learning for optimized urban energy harvesting.

ai, computer-science, data-analytics, engineering, reinforcement-learning
PhilPapers: Recent additions to PhilArchive

This concept paper proposes that a major untapped training signal for advanced AI lies in the systematic divergence between human predictions and actual outcomes. Public discourse continuously generates forecasts about politics, economics, conflict, technology, institutions, and social change, yet these forecasts are rarely extracted, formalized, scored, and analyzed as a learning resource. The p…

ai, machine-learning, reinforcement-learning
DEV Community

A Sunday-morning postmortem on teaching a 3B model to do enterprise IT triage with GRPO. It's 1 AM on a Sunday. The Meta × PyTorch OpenEnv Hackathon submission is due at 5 PM. My training logs show a loss curve that's been flat at 0.0 for the last thirty minutes. A flat loss in supervised learning means convergence. A flat loss in reinforcement learning usually means something else: your model ha…

ai, reinforcement-learning
Hacker News

DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles. We are thrilled to announce Day-0 support for DeepSeek-V4 across both inference and RL training. SGLang and Miles form the first open-source stack to serve and train DeepSeek-V4 on launch day, with systems purpose-built for its hybrid sparse-attention architecture, manifold-constrained hyper-connections (mHC), and FP4…

ai, reinforcement-learning
NASA Science

Reinforcement Learning Fundamentals. Speaker: Carol Cuesta-Lazaro, IAS/Flatiron. The post AI/ML STIG Lecture Series, 4 May 2026 appeared first on NASA Science.

ai, reinforcement-learning