Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning

This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL), where agents are deployed over an undirected, connected graph/network with arbitrary topology but without a centralized node, that is, a node capable of aggregating all data and performing computations. Each agent constructs a nonparametric B-Map from its private data, operating on Q-functions represented in a reproducing kernel Hilbert space, with flexibility in choosing the