This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL), where agents are deployed over an undirected, connected graph/network with arbitrary topology but without a centralized node, that is, a node capable of aggregating all data and performing computations. Each agent constructs a nonparametric B-Map from its private data, operating on Q-functions represented in a reproducing kernel Hilbert space, with flexibility in choosing the

