Accelerated Reinforcement Learning With Verifiable Excitation for Cubic Convergence
This article proposes and analyzes an accelerated reinforcement learning (RL) algorithm for discrete-time linear systems with unknown dynamics. The method achieves cubic convergence, improving on the quadratic rate of existing policy iteration (PI)-based RL algorithms, and it does so without relying on persistency of excitation (PE). The value function matrix is computed through a midpoint-centered Lyapunov equation, which yields a third-order Newton-type update and ensures fast convergence. To enabl
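The abstract gives no implementation details, but the core idea of a midpoint-centered Lyapunov evaluation inside policy iteration can be sketched for the discrete-time LQR setting. The sketch below is illustrative only and makes several assumptions not in the article: it is model-based (the plant matrices `A`, `B` are known, whereas the article treats the dynamics as unknown), the plant, weights, and initial stabilizing gain are made up, and the midpoint update here is a generic midpoint-Newton-style variant of the Kleinman iteration, not necessarily the article's exact scheme.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Hypothetical discrete-time LTI plant and LQR weights (for illustration).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.eye(1)

def gain(P):
    """Policy improvement: K = (R + B'PB)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def evaluate(K):
    """Policy evaluation: solve P = (A-BK)' P (A-BK) + Q + K'RK."""
    Ak = A - B @ K
    return solve_discrete_lyapunov(Ak.T, Q + K.T @ R @ K)

# Midpoint-centered PI sketch: evaluate the Lyapunov equation at the
# midpoint of the current gain and the tentative Newton (Kleinman) gain,
# in the spirit of a third-order midpoint-Newton update.
K = np.array([[25.0, 10.0]])              # assumed stabilizing initial policy
for _ in range(6):
    P = evaluate(K)                       # value of the current policy
    K_half = gain(P)                      # tentative Newton step
    P_mid = evaluate(0.5 * (K + K_half))  # midpoint-centered evaluation
    K = gain(P_mid)                       # improved policy

P_star = solve_discrete_are(A, B, Q, R)   # ground-truth Riccati solution
print(np.linalg.norm(evaluate(K) - P_star))  # residual w.r.t. the DARE
```

Compared with standard PI, where each iteration evaluates the Lyapunov equation once at the current gain, the midpoint variant spends one extra Lyapunov solve per iteration to center the evaluation between the current and improved policies, which is what buys the higher-order convergence claimed in the article.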
