Enhancing Stability of Probabilistic Model-Based Reinforcement Learning by Adaptive Noise Filtering

This article proposes stabilized model-based policy optimization (SMBPO) to address the stability and efficiency issues in current probabilistic model-based reinforcement learning (MBRL) approaches. It adaptively filters the noise caused by imperfect models in both model and policy updates: 1) dimensions with abnormal distributions in the predictions are refined to stabilize the training of probabilistic models and 2) predicted states and estimated value functions are clipped to mitigate the negative effects of model errors.
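The second mechanism, clipping predicted states and value estimates, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the per-dimension state bounds, and the value bounds are all hypothetical, standing in for whatever adaptive bounds SMBPO derives from the data.

```python
import numpy as np

def clip_predictions(pred_states, value_estimates,
                     state_low, state_high, v_min, v_max):
    """Bound model-predicted states and value estimates so that
    model error cannot propagate unbounded into policy updates.
    All bounds here are illustrative placeholders."""
    states_clipped = np.clip(pred_states, state_low, state_high)
    values_clipped = np.clip(value_estimates, v_min, v_max)
    return states_clipped, values_clipped

# Toy usage: bounds would in practice come from statistics of observed data.
pred_s = np.array([[0.2, 5.0],
                   [-3.0, 0.1]])       # one dimension has drifted far out of range
pred_v = np.array([120.0, -50.0])
cs, cv = clip_predictions(pred_s, pred_v,
                          state_low=-1.0, state_high=1.0,
                          v_min=-100.0, v_max=100.0)
# cs == [[0.2, 1.0], [-1.0, 0.1]], cv == [100.0, -50.0]
```

The design intuition is that an imperfect dynamics model occasionally produces outlier predictions; bounding them keeps a single bad rollout step from dominating the value targets used for policy improvement.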