Policy-Adjustable Q-Learning for Data-Driven Nonlinear Optimal Tracking Control

This article investigates a policy-adjustable Q-learning (PA-QL) algorithm that addresses the optimal tracking control (OTC) problem for nonlinear discrete-time (DT) systems with enhanced adaptability and flexibility. A novel iteration scheme is developed that integrates the control weights into the augmented neural network (NN) input, so that the learning process explicitly characterizes the optimal policy as a function of the adjustable weights. Consequently, the lea