Understanding Reinforcement Learning with Neural Networks Part 5: Connecting Reward, Derivative, and Step Size

In the previous article, we explored the reward system in reinforcement learning. In this article, we will begin calculating the step size.

First Update

In this example, the learning rate is 1.0. So, the step size is 0.5. Next, we update the bias by subtracting the step size from the old bias value of 0.0:

new bias = old bias - step size = 0.0 - 0.5 = -0.5

After the Update

Now that the bias has been updated, we run the model again. The new probability of going to Place B becomes 0.4. This means the probability of going to Place A is:

P(Place A) = 1 - P(Place B) = 1 - 0.4 = 0.6

Choosing