Exploration vs. exploitation

elianadu
Published on May 24, 2026 3:40 PM GMT

For context, there are two ways models learn in reinforcement learning: exploration and exploitation.[1] Every action a model takes has probability p of being random (exploration) and probability 1 - p of being the best action among those it already knows (exploitation). When the model has not been trained at all and knows nothing, its first action is pure exploration. As the model learns more, the probability of exploration decreases, while the probab