The State of Reinforcement Learning for LLM Reasoning

Understanding GRPO and New Insights from Reasoning Model Papers

A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to these releases were relatively muted. Why? One reason could be that GPT-4.5 and Llama 4 remain conventional models, which means they were trained without explicit reinforcement learning for reasoning. Meanwhile, competitors...