Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model
Jyoti Aneja, Michael Harrison, Neel Joshi, Tyler LaBonte, John Langford, Eduardo Salinas
At a glance
- Phi-4-reasoning-vision-15B is a compact and smart open‑weight multimodal reasoning model that balances reasoning power, efficiency, and training data needs. It is broadly capable, supports natural interaction across a wide range of vision-language tasks, and excels at math and science reasoning and at understanding user interfaces.
- We share lessons learned and best practices for training a multimodal reasoning model—showing the benefit of careful architecture choices,...
