Multimodal foundation models exploit text to make medical image predictions

Nature Communications, Published online: 12 June 2026; doi:10.1038/s41467-026-74207-5

The study shows that multimodal medical AI models often rely heavily on the accompanying text when interpreting images and making predictions. As a result, they are highly susceptible to misinterpreting an image when that text is inaccurate.