Scaling laws of emergent multilingualism in vision-language models
Caroline Winter
New research identifies scaling laws that explain how vision-language models develop emergent multilingual image captioning abilities through generalization from translation data, even without explicit multimodal supervision
