Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages
Srija Anand·Mitesh M. Khapra·Ishvinder Sethi·Aaditya Pareek·Kartik Rajput·Gaurav Yadav·Nikhil Narasimhan·Adish Pandya·Deepon Halder·Mohammed Safi Ur Rahman Khan·Praveen S·Shobhit Banga·Ashwin Sankar
Crowdsourced pairwise evaluation has emerged as a scalable approach for assessing foundation models. However, applying it to Text-to-Speech (TTS) introduces high variance due to linguistic diversity and the multidimensional nature of speech perception. We present a controlled multidimensional pairwise evaluation framework for multilingual TTS that combines linguistic control with perceptually grounded annotation. Using 5K+ native and code-mixed sentences across 10 Indic languages, we evaluate 7 state
