Last month, a 340ms spike in our TTS pipeline caused 12% of Loquent callers to talk over the AI mid-response. We didn't catch it for six hours because we were measuring the wrong thing — average latency instead of tail latency at each pipeline stage. That incident is why we built vox-bench , and why we're releasing it today. Why we needed this When you're building a voice AI agent that handles thousands of live phone calls per month — dental appointment bookings, patient intake, after-hours tria