If you call an open-weight model behind an API, whether that is your own box, a hosted endpoint, or a router, you are trusting that the thing answering is the model on the label. Providers have every incentive to serve a smaller or more aggressively quantized model under load. I wanted to know whether you can catch that from the outside. Short version: the obvious method fails, a less obvious one works, and it only works if you accumulate evidence.

Attempt 1: grade the output.

Dead on arrival. The in