Benchmarking LLM Structured Outputs

Cross-posted from carrick.tools.

When you read the API documentation for OpenAI, Anthropic, or Google Gemini, the feature called "structured outputs" looks like a solved problem: pass a JSON schema, get back JSON that conforms to it. In production, it is not a contract. It is a well-typed, best-effort suggestion. At Carrick, the code-analysis scanner I work on, our post-LLM pipeline relies on a four-stage fallback parser. We attempt a direct parse, strip markdown fences, scan for array bounds