SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems
Murat
This paper presents SysMoBench, a benchmark designed to evaluate generative AI's ability to formally model complex concurrent and distributed systems. Although the paper was published in January 2026, the AI landscape moves so fast that the models evaluated (like Claude-Sonnet-4 and GPT-5) already feel dated after the release of heavy hitters like Claude Opus 4.5 and OpenAI's Codex. The paper draws a distinction between algorithms/protocols and system modeling. As the paper (somewhat circularly)
