A tolerance analysis framework for microservice-based systems against cascading failures

Abstract. Microservices have become a dominant approach for building large-scale Internet applications. A microservice-based system (MS) can consist of thousands of services, and the complex interactions among them make it highly susceptible to unforeseen cascading failures. Cascading failure models are commonly used to analyze a system's tolerance, but existing models overlook the features specific to MSs and fail to incorporate real-world events, which biases their simulation results. To address these issues, we proposed