Introduction
For motion robots that rely on dynamic perception, state-of-the-art systems still struggle to handle several challenges at once: motion blur caused by high-speed movement, severe occlusion during interactions, and drastic changes in scene lighting. Together, these challenges limit the robustness and real-time performance of tracking.
Methods
This study proposes a multimodal perception optimization model that integrates multiple algorithms. The model first uses YOLOv5 to achieve rapid detection and localization of multip