Causality-Based Modality- and Platform-Invariant Representation Learning for Dynamic RGBT Tracking and a Benchmark

Each sequence in existing RGBT tracking datasets is typically captured from a single platform equipped with both RGB (visible light) and TIR (thermal infrared) sensors. In real-world applications, tracking some objects requires cross-platform collaboration, and these platforms may be equipped with different sensors. However, changes in modality and platform can cause significant variations in target appearance and abrupt position shifts, which existing RGBT trackers struggle to handle. To ad