Precise positioning and identification of unauthorized unmanned aerial vehicles (UAVs) are crucial for spectrum security and privacy protection in future intelligent networks. Although various single-modality approaches have been investigated, their accuracy and robustness degrade under sensor-specific noise. To address these security challenges, we propose a multi-layer radio frequency (RF)-vision fusion framework that synergistically e