Self-supervised learning (SSL) can capture intrinsic features from large amounts of unlabeled data, which significantly reduces dependence on labels and yields strong performance in human activity recognition (HAR). However, existing SSL frameworks rely heavily on data augmentation, and they often mistakenly treat noise as a learning objective during mask reconstruction. Moreover, the scale of available datasets often constrains accuracy and hinders real-world applicability. To address these issues, this paper proposes
