Prompt is All You Need: Prompting Foundation Models for Large-scale Unsupervised Semantic Segmentation
This paper addresses the important and challenging task of large-scale unsupervised semantic segmentation (LUSS). We present the first attempt to unleash the power of foundation models (FMs) for this dense prediction task, with the main objective of providing simple, effective, yet efficient solutions for LUSS, namely Prompting foundation models for LUSS (PLUSS). First, we propose a cascade framework, PLUSS$_\alpha$, which effectively marries CLIP, Grounding DINO, and SAM in a zero-shot manner.
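The abstract names the CLIP → Grounding DINO → SAM cascade but gives no implementation details. The following is a minimal, hypothetical sketch of such a zero-shot cascade, assuming the Hugging Face `transformers` checkpoints `openai/clip-vit-base-patch32`, `IDEA-Research/grounding-dino-tiny`, and `facebook/sam-vit-base` (none of which are specified by the paper); the input image and the candidate label set are likewise placeholders, not the authors' setup:

```python
import torch
from PIL import Image
from transformers import (
    CLIPModel, CLIPProcessor,
    AutoProcessor, GroundingDinoForObjectDetection,
    SamModel, SamProcessor,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
image = Image.open("example.jpg").convert("RGB")        # hypothetical input image
candidate_labels = ["dog", "cat", "bicycle", "person"]  # hypothetical label set

# 1) CLIP: zero-shot ranking of candidate class names for the image.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_in = clip_proc(text=[f"a photo of a {c}" for c in candidate_labels],
                    images=image, return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    probs = clip(**clip_in).logits_per_image.softmax(-1)[0]
top_labels = [candidate_labels[i] for i in probs.topk(2).indices]

# 2) Grounding DINO: turn the selected class names into boxes via a text prompt
#    (the model expects lowercase, period-separated phrases).
gd_proc = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
gd = GroundingDinoForObjectDetection.from_pretrained(
    "IDEA-Research/grounding-dino-tiny").to(device)
prompt = ". ".join(top_labels) + "."
gd_in = gd_proc(images=image, text=prompt, return_tensors="pt").to(device)
with torch.no_grad():
    gd_out = gd(**gd_in)
det = gd_proc.post_process_grounded_object_detection(
    gd_out, gd_in.input_ids,
    box_threshold=0.35, text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)[0]

# 3) SAM: prompt the segmenter with the detected boxes to get dense masks.
sam_proc = SamProcessor.from_pretrained("facebook/sam-vit-base")
sam = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
boxes = [[b.tolist() for b in det["boxes"]]]  # one list of xyxy boxes per image
sam_in = sam_proc(image, input_boxes=boxes, return_tensors="pt").to(device)
with torch.no_grad():
    sam_out = sam(**sam_in)
masks = sam_proc.image_processor.post_process_masks(
    sam_out.pred_masks.cpu(),
    sam_in["original_sizes"].cpu(),
    sam_in["reshaped_input_sizes"].cpu(),
)[0]  # per-box binary masks at the original image resolution
```

Since every model is frozen and only prompted, a cascade of this shape requires no training, which is consistent with the abstract's framing of prompting as the sole mechanism.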
